View
0
Download
0
Category
Preview:
Citation preview
Database Design
Contents
Chapter 1 Before the Advent of Database Systems111 File-based approach for banking system 112 Data Redundancy 113 Data Isolation 214 Integrity problems215 Security problems216 Concurrency ndash access anomalies217 Database Approach 318 Role of databases in Business 319 Meaning of Data 3
Chapter 2 Fundamental Concepts 5Chapter 3 Characteristics and Benefits of a Database 731 Self-Describing Nature of a Database System 732 Insulation between Program and Data 733 Support multiple views of data 734 Sharing of data and Multiuser system 735 Control Data Redundancy 836 Data Sharing 837 Enforcing Integrity Constraints 838 Restricting Unauthorised Access 839 Data Independence 8310 Transaction Processing 9311 Providing multiple views of data 9312 Providing backup and recovery facilities 9313 Managing information 9
Chapter 4 Types of Database Models 1141 High-level Conceptual Data models 11
411 Record-based Logical Data models11
Chapter 5 Data Modelling 1351 Degree of Data Abstraction 1352 External models 1453 Conceptual model 1454 Internal model 1455 Physical model 15
551 Data Abstraction Layer 15552 Schemas 16553 Logical and Physical Data Independence 17554 Logical data independence 17555 Physical data independence 17
Chapter 6 Classification of Database Systems 1861 Based on data model 1862 Based on the number of users 1863 Based on the ways database is distributed 18
631 Centralized Systems18
632 Distributed database system19633 Homogeneous distributed Database Systems 19634 Heterogeneous distributed Database Systems 19
Chapter 7 The Relational Data Model 2071 Fundamental concepts in Relational Data Model 20
711 Relation20712 Table 20713 Column 21714 Domain 21715 Records22
72 Degree 2373 Properties of Tables 23
731 Terminology 23
Chapter 8 Entity Relationship Model2581 Entity Entity Set and Entity Type 25
811 Types of Entities26812 Attributes27813 Types of Attributes 27814 Keys 29
8141 Candidate key 298142 Composite key 298143 Primary key 308144 Secondary key308145 Alternate key308146 Foreign key308147 Nulls 31
815 Entity Types 328151 Existence Dependency 328152 Weak Entities 328153 Strong Entities 32
816 Relationships 338161 1M relationship 338162 11 relationship338163 11 Example338164 MN relationships 348165 MN relationships 34
817 Unary Relationships (recursive)35818 Ternary Relationships 35819 Relationship Strength 36
Chapter 9 Integrity Rules and Constraints 3791 Domain Integrity 3792 Referential integrity Examples 3793 Referential Integrity in MS Access 3894 Referential Integrity using Transact SQL (MS SQL Server) 3895 Foreign Key Rules 39
951 Business Rules 39
96 Cardinality 40961 Relationship Participation 40
97 Optional 4198 Mandatory 41
981 Relationship Types 43
Chapter 10 ER Modelling 44101 Relational Design and Redundancy 44102 Insertion anomaly 45103 Update anomaly 45104 Deletion anomaly 45
Chapter 11 Functional Dependencies48111 Rules of Functional Dependencies 48112 Inference Rules 49
1121 Axiom of reflexivity 491122 Axiom of augmentation501123 Axiom of transitivity 50
113 Additional rules 501131 Union 501132 Decomposition 51
114 Dependency Diagram 51
Chapter 12 Normalization 52121 Normal Forms52
1211 School database 5312111 Process for 1NF 5312112 Update anomalies in 1st normal form 53
122 2nd Normal Form531221 Process for 2NF 541222 Update anomalies in 2nd normal form54
123 3rd Normal Form541231 Remove Transitive Dependencies54
12311 Update anomalies in 3rd normal form 55
1232 Boyce-Codd Normal Form (BCNF)561233 BCNF Example 2 58
12331 Normalization and Database Design 59
Chapter 13 Database Development Process 60131 SDLC ndash Waterfall 60132 Database Life Cycle 61133 Requirements gathering 62134 Analysis 63135 Logical Design 63136 Implementation 65137 Realising the design 66138 Populating the database 66139 Guidelines For Developing An ER Diagram 67
Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68
Chapter 1 Before the Advent ofDatabase Systems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system
Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc
Fig 11
httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)
11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Keeping organizational information in this approach has a number of disadvantagesincluding
12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as
1
bull inconsistency in data format
bull the same information may be kept in several different place (files)
bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort
13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files
14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull
bull It is difficult to make changes to the programs to enforce new constraints
15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull There are constraints regarding accessing privileges
bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints
16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously
These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach
Chapter 1 2
17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea
18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways
Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations
19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element
If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it
bull Filing cabinets
bull Spreadsheets
bull Folders
bull Ledgers Lists
bull Piles of papers on your desk
They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information
3
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Contents
Chapter 1 Before the Advent of Database Systems111 File-based approach for banking system 112 Data Redundancy 113 Data Isolation 214 Integrity problems215 Security problems216 Concurrency ndash access anomalies217 Database Approach 318 Role of databases in Business 319 Meaning of Data 3
Chapter 2 Fundamental Concepts 5Chapter 3 Characteristics and Benefits of a Database 731 Self-Describing Nature of a Database System 732 Insulation between Program and Data 733 Support multiple views of data 734 Sharing of data and Multiuser system 735 Control Data Redundancy 836 Data Sharing 837 Enforcing Integrity Constraints 838 Restricting Unauthorised Access 839 Data Independence 8310 Transaction Processing 9311 Providing multiple views of data 9312 Providing backup and recovery facilities 9313 Managing information 9
Chapter 4 Types of Database Models 1141 High-level Conceptual Data models 11
411 Record-based Logical Data models11
Chapter 5 Data Modelling 1351 Degree of Data Abstraction 1352 External models 1453 Conceptual model 1454 Internal model 1455 Physical model 15
551 Data Abstraction Layer 15552 Schemas 16553 Logical and Physical Data Independence 17554 Logical data independence 17555 Physical data independence 17
Chapter 6 Classification of Database Systems 1861 Based on data model 1862 Based on the number of users 1863 Based on the ways database is distributed 18
631 Centralized Systems18
632 Distributed database system19633 Homogeneous distributed Database Systems 19634 Heterogeneous distributed Database Systems 19
Chapter 7 The Relational Data Model 2071 Fundamental concepts in Relational Data Model 20
711 Relation20712 Table 20713 Column 21714 Domain 21715 Records22
72 Degree 2373 Properties of Tables 23
731 Terminology 23
Chapter 8 Entity Relationship Model2581 Entity Entity Set and Entity Type 25
811 Types of Entities26812 Attributes27813 Types of Attributes 27814 Keys 29
8141 Candidate key 298142 Composite key 298143 Primary key 308144 Secondary key308145 Alternate key308146 Foreign key308147 Nulls 31
815 Entity Types 328151 Existence Dependency 328152 Weak Entities 328153 Strong Entities 32
816 Relationships 338161 1M relationship 338162 11 relationship338163 11 Example338164 MN relationships 348165 MN relationships 34
817 Unary Relationships (recursive)35818 Ternary Relationships 35819 Relationship Strength 36
Chapter 9 Integrity Rules and Constraints 3791 Domain Integrity 3792 Referential integrity Examples 3793 Referential Integrity in MS Access 3894 Referential Integrity using Transact SQL (MS SQL Server) 3895 Foreign Key Rules 39
951 Business Rules 39
96 Cardinality 40961 Relationship Participation 40
97 Optional 4198 Mandatory 41
981 Relationship Types 43
Chapter 10 ER Modelling 44101 Relational Design and Redundancy 44102 Insertion anomaly 45103 Update anomaly 45104 Deletion anomaly 45
Chapter 11 Functional Dependencies48111 Rules of Functional Dependencies 48112 Inference Rules 49
1121 Axiom of reflexivity 491122 Axiom of augmentation501123 Axiom of transitivity 50
113 Additional rules 501131 Union 501132 Decomposition 51
114 Dependency Diagram 51
Chapter 12 Normalization 52121 Normal Forms52
1211 School database 5312111 Process for 1NF 5312112 Update anomalies in 1st normal form 53
122 2nd Normal Form531221 Process for 2NF 541222 Update anomalies in 2nd normal form54
123 3rd Normal Form541231 Remove Transitive Dependencies54
12311 Update anomalies in 3rd normal form 55
1232 Boyce-Codd Normal Form (BCNF)561233 BCNF Example 2 58
12331 Normalization and Database Design 59
Chapter 13 Database Development Process 60131 SDLC ndash Waterfall 60132 Database Life Cycle 61133 Requirements gathering 62134 Analysis 63135 Logical Design 63136 Implementation 65137 Realising the design 66138 Populating the database 66139 Guidelines For Developing An ER Diagram 67
Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68
Chapter 1 Before the Advent ofDatabase Systems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system
Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc
Fig 11
httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)
11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Keeping organizational information in this approach has a number of disadvantagesincluding
12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as
1
bull inconsistency in data format
bull the same information may be kept in several different place (files)
bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort
13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files
14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull
bull It is difficult to make changes to the programs to enforce new constraints
15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull There are constraints regarding accessing privileges
bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints
16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously
These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach
Chapter 1 2
17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea
18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways
Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations
19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element
If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it
bull Filing cabinets
bull Spreadsheets
bull Folders
bull Ledgers Lists
bull Piles of papers on your desk
They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information
3
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
632 Distributed database system19633 Homogeneous distributed Database Systems 19634 Heterogeneous distributed Database Systems 19
Chapter 7 The Relational Data Model 2071 Fundamental concepts in Relational Data Model 20
711 Relation20712 Table 20713 Column 21714 Domain 21715 Records22
72 Degree 2373 Properties of Tables 23
731 Terminology 23
Chapter 8 Entity Relationship Model2581 Entity Entity Set and Entity Type 25
811 Types of Entities26812 Attributes27813 Types of Attributes 27814 Keys 29
8141 Candidate key 298142 Composite key 298143 Primary key 308144 Secondary key308145 Alternate key308146 Foreign key308147 Nulls 31
815 Entity Types 328151 Existence Dependency 328152 Weak Entities 328153 Strong Entities 32
816 Relationships 338161 1M relationship 338162 11 relationship338163 11 Example338164 MN relationships 348165 MN relationships 34
817 Unary Relationships (recursive)35818 Ternary Relationships 35819 Relationship Strength 36
Chapter 9 Integrity Rules and Constraints 3791 Domain Integrity 3792 Referential integrity Examples 3793 Referential Integrity in MS Access 3894 Referential Integrity using Transact SQL (MS SQL Server) 3895 Foreign Key Rules 39
951 Business Rules 39
96 Cardinality 40961 Relationship Participation 40
97 Optional 4198 Mandatory 41
981 Relationship Types 43
Chapter 10 ER Modelling 44101 Relational Design and Redundancy 44102 Insertion anomaly 45103 Update anomaly 45104 Deletion anomaly 45
Chapter 11 Functional Dependencies48111 Rules of Functional Dependencies 48112 Inference Rules 49
1121 Axiom of reflexivity 491122 Axiom of augmentation501123 Axiom of transitivity 50
113 Additional rules 501131 Union 501132 Decomposition 51
114 Dependency Diagram 51
Chapter 12 Normalization 52121 Normal Forms52
1211 School database 5312111 Process for 1NF 5312112 Update anomalies in 1st normal form 53
122 2nd Normal Form531221 Process for 2NF 541222 Update anomalies in 2nd normal form54
123 3rd Normal Form541231 Remove Transitive Dependencies54
12311 Update anomalies in 3rd normal form 55
1232 Boyce-Codd Normal Form (BCNF)561233 BCNF Example 2 58
12331 Normalization and Database Design 59
Chapter 13 Database Development Process 60131 SDLC ndash Waterfall 60132 Database Life Cycle 61133 Requirements gathering 62134 Analysis 63135 Logical Design 63136 Implementation 65137 Realising the design 66138 Populating the database 66139 Guidelines For Developing An ER Diagram 67
Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68
Chapter 1 Before the Advent ofDatabase Systems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system
Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc
Fig 11
httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)
11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Keeping organizational information in this approach has a number of disadvantagesincluding
12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as
1
bull inconsistency in data format
bull the same information may be kept in several different place (files)
bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort
13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files
14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull
bull It is difficult to make changes to the programs to enforce new constraints
15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull There are constraints regarding accessing privileges
bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints
16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously
These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach
Chapter 1 2
17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea
18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways
Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations
19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element
If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it
bull Filing cabinets
bull Spreadsheets
bull Folders
bull Ledgers Lists
bull Piles of papers on your desk
They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information
3
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
96 Cardinality 40961 Relationship Participation 40
97 Optional 4198 Mandatory 41
981 Relationship Types 43
Chapter 10 ER Modelling 44101 Relational Design and Redundancy 44102 Insertion anomaly 45103 Update anomaly 45104 Deletion anomaly 45
Chapter 11 Functional Dependencies48111 Rules of Functional Dependencies 48112 Inference Rules 49
1121 Axiom of reflexivity 491122 Axiom of augmentation501123 Axiom of transitivity 50
113 Additional rules 501131 Union 501132 Decomposition 51
114 Dependency Diagram 51
Chapter 12 Normalization 52121 Normal Forms52
1211 School database 5312111 Process for 1NF 5312112 Update anomalies in 1st normal form 53
122 2nd Normal Form531221 Process for 2NF 541222 Update anomalies in 2nd normal form54
123 3rd Normal Form541231 Remove Transitive Dependencies54
12311 Update anomalies in 3rd normal form 55
1232 Boyce-Codd Normal Form (BCNF)561233 BCNF Example 2 58
12331 Normalization and Database Design 59
Chapter 13 Database Development Process 60131 SDLC ndash Waterfall 60132 Database Life Cycle 61133 Requirements gathering 62134 Analysis 63135 Logical Design 63136 Implementation 65137 Realising the design 66138 Populating the database 66139 Guidelines For Developing An ER Diagram 67
Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68
Chapter 1 Before the Advent ofDatabase Systems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system
Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc
Fig 11
httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)
11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Keeping organizational information in this approach has a number of disadvantagesincluding
12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as
1
bull inconsistency in data format
bull the same information may be kept in several different place (files)
bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort
13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files
14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull
bull It is difficult to make changes to the programs to enforce new constraints
15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull There are constraints regarding accessing privileges
bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints
16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously
These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach
Chapter 1 2
17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea
18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways
Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations
19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element
If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it
bull Filing cabinets
bull Spreadsheets
bull Folders
bull Ledgers Lists
bull Piles of papers on your desk
They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information
3
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 14 Database Users 68141 End users 68142 Application Programmers 68143 Database Administrators 68
Chapter 1 Before the Advent ofDatabase Systems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system
Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc
Fig 11
httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)
11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Keeping organizational information in this approach has a number of disadvantagesincluding
12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as
1
bull inconsistency in data format
bull the same information may be kept in several different place (files)
bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort
13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files
14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull
bull It is difficult to make changes to the programs to enforce new constraints
15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull There are constraints regarding accessing privileges
bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints
16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously
These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach
Chapter 1 2
17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea
18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways
Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations
19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element
If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it
bull Filing cabinets
bull Spreadsheets
bull Folders
bull Ledgers Lists
bull Piles of papers on your desk
They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information
3
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 1 Before the Advent ofDatabase Systems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One way to keep information on a computer is to store it in permanent files Acompany system has a number of application programs each of them is defined tomanipulate data files These application programs have been written on request ofthe users in the organization New application will be added to the system as the needarises The system just described is called the file-based system
Consider a traditional banking system which uses the file-based system to manage theorganizationrsquos data in the picture below As we can see there are differentdepartments in the Bank each of them has their own applications that manage andmanipulate different data files For banking systems the programs can be the ones todebit or credit an account find the balance of an account add a new mortgage loanor generate monthly statements etc
Fig 11
httpcnxorg contentm28150latest (httpcnxorg20contentm28150latest)
11 File-based approach for banking systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Keeping organizational information in this approach has a number of disadvantagesincluding
12 Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Since files and applications are created by different programmers of variousdepartments over long periods of time it might lead to several problems such as
1
bull inconsistency in data format
bull the same information may be kept in several different place (files)
bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort
13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files
14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull
bull It is difficult to make changes to the programs to enforce new constraints
15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull There are constraints regarding accessing privileges
bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints
16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously
These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach
Chapter 1 2
17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea
18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways
Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations
19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element
If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it
bull Filing cabinets
bull Spreadsheets
bull Folders
bull Ledgers Lists
bull Piles of papers on your desk
They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information
3
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull inconsistency in data format
bull the same information may be kept in several different place (files)
bull data inconsistency which means various copies of the same data are conflicting this wastes stor age space and duplicates effort
13 Data IsolationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull It is difficult for new applications to retrieve the appropriate data which might bestored in various files
14 Integrity problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data values must satisfy certain consistency constraints that are specified in theapplication programsbull
bull It is difficult to make changes to the programs to enforce new constraints
15 Security problemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull There are constraints regarding accessing privileges
bull Application requirements are added to the system in an ad-hoc manner so it isdifficult to enforce constraints
16 Concurrency ndash access anomaliesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Data may be accessed by many applications that have not been coordinatedpreviously so it is not easy to provide a strategy to support multiple users toupdate data simultaneously
These difficulties have prompted the development of a new approach in managinglarge amount of organizational information ndash the database approach
Chapter 1 2
17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea
18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways
Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations
19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element
If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it
bull Filing cabinets
bull Spreadsheets
bull Folders
bull Ledgers Lists
bull Piles of papers on your desk
They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information
3
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
17 Database ApproachAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Databases and database technology play an important role in most of areas wherecomputers are used including business education medicine etc To understand thefundamentals of database systems we start by introducing the basic concepts in thisarea
18 Role of databases in BusinessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Everybody uses a database in some way maybe just to store information about theirfriends and family That data might be written down or stored in a computer using aword processing program or it could even be saved in a spreadsheet However thebest place for it would be the Database Management Software (DBMS) The DBMS is apower software tool that allows you to store manipulate and retrieve data in a varietyof different ways
Most companies keep track of customer information by storing it in a database Thatdata may be customers employees products orders or anything that may assist thebusiness in their operations
19 Meaning of DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data is factual information such as measurements or statistics about objects andconcepts We use it for discussion or calculation It could be a person a place anevent an action any one of a number of things A single fact is an element of data ora data element
If data is information and information is what we are all in the business of workingwith you can start to see where you may be storing it
bull Filing cabinets
bull Spreadsheets
bull Folders
bull Ledgers Lists
bull Piles of papers on your desk
They all store information and so too does a database Because of the mechanicalnature of databases they have terrific power to manage and process the information
3
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
they hold This can make the information they house much more useful for your workWith this understanding of data we can start to see how a tool with the capacity tostore a collection of data and organize it especially for rapid search retrieval andprocessing might make a difference to how we can use data This is all aboutmanaging information Source http cnxorgcontentm28150latest (httphttp20cnxorgcontentm28150latest)
Chapter 1 4
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 2 Fundamental ConceptsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database is a shared collection of related data which is used to support the activitiesof a particular organization A database can be viewed as a repository of data that isdefined once and then is accessed by various users A database has the followingproperties
bull It is a representation of some aspect of the real world or perhaps a collection ofdata elements (facts) representing real-world information
bull A database is logical coherent and internally consistent
bull A database is designed built and populated with data for a specific purpose
bull Each data item is stored in a field
bull A combination of fields makes up a table For example an employee tablecontains fields that contain data about an individual employee
A Database Management System (DBMS) is a collection of programs that enable usersto create maintain databases and control all the access to the databases The primarygoal of the DBMS is to provide an environment that is both convenient and efficientfor users to retrieve and store information
Fig 21
httpscnxorgcontentm28150latest With the database approach we can have thetraditional banking system as shown in the following image
5
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Fig 22
httpscnxorgcontentm28150latest
Chapter 2 6
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 3 Characteristics andBenefits of a Database
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a number of characteristics that distinguish the database approach with thefile-based approach
31 Self-Describing Nature of a Database SystemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A Database System contains not only the database itself but also the descriptions ofdata structure and constraints (meta-data) This information is used by the DBMSsoftware or database users if needed This separation makes a database systemtotally different from the traditional file-based system in which the data definition is apart of application programs
32 Insulation between Program and DataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the file based system the structure of the data files is defined in the applicationprograms so if a user wants to change the structure of a file all the programs thataccess that file might need to be changed as well On the other hand in the databaseapproach the data structure is stored in the system catalog not in the programsTherefore one change is all thatrsquos needed
33 Support multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view is a subset of the database which is defined and dedicated for particular usersof the system Multiple users in the system might have different views of the systemEach view might contain only the data of interest to a user or a group of users
34 Sharing of data and Multiuser systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A multiuser database system must allow multiple users access to the database at thesame time As a result the multiuser DBMS must have concurrency control strategies
7
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
to ensure several users access to the same data item at the same time and to do so ina manner that the data will always be correct ndash data integrity
35 Control Data RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the Database approach ideally each data item is stored in only one place in thedatabase In some cases redundancy still exists so as to improve system performancebut such redundancy is controlled and kept to minimum
36 Data SharingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The integration of the whole data in an organization leads to the ability to producemore information from a given amount of data
37 Enforcing Integrity ConstraintsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
DBMSs should provide capabilities to define and enforce certain constraints such asdata type data uniqueness etc
38 Restricting Unauthorised AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Not all users of the system have the same accessing privileges DBMSs should providea security subsystem to create and control the user accounts
39 Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
System data (Meta Data) descriptions are separated from the application programsChanges to the data structure is handled by the DBMS and not embedded in theprogram
Chapter 3 8
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
310 Transaction ProcessingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The DBMS must include concurrency control subsystems to ensure that several userstrying to update the same data do so in a controlled manner The results of anyupdates to the database must maintain consistency and validity
311 Providing multiple views of dataAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A view may be a subset of the database Various users may have different views of thedatabase itself Users may not need to be aware of how and where the data they referto is stored
312 Providing backup and recovery facilitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If the computer system fails in the middle of a complex update process the recoverysubsystem is responsible for making sure that the database is restored to the stage itwas in before the process started executing
313 Managing informationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Managing information means taking care of it so that it works for us and is useful forthe work we are doing The information we collect is no longer subject to ldquoaccidentaldisorganizationrdquo and becomes more easily accessible and integrated with the rest ofour work Managing information using a database allows us to become strategic usersof the data we have
We often need to access and re-sort data for various uses These may include
bull Creating mailing lists
bull Writing management reports
bull Generating lists of selected news stories
bull Identifying various client needs
The processing power of a database allows it to manipulate the data it houses so itcan
9
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull Sortbull Matchbull Linkbull Aggregatebull Skip fieldsbull Calculatebull Arrange
So a database tracks data finding the bits you need and processing them in the wayyou arrange for them to be processed
bull Because of the versatility of databases we find them powering all sorts ofprojects A database can be linked to
bull A web site that is capturing registered users
bull A client tracking application for social service organisations
bull A medical record system for a health care facility
bull Your personal address book in your e-mail client
bull A collection of word processed documents
bull A system that issues airline reservations
Source httpcnxorgcontentm28150latest (httpcnxorgcontentm28150latest)
Chapter 3 10
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 4 Types of Database Models
41 High-level Conceptual Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that are close to the way people perceive data to present the data Atypical example of this type is the entity relationship model which uses main conceptslike entities attributes relationships An entity represents a real-world object such asan employee a project An entity has some attributes which represents properties ofentity such as employeersquos name address birthdate A relationship represents anassociation among entities for example an employee works on many projects Therelationship exists between employee and project
411 Record-based Logical Data modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Provide concepts that can be understood by the user but not too far from the waydata is stored in the computer Three well-known data models of this type arerelational data model network data model and hierarchical data model
bull The Relational model represents data as relations or tables
bull The Network model represents data as record types and also represents a limitedtype of one to many relationship called set type
Fig 41 httpcnxorgcontentm28150latest
The Hierarchical model represents data as a hierarchical tree structures Eachhierarchy represents a number of related records Here is the schema in hierarchicalmodel notation
11
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Fig 42 httpcnxorgcontentm28150latest
Chapter 4 12
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 5 Data ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data Modelling is the first step in the process of database design This step issometimes considered as a high-level and abstract design phase (conceptual design)The aims of this phase is to
bull Describe what data is contained in the database (eg entities students lecturerscourses subjects etc)
bull Describe the relationships between data items (eg Students are supervised byLecturers Lecturers teach Courses )
bull Describe the constraints on data (eg Student Number has exactly 8 digits asubject has 4 or 6 unit of credits only)
The data items the relationships and constraints are all expressed using the conceptsprovided by the high-level data model Because these concepts do not include theimplementation details the results of the data modelling process is a (semi) formalrepresentation of the database structure This result is quite easy to understand so itis used as reference to make sure that all the userrsquos requirements are met The thirdstep is Database Design During this step we might have two sub-steps calledDatabase Logical Design which define a database in a data model of a specific DBMSand Database Physical Design which define the internal database storage structurefile organization or indexing techniques The last two steps shown are DatabaseImplementation and OperationsInterfaces Building These focus on creating aninstance of the schema and implementing operations and user interfaces In thedatabase design phases data is represented using a certain data model The DataModel is a collection of conceptual concepts or notations for describing data datarelationships data semantics and data constraints Most data models also include aset of basic operations for manipulating data in the database
51 Degree of Data AbstractionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In this section we will look at the database design process in terms of specificity Justas any design starts at a high level and proceeds to an ever-increasing level of detail sodoes database design For example when building a home you start with how manybedrooms the home will have how many bathrooms whether the home will be onone level or multiple levels etc The next step is to get an architect to design the homefrom a more structured perspective This level gets more detailed with respect toactual room sizes how the home will be wired where the plumbing fixtures will beplaced etc The last step is to hire a contractor to build the home Thatrsquos looking at thedesign from a high level of abstraction to an increasingly detailed level of abstraction
13
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
The database design is very much like that It starts with users identifying the businessrules database designers and analysts creating the database design and then thedatabase design is physically created using a DBMS
52 External modelsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Is the userrsquos view of the database
bull contains multiple different external views
bull is closely related to the real world as perceived by each user
53 Conceptual modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull These models provide a flexible data structuring capabilities
bull Community view the logical structure of the entire database
bull Contains lsquoWhatrsquo data is stored bull
bull Shows relationships among data
bull constraints
bull semantic information (business rules ndash discussed later)
bull security amp integrity information
bull A database is considered as a collection of entities (objects) of variouskinds
bull Basis for identification and high-level description of main data objects avoidsdetails
bull They are database independent It does not matter what database you will beusing
54 Internal modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Database is considered as a collection of fixed ndash size record
bull This model is closer to the physical level or file structure
bull Is a representation of database as seen by the DBMS
Chapter 5 14
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull Requires the designer to match the conceptual modelrsquos characteristics andconstraints to those of the selected implementation model
bull The entities in the conceptual model are mapped to the tables in the relationalmodel
bull The three most well-known models of this kind are the relational data model thenetwork data model and the hierarchical data model
55 Physical modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Physical representation of the database
bull Lowest level of abstractions bull
bull It is how the data is stored dealing with
bull Run-time performance
bull Storage utilization amp compression
bull File organization amp access methods
bull Data encryption
bull Physical level ndash managed by OS
bull Provide concepts that describe the details of how data is stored in the computerrsquosmemory
551 Data Abstraction LayerAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In a pictorial view you can see how the different models work together Letrsquos look atthis from the highest level the External Model
The External Model is the end user view of the data Typically a database is anenterprise system that serves the needs of multiple departments However onedepartment is not interested in seeing the other departmentsrsquo data Eg HumanResource does care to view the Sales data Therefore one user view will differ fromanother
The External model requires that the designer subdivide a set of requirements andconstraints into functional modules that can be examined within the framework oftheir external models (ie HR vs Sales)
As a data designer you need to understand all the data so that you can build anenterprise wide database Based on the needs of various departments the conceptualmodel is the first model created
15
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
At this stage the conceptual model is independent of both software and hardware Itdoes not depend on the DBMS software used to implement the model It does notdepend on the hardware used in the implementation of the model Changes in eitherhardware or DBMS software have no effect on the database design at the conceptuallevel
Once a DBMS is selected you can then implement it This is the Internal model Hereyou create all the tables constraints keys rules etc This is often referred to as theLogical design
The physical model is simply the way the data is stored on disk Each database Vendorwill have their own way of storing the data
Fig 51
552 SchemasAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A schema is an overall description of a database and it is usually represented by theEntity Relationship diagram There are multiple subschemas which arerepresentations of the external models To put it into perspective of one database
bull External Schemas
bull multiple multiple subschemas = multiple external views of the data bull
bull Conceptual Schema one only
bull Data items relationships amp constraints ndash ER diagram
bull Physical Schema one only
bull Record definitions representation methods
bull Access methods indexes hashing
Chapter 5 16
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
553 Logical and Physical Data IndependenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data independence refers to the immunity of user applications to changes made inthe definition and organization of data Data abstractions expose only those itemsthat are important or pertinent to the user Complexity is hidden from the databaseuser
Physical data independence deals with hiding the details of the storage structure fromuser applications The application should not be involved with these issues sincethere is no difference in the operation carried out against the data
The data independence and operation independence together gives the feature ofdata abstraction There are two types of data independence Logical and Physical
Source httpcnxorgcontentm28150latest
554 Logical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The ability to change the logical (conceptual) schema without changing the Externalschema (User View) is called logical data independence For example the addition orremoval of new entities attributes or relationships to the conceptual schema shouldbe possible without having to change existing external schemas or having to rewriteexisting application programs
In other words if the logical schema changed ie changes to the structure of thedatabase like adding a columns or other tables should not affect the functioning ofthe application (external views)
Source httpcnxorgcontentm28150latest
555 Physical data independenceAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This refers to the immunity of the internal model to changes in the physical modelThe logical schema stays unchanged even though changes are made to fileorganization or storage structures storage devices or indexing strategy
Source httpcnxorgcontentm28150latest
17
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 6 Classification of DatabaseSystems
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The database management systems can be classified based on several criteria
61 Based on data modelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The most popular data model in use today is the relational data model Well knownDBMSs like Oracle MS SQL Server DB2 MySQL support this model Other traditionalmodels can be named hierarchical data model or network data model In the recentyears we are getting familiar with object-oriented data models but these models havenot had widespread use Some examples of Object-oriented DBMSs are O2ObjectStore or Jasmine
62 Based on the number of usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can have a single user database system which supports one user at a time ormultiuser systems which support multiple users concurrently
63 Based on the ways database is distributed
631 Centralized SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
With centralized database systems the system is stored at a single site
Fig 61 Source httpcnxorgcontentm28150latest
Chapter 6 18
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
632 Distributed database systemAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Actual database and DBMS software are distributed in various sites connected by acomputer network
Fig 62 Source httpcnxorgcontentm28150latest
633 Homogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Use the same DBMS software at multiple sites Data is exchanged between varioussites and can be handled easily
634 Heterogeneous distributed Database SystemsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Different sites might use different DBMS software There is additional software tosupport data exchange between sites
Source httpcnxorgcontentm28150latest
19
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 7 The Relational Data ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This data model is introduced by CFCodd in 1970 Currently it is considered as themost widely used data model
The relational model has provided the basis for
bull Research on theory of datarelationshipconstraint
bull Numerous database design methodologies
bull The standard database access language SQL
bull Almost all modern commercial database management systems
The relational data model describes the world as ldquoa collection of inter-relatedrelations (or tables) ldquo
71 Fundamental concepts in Relational Data Model
711 RelationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A relation is a subset of the Cartesian product of a list of domains characterized by aname Given n domains denoted by D1 D2 hellip Dn r is a relation defined on thesedomains if rD1timesD2timeshelliptimesDn A relation can be viewed as a ldquotablerdquo In that table eachrow represents a tuple of data values and each column represents an attribute
Source httpcnxorgcontentm28250latest
712 TableAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
To put it simply a table holds the data A database is composed of multiple tables Theone below shows a database with three tables
Chapter 7 20
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Fig 71
713 ColumnAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A database stores pieces of information or facts in an organised way Understandinghow to use and get the most out of databases requires us to understand that methodof organisation
The principal storage units are called columns or fields or attributes These willhouse the basic components of data that your content can be broken down intoWhen deciding which fields to create you need to think generically about yourinformation For example drawing out the common components of the informationthat you will store in the database and avoiding the specifics that distinguish one itemfrom another
Look at the following example of an ID card to see the relationship between fields andtheir data
714 DomainAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A domain is the original sets of atomic values used to model data By atomic we meanthat each value in the domain is indivisible as far as the relational model is concernedFor example
21
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull The domain of Marital status has a set of possibilities Married Single Divorced
bull The domain of day Shift has the set of all possible days Mon Tue Wedhellip
bull The domain of Salary is the set of all floating-point numbers greater than 0 andless than 200000
bull The domain of First Name is the set of character strings that represents names ofpeople
In summary a Domain is a set of acceptable values that a column is allowed tocontain This is based on various properties and the data type for the column We willdiscuss data types in another chapter
715 RecordsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Just as the content of any one document or item needs to be broken down into itsconstituent bits of data for storage in the fields the link between them also needs tobe available so that they can be re-constituted into their whole form Records allow usto do this A tuple is another term used for records
A simple table gives us the clearest picture of how records and fields work together inthe database storage project Records and fields form the basis of all databases
Fig 72
The example above shows us how fields can hold a range of different sorts of data ndashthis one has
bull An ID field ordinal number integer
bull A Pubdate field daymonthyear date
bull An author field Initial Surname text
bull A title field text bull
bull A body text field text
By creating fields especially for the sorts of data they will house the database can sortthem intelligently You can request a selection of records which are limited by theirdate ndash all before a given date or all after a given date or all between 2 given dates
Chapter 7 22
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Similarly you can choose to have records sorted by date What is happening here isthat the database is reading the information in the date field not just as numbersseparated by slashes but rather as dates which must be ordered according to acalendar system ndash because the field was set up as a date field You are commandingthe database to sift through its data and organise it in a particular way
72 DegreeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The degree is the number of attributes In our example above the degree is 4
73 Properties of TablesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Table has a name that is distinct from all other tables in the database
bull There are no duplicate rows each row is distinct
bull Entries in columns are atomic (no repeating groups or multivalued attributes)
bull Entries from columns are from the same domain based on their data type
number (numeric integer float smallinthellip)
character (string)
date
logical (true or false)
bull Operations combining different data types are disallowed
bull Each attribute has a distinct name
bull sequence of columns is insignificant
bull sequence of rows is insignificant
731 TerminologyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We have used many different terms Here is a summary Alternative 1 is the mostcommon
23
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 7 24
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 8 Entity Relationship ModelAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The Entity Relationship Data Model (ER) has existed for over 35 years
The ER model is well-suited to data modelling for use with databases because it isfairly abstract and is easy to discuss and explain ER models are readily translated torelations
ER modelling is based on two concepts
bull Entities that is things Eg Prof Ba Course Database System
bull Relationships that is associations or interactions between entities
Eg Prof Ba teaches course Database Systems ER models (or ER schemas) arerepresented by ER diagrams
The example database
For the next section we will look at a sample database called the COMPANY databaseto illustrate the concepts of Entity Relationship Model
This database contains the information about employees departments and projects
bull There are several departments in the company Each department has a uniqueidentification a name location of the office and a particular employee whomanages the department
bull A department controls a number of projects each of which has unique name aunique number and the budget
bull Each employee has name an identification number address salary birthdate Anemployee is assigned to one department but can join in several projects
bull We need to record the start date of the employee in each project We also needto know the direct supervisor of each employee We want to keep track of thedependents of the employees Each dependent has name birthdate andrelationship with the employee
81 Entity Entity Set and Entity TypeAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An entity is an object in the real world with an independent existence and can bedifferentiated from other objects
An entity might be
bull An object with physical existence Eg a lecturer a student a car
25
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull An object with conceptual existence Eg a course a job a position
An Entity Type defines a collection of similar entities
An Entity Set is a collection of entities of an entity type at a point of time In ERdiagrams an entity type is represented by a name in a box
Fig 81 Source httpcnxorgcontentm28250latest
811 Types of EntitiesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Independent entities also referred to as Kernels are the backbone of the database Itis what other tables are based on Kernels have the following characteristics
bull they are the lsquobuilding blocksrsquo of a database
bull the primary key may be simple or composite
bull the primary key is not a foreign key
bull they do not depend on another entity for their existence
Example Customer table Employee table Product tableDependent Entities also referred to as derived depend on other tables for theirmeaning
bull Dependent entities are used to connect two kernels together
bull They are said to be existent dependent on two or more tables
bull Many to many relationships become associative tables with at least two foreignkeys
bull They may contain other attributes
bull The foreign key identifies each associated table
bull There are three options for the primary key
bull use a composite of foreign keys of associated tables if unique
bull use a composite of foreign keys and qualifying column
bull create a new simple primary key
Characteristic entities provide more information about another table These entitieshave the following characteristic
Chapter 8 26
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull They represent multi-valued attributes
bull They describe other entities
bull They typically have a one to many relationship
bull The foreign key is used to further identify the characterized table
bull Options for primary key are as follows
bull foreign key plus a qualifying column
bull or create a new simple primary key
bull Employee(EID Name Address Age Salary)
bull EmployeePhone(EID Phone)
812 AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Each entity is described by a set of attributes Eg Employee = (Name Address AgeSalary)
Each attribute has a name associated with an entity and is associated with a domainof legal values However the information about attribute domain is not presented onthe ER diagram
In the diagram each attribute is represented by an oval with a name inside
Fig 82 Source httpcnxorgcontentm28250latest
813 Types of AttributesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are a few types of attributes you need to be familiar with Some of these are tobe left as is but some need to be adjusted to facilitate representation in the relationalmodel This first section will discuss the types of attributes Later on we will discussfixing the attributes to fit correctly into the relational model
Simple attributes are attributes that are drawn from the atomic value domains
eg Name = John Age = 23 also called Single valued Compositeattributes Attributes that consist of a hierarchy of attributes
27
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
eg Address may consists of ldquoNumberrdquo ldquoStreetrdquo and ldquoSuburbrdquo rarr Address = 59 +lsquoMeek Streetrsquo + lsquoKingsfordrsquo
Fig 83 Source httpcnxorgcontentm28250latest
Multivalued attributes Attributes that have a set of values for each entity
eg Degrees of a person lsquo BScrsquo lsquoMITrsquo lsquoPhDrsquo
Fig 84 Source httpcnxorgcontentm28250latest
Derived attributes Attributes contain values that are calculated from other attributes
eg Age can be derived from attribute DateOfBirth In this situation DateOfBirth mightbe called Stored Attribute
Fig 85 Source httpcnxorgcontentm28250latest
Chapter 8 28
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
814 KeysAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
An important constraint on the entities is the key The key is an attribute or a group ofattributes whose values can be used to uniquely identify an individual entity in anentity set
8141 Candidate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull a simple or composite key that is unique and minimal
bull unique ndash no two rows in a table may have the same value at any time
bull minimal ndash every column must be necessary for uniqueness
bull For example for the entity
Employee(EID First Name Last Name SIN Address Phone BirthDate SalaryDepartmentID)
Possible candidate keys are EID SIN
FirstName and Last Name ndash assuming there is no one else in the company with thesame name Last Name and DepartmentID ndash assuming two people with the same lastname donrsquot work in the same depart ment
EID SIN are also candidate keys
8142 Composite key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Composed of more than one attribute
bull For example as discussed earlier
First Name and Last Name ndash assuming there is no one else in the company with thesame name Last Name and Department ID ndash assuming two people with the same lastname donrsquot work in the same department
bull A composite key can have two or more attributes but it must be minimal
29
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
8143 Primary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull A candidate key is selected by the designer to uniquely identify tuples in a table Itmust not be null
bull A key is chosen by the database designer to be used as an identifying mechanismfor the whole entity set This is referred to as the primary key This key isindicated by underlining the attribute in the ER model
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -EID is the Primary key
8144 Secondary key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull an attribute used strictly for retrieval purposes (can be composite) for examplePhone number Last Name and Phone number etc
bull all other candidate keys not chosen as the primary key
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key all othercandidate keys not chosen as the primary key
8145 Alternate key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull all other candidate keys not chosen as the primary key
8146 Foreign key
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An attribute in one table that references the primary key of another table OR itcan be null
bull Both foreign and primary keys must be of the same data type
Chapter 8 30
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull For example Employee(EID First Name Last Name SIN Address PhoneBirthDate Salary DepartmentID) -DepartmentID is the Foreign key
8147 Nulls
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
This is a special symbol independent of data type which means either unknown orinapplicable (it does not mean zero or blank)
bull No data entry
bull Not permitted in primary key
bull Should be avoided in other attributes
bull Can represent
bull An unknown attribute value
bull A known but missing attribute value
bull A ldquonot applicablerdquo condition
bull Can create problems when functions such as COUNT AVERAGE and SUMare used
bull Can create logical problems when relational tables are linked
NOTE the result of a comparison operation is null when either argument is null Theresult of an arithmetic operation is null when either argument is null (except functionswhich ignore nulls)
PROBLEM Find all employees (EMP) in Sales whose salary plus commission isgreater than 30000
SELECT emp FROM salary_tbl
WHERE jobname = lsquoSalesrsquo AND
(commission + salary) gt 30000 ndashgt E10 and E12
This result does not include E13 because of the Null value in Commission To ensurethe row with the null value is included we need to look at the individual fields Byadding commission and Salary for employee E13 the result will be a null value Thesolution is shown below
31
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
SELECT emp FROM salary_tbl
WHERE jobName = lsquoSalesrsquo AND
(commission gt 30000 OR salary gt 30000 OR
(commission + salary) gt 30000 ) ndashgtE10 and E12 and E13
815 Entity Types
8151 Existence Dependency
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An entityrsquos existence is dependent on the existence of the related entity
bull Itrsquos existence-dependent if it has a mandatory foreign key That is a foreign keyattribute that cannot be null
bull Spouse entity is existence dependent on the employee
8152 Weak Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These tables are existence dependentThey cannot exist without entity with which it has a relationshipPrimary key is derived from the primary key of the parent entity
bull The spouse table is a weak entity because its PK is dependent on theemployee table Without a corresponding employee record the spouserecord could not exist
8153 Strong Entities
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull If an entity can exist apart from all of its related entities itrsquos considered a strongentity
bull Kernels are strong entities
bull A table without a foreign key is or a table that contains a foreign key which cancontain NULLS is a strong entity
Chapter 8 32
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
816 RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are the glue that holds the tables together They are used to connect togetherrelated information in other tables
8161 1M relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be the norm in any relational database design
bull Relational database norm
bull Found in any database environment
eg One department has many employees
Fig 86 Source httpcnxorgcontentm28250latest
8162 11 relationship
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Should be rare in any relational database design
bull One entity can be related to only one other entity and vice versa
bull Could indicate that two entities actually belong in the same table
eg A professor chairs one department and a department has only one chair
8163 11 Example
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull An employee is associated with one spouse and one spouse is associated withone employee
33
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull Cannot be implemented as such in the relational model
bull MN relationships can be changed into two 1M relationships
bull Can be implemented by breaking it up to produce a set of 1M relationships
bull Can avoid problems inherent to MN relationship by creating a composite entityor bridge entity
bull A student has many classes and a class can hold many students
bull Implementation of a composite entity
bull Yields required MN to 1M conversion
bull Composite entity table must contain at least the primary keys of original tables
bull Linking table contains multiple occurrences of the foreign key values
bull Additional attributes may be assigned as needed
8164 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping M-N Binary Relationship Type For each M-N Binary Relationship identify tworelations A B represent two entity type participating in R Create a new relation S torepresent R Include in S as foreign keys the primary keys of A and B and all the simpleattributes of R The combination of primary keys of A and B will make the primary keyof S
8165 MN relationships
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 87 Source httpcnxorgcontentm28250latest
Chapter 8 34
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
817 Unary Relationships (recursive)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One in which a relationship exists between occurrences of the same entity set
The two keys (primary key and foreign key) are the same but they represents twoentities of different roles relate to this relationship
In some entities a separate column can be created that refers to the primary key ofthe same entity set
Fig 88 Source httpcnxorgcontentm28250latest
818 Ternary RelationshipsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Mapping Ternary Relationship Type For each n-ary ( gt 2 ) Relationships create a newrelation to represent the relationship The primary key of the new relation is thecombination of the primary keys of the participating entities that hold the N (many)side In most cases of an n-ary relationship all the participating entities hold a manyside Source httpcnxorgcontentm28250latest
Fig 89 Source httpcnxorgcontentm28250latest
35
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
819 Relationship StrengthAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Relationship strength is based on how the primary key of a related entity is defined
Weak Relationships (non-identifying relationship)
bull Exists if the PK of the related entity does not contain a PK component of theparent entity
bull Customer(custid custname)
bull Order(orderID custid date)
Strong Relationship (identifying)
bull Exists when the PK of the related entity contains the PK component of the parententity
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTimehellip)
Chapter 8 36
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 9 Integrity Rules andConstraints
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Constraints are a very important feature in a relational model In fact the relationalmodel supports well defined theory of constraints on attributes or tables Constraintsare useful because they allow a designer to specify the semantics of data in thedatabase and constraints are the rules to enforce DBMSs to check that data satisfiesthe semantics
91 Domain IntegrityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Domain restricts the values of attributes in the relation and it is a constraint of therelational model However there are real ndashworld semantics on data that cannotspecified if used only with domain constraints We need more specific ways to statewhat data values areare not allowed and what format is suitable for an attributes Forexample the employee ID must be unique the employee birthday is in the range [Jan1 1950 Jan 1 2000] Such information is provided in logical statements called integrityconstraints
There are several kinds of integrity constraints
Entity Integrity ndash Every table requires a primary key The primary key nor any part ofthe primary key can contain NULL values This is because NULL values for the primarykey means we cannot identify some rows For example in the EMPLOYEE table Phonecannot be a key since some people may not have a phone
Referential integrity ndash a foreign key must have a matching primary key or it must benull
This constraint is specified between two tables (parent and child) it maintains thecorrespondence between rows in these tables It means the reference from a row inone table to other table must be valid Examples of Referential integrity constraint
92 Referential integrity ExamplesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In the CustomerOrder database
bull Customer(custid custname)
bull Order(orderID custid OrderDate)
37
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
To ensure that there are no orphan records we need to enforce referential integrityAn orphan record is one whose foreign key value is not found in the correspondingentity ndash the entity where the PK is located Recall that a typical join is between a PK andFKThe referential integrity constraint states that the CustID in the Order table mustmatch a valid CustiD in the Customer table Most relational databases have declarativereferential integrity In other words when the tables are created the referentialintegrity constraints are set upIn the CourseClass database
bull Course(CrsCode DeptCode Description)
bull Class(CrsCode Section ClassTime)
The referential integrity constraint states that CrsCode in the Class table must match avalid CrsCode in the Course table In this situation itrsquos not enough that the CrsCodeand Section in the Class table make up the PK we must also enforce referentialintegrity
When setting up referential integrity it is important that the PK and FK have the samedata types and come from the same domain Otherwise the RDBMS will not allow thejoin
93 Referential Integrity in MS AccessAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
In MS Access referential integrity is set up by joining the PK in the Customer table tothe CustID in the Order table
94 Referential Integrity using Transact SQL (MS SQLServer)
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
CREATE TABLE Customer
Chapter 9 38
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
( CustID INTEGER PRIMARY KEY CustName CHAR(35) )
CREATE TABLE Orders
( OrderID INTEGER PRIMARY KEY CustID INTEGER REFERENCES Customer(CustID)OrderDate DATETIME )
The referential integrity is set when creating the table (Orders) with the FK
95 Foreign Key RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Additional foreign key rules may be added such as what to do with the child rows(Orders table) when the record with the PK ndash the parent (Customer) is deleted orchanged (updated) The relationship window in Access shows two additional optionsfor foreign keys rules Cascade Update and Cascade Delete If they are not selectedthe system would prevent the deletion or update of PK values in the parent table (Customer) if a child record exists The child record is any record with a matching PK
DELETE
bull RESTRICT
bull CASCADE
bull SET TO NULL
UPDATE
bull RESTRICT
bull CASCADE
In some databases an additional option exists when selecting the Delete option Thatis lsquoSet to Nullrsquo In these situations the PK row is deleted but the FK in the child table isset to Null Though this creates an orphan row it is acceptable
Enterprise Constraints ndash sometimes referred to as Semantic constraints They areadditional rules specified by users or database administrators ie A class can have amaximum of 30 students A teacher can teach a maximum of 4 classes a semester Anemployee cannot take a part in more than 5 projects Salary of an employee cannotexceed the salary of the employeersquos manager
951 Business RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Another term wersquove used is semantics Business rules are obtained from users whengathering requirements The requirements gathering process is very important andshould be verified by the user before the database design is built If the business rules
39
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
are incorrect the design will be incorrect and ultimately the application built will notfunction as expected by the users
Some examples of business rules are
bull A teacher can teach many students
bull A class can have a maximum of 35 students
bull A course can be taught many times but by only one instructor
bull Not all teachers teach classes etc
96 CardinalityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Expresses minimum and maximum number of entity occurrences associated with oneoccurrence of related entity Business rules are used to determine cardinality
961 Relationship ParticipationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
There are two sides to each of the relationships when it comes to participation Eachside can have two options for participation It is either 0 (zero) 1 (one) or many Theouter most symbol represents the con nectivity ie one to many would berepresented by
The inner most symbols represent the cardinality which indicates the minimumnumber of instances in the corresponding entity ie on the right hand side it is readas minimum 1 maximum many On the left hand side itrsquos read as minimum 1 andmaximum 1 See below for more examples
Chapter 9 40
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
97 OptionalAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence does not require a corresponding entity occurrence in arelationship
This shows a zero or many The many side is optional
Fig 91
This can also be read as
Right hand side A customer can be in a minimum of 0 orders or a maximum of manyorders Left hand side The order entity must contain a minimum of one relatedentity in the customer table and a maximum of 1 related entity
This shows a zero or one The 1 side is optional
98 MandatoryAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One entity occurrence requires a corresponding entity occurrence in a relationship
This example shows one or many The many side is mandatory
41
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Fig 92
This example shows one or many The many side is mandatory
Fig 93
So far we have seen that the right hand side can have
a 0 (zero) cardinality and a connectivity of many or one
The left hand side can also have a cardinality
of 0 (zero)
However it cannot have a connectivity of 0 only 1 (as shown) The connectivitysymbols show maximums So if you think about it logically if the connectivity symbolon the left hand side shows 0 then there would be no connection between the tablesThe way to read the left hand side is The CustID in the Order table must be found inthe Customer table a minimum of 0 and a maximum of 1 time The 0 means that theCustID in the Order table may be null The left most 1 (right before the 0 representingconnectiv ity) says that if there is a CustID in the Order table it can only be in theCustomer table once When you see the 0 symbol for cardinality you can assume two
Chapter 9 42
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
things The FK in the Order table allows nulls and the FK is not part of the PK since PKsmust not contain null values
Fig 94
981 Relationship TypesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
You may have noticed in the previous image is the dashed line that connects the twotables This is referred as the relationship type Itrsquos either identifying or non-identifying Please see the section that discussed weak and strong relationships formore explanation The only thing you need to understand for this section is the waythe relationship is depicted in the ERD Identifying relationships will have a solid line(PK contains the FK) The non-Identifying relationship does not contain the FK in thePK
Fig 95
43
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 10 ER ModellingAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
One important theory developed for the relational model involves the notion offunctional dependency (fd) Like constraints functional dependencies are drawn fromthe semantics of the application domain Essentially fdrsquos describe how individualattributes are related Functional dependencies are a kind of constraint amongattributes within a relation and that have implications for ldquogoodrdquo relational schemadesign Here we will look at
bull basic theory and definition of functional dependencies
bull methodology for improving schema designs (normalization)
The aim of studying this is to improve understanding of relationships among data andto gain enough formalism to assist practical database design
101 Relational Design and RedundancyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Generally a good relational database design must capture all of the necessaryattributesassociations and should do this with a minimal amount of storedinformation (it means there is no redundant data)
In database design redundancy is generally a ldquobad thingrdquo because it causes problemsmaintaining consistency after updates However it can sometimes lead toperformance improvements eg may be able to avoid a join to collect bits of datatogether
Consider the following table defining bank accountsbranches
Fig 101 Source httpcnxorgcontentm28252latest
Chapter 10 44
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
102 Insertion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When we insert a new record we need to check that branch data is consistent withexisting rows
Fig 102 Source httpcnxorgcontentm28252latest
103 Update anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If a branch changes address we need to update all rows referring to that branch
Fig 103 Source httpcnxorgcontentm28252latest
104 Deletion anomalyAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If we remove information about the last account at a branch (Downtown) all of thebranch information disappears
45
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Fig 104 Source httpcnxorgcontentm28252latest
The problem is that after deleting the row we donrsquot know where the Downtownbranch is located and we lose all information regarding customer lsquo1313131rsquo To avoidthese kinds of updatedelete problems we need to decompose the original table intoseveral smaller tables where each table has minimal overlap with other tablesTypically each table contains information about one entity (eg branch customer hellip)
The Bank Accounts tables should appear as follows
Fig 105
This will ensure that when branch information is added or updated it will only affectone record When customer information is added or deleted the Branch informationwill not be accidently modified or incorrectly recorded
Source httpcnxorgcontentm28252latest
Another Example
Employee Project Table
Assumptions EmpID and ProjectID is a composite PK
Project ID determines Budget ie Project P1 has a budget of 32 hours
Fig 106
Chapter 10 46
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Letrsquos look at some possible anomalies Add row S8535P19) Problem Two tupleswith conflicting budgets Delete tuple S79 27 P3 1 Problem Deletes the budget ofproject P3 Update tuple S75 32 P1 7 to S75 35 P1 7) Problem Two tuples withdifferent values for project P1rsquos budget To fix this we need a table for employees anda table for projects
Fig 107
Tables with Data
Fig 108
1 No anomalies will be created if a budget is changed
2 No dummy values are needed for projects that have no employees assigned
3 If an employeersquos contribution is deleted no important data is lost
4 No anomalies are created if an employeersquos contribution is added
The best approach to creating tables without anomalies is to ensure tables arenormalized and thatrsquos accomplished by understanding functional dependencies (FD)FD ensures that all attributes in a table belong to that table In other words it willeliminate redundancies and anomalies
47
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 11 Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A functional dependency is a relationship between two attributes Typically betweenthe PK and other non-key attributes with in the table For any relation R attribute Y isfunctionally dependent on attribute X (usually the PK) if for every valid instance of Xthat value of X uniquely determines the value of Y
X mdashmdashmdashndashgt YThe left-hand side of the FD is called the determinant and the right-hand side is thedependentExamplesSIN mdashmdashmdash-gt Name Address BirthdateSIN determines names and address and birthdays Given SIN we can determine anyof the other at tributes within the table
Sin Course mdashmdashmdashgt DateCompletedSin and Course determine date completed This must also work for a composite PKISBN mdashmdashmdashndashgt TitleISBN determines title
111 Rules of Functional DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Consider the following instance r(R) of the relation schema R(ABCDE)
Fig 111
What kind of dependencies can we observe among the attributes in Table R
bull Since the values of A are unique it follows from the FD definition that
bull A rarrB A rarrC A rarrD A rarrE
Chapter 11 48
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull It also follows that A rarrBC (or any other subset of ABCDE)
bull This can be summarized as A rarrBCDE
bull From our understanding of primary keys A is a Primary Key
bull Since the values of E are always the same it follows that A rarrE B rarrE C rarrE D rarrEHowever we cannot generally summarized above by ABCD rarrE In general A rarrEB rarrE AB rarrE
Other observations
combinations of BC are unique therefore BC rarrADEcombinations of BD are unique therefore BD rarrACEif C values match so do D values therefore C rarrD however D values donrsquot determineC values so C does not determine D and D does not determine C
When looking at the data it makes a lot more sense in terms of which attributes aredependent and which are determinants
112 Inference RulesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Armstrongrsquos axioms are a set of axioms (or more precisely inference rules) used toinfer all the functional dependencies on a relational database They were developedby William W Armstrong
Let R(U) be a relation scheme over the set of attributes U We will use the letters X Y Zto represent any subset of and for short the union of two sets of attributes and byinstead of the usual X U Y
Source httpenwikipediaorgwikiArmstrong27s_axioms
1121 Axiom of reflexivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If Y is a subset of X then X determines Y
e g PartNo mdashgt NT123 ndash composed of category (NT) and partID (123)
X X Y
49
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
1122 Axiom of augmentationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull Also known as a Partial dependency
Studentno course mdashgt studentName address city prov pc grade dateCompleted
This situation is not desirable because every non key attribute has to be fullydependent on the PK In this situation Student information is only lsquopartiallyrsquo dependenton the PK StudentNo
To fix this problem we need to break down the table into two as followsStudentNo course agrave grade dateCompletedStudentNo agrave studentName address city prov pc
1123 Axiom of transitivityAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
If X determines Y and Y determines Z then X must also determine ZStudentNoagravestudentNameaddress city prov pc ProgramID ProgramNameX Y ZThis situation is not desirable because a non key attributes depends on another nonkey attributeTo fix this problem we need to break this table into two one to hold informationabout the student and the other to hold information about the program However westill need to leave a FK in the student table so that we can determine which programthe Student is enrolled in
StudentNomdashgt studentNameaddress city prov pc ProgramID
ProgramID mdashgt ProgramName
113 Additional rules
1131 UnionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 112
If X determines Y and X determines Z then X must also determines Y and Z
Chapter 11 50
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
This is suggesting that if two tables are separate and the PK is the same you maywant to consider putting them together
SIN mdashgt EmpName
SIN mdashgt SpouseName
You may want to join these two tables into one
SIN ndashgt EmpName SpouseName
Some DBAs would leave them separated for a couple of reasons They describes twoentities so should be separated If in one table the spouse name may be left NULLmost of the time so there is no need to have it in the same table
1132 DecompositionAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 113
If X determines Y and Z then X determines Y and X determines Z separately This is thereverse of Union If you have a table that appears to contain two entities that aredetermined by the same PK consider breaking them up into two tables
114 Dependency DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Fig 114
A dependency diagram illustrates the various dependencies that may exist in a nonnormalized table The following dependencies are identified
ProjectNo and EmpNo combined is the PK
Partial Dependencies
ProjectNo mdashgt ProjName
EmpNo mdashgt EmpName DeptNo HrsWork
Transitive Dependency
DeptNo mdashgt DeptName
51
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 12 NormalizationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization is the branch of relational theory providing design insights The goals ofnormalization are
bull be able to characterize the level of redundancy in a relational schema
bull provide mechanisms for transforming schemas to remove redundancy
Normalization draws heavily on the theory of functional dependencies Normalizationtheory defines six normal forms (NFs) Each normal form involves a set of dependencyproperties that a schema must satisfy and each gives guarantees about presenceabsence of update anomalies That means higher normal forms have less redundancyso less update problems
121 Normal FormsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
All the tables in any database can be in one of the normal forms we will discuss nextNormalization is the process of determining how much redundancy exists in yourtables Idealy we only want minimal redun dancy That is PK to FK Everything elseshould be derived from other tables There are 6 normal forms but we will only lookat the first 4 The last two are rarely used
1stNormal Form
2ndNormal Form
3rdNormal Form
BCNF
First Normal Form
In the first normal form only single values are permitted at the intersection of eachrow and column hence there are no repeating groups
To normalize a relation that contains a repeating group remove the repeating groupand form two new relations
The primary key of the new relation is a combination of the primary key of the originalrelation plus an attribute from the newly created relation for unique identification
Chapter 12 52
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
1211 School databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Grade_Report(Studentno student_name majorcourse_no course_nameinstructor_no instructor_name instructor_location grade)
12111 Process for 1NF
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull In the Student Grade Report table the repeating group is the course informationA student can take many courses
bull Remove the repeating group In this case itrsquos the Course information for eachstudent
bull Identify the Primary Key in your new table
bull Primary key must uniquely identify attribute value (StudentNo and CourseNo)
bull After removing all the attributes related to the Course and Student whatrsquos left isthe Student_course table Student (Student_nostudent_namemajor)
bull The Student table is in First Normal Form (repeating group removed)
Student (Student_no student_name major) Student_Course(Studentno course_nocourse_nameinstructor_noinstructor_name instructor_location grade)
12112 Update anomalies in 1st normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Student_Course(Studentno course_no course_name instructor_no instructor_nameinstructor_location grade)
bull To add a new course we need a student
bull When Course information needs to be updated we may have inconsistencies
bull To delete a student we might also delete critical information about a course
122 2nd Normal FormAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation must first be in 1stNF
53
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull The relation is automatically in 2ndNF if and only if the primary key comprises asingle attribute
bull If the relation has a composite primary key then each non-key attribute must befully dependent on the entire primary key and not on a subset of the primary key(ie there must be no partial dependency ndash augmentation)
1221 Process for 2NFAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull To move on to 2ndNF the table must be in 1stNF
bull The student table is already in 2ndNF because it has a single column PK
bull When examining the Student Course table we see that not all the attributes arefully dependent on the PK Specifically all course information The only attributethat is fully dependent is grade
bull Identify the new table that contains the course information
bull Identify the PK for the new table
Student (Studentnostudent_namemajor)
Course_Grade(studentnocourse_nograde)Course_Instructor(Course_no course_name instructor_no instructor_nameinstructor_location)
1222 Update anomalies in 2nd normal formAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull When adding a new instructor we need a course
bull Updating course information could lead to inconsistencies for instructorinformation
bull Deleting a course may also delete instructor information
123 3rd Normal Form
1231 Remove Transitive DependenciesAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
bull The relation has to be in 2nd normal form
Chapter 12 54
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull Remove all transitive dependencies ndash a non-key attribute may not be functionallydependent on another non-key attribute
bull Eliminate all dependent attributes in transitive relationship(s) from each of thetables that have a transitive relationship
bull Create new table(s) with removed dependency
bull Check new table(s) as well as table(s) modified to make sure that each table hasdeterminant and that no table contains inappropriate dependencies
Student (Studentno student_name major)Course_Grade (Studentno course_no grade)Course (Course_nocourse_nameinstructor_no)Instructor (Instructor_no instructor_name instructor_location)
12311 Update anomalies in 3rd normal form
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
At this stage there should be no anomalies in 3rd normal form
Lets look at the dependency diagrams for this example
The first step is to remove repeating groups As discussed above
Student (Student_no student_name major)
Student_Course (Studentno course_no course_nameinstructor_noinstructor_nameinstructor_location grade)
To recap the normalization process for the School database the followingdependencies are shown in the diagram
Fig 121
PD ndash Partial Dependency
TD ndash Transitive DependencyFD ndash Full Dependency
55
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
1232 Boyce-Codd Normal Form (BCNF)Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
When a table has more than one candidate key anomalies may result even thoughthe relation is in 3NF Boyce-Codd normal form is a special case of 3NF A relation is inBCNF if and only if every determinant is a candidate key
Consider the following table
St_Maj_Adv table
Semantic Rules
Each student may major in several subjectsFor each major a given student has only one advisorEach major has several advisorsEach advisor advises only one majorEach advisor advises several students in one major
Functional dependencies
Fd1 Student_idMajor mdashmdashgt Advisor (candidate Key)
Fd2 Advisor mdashmdashgt Major (not a candidate key)
ANOMALIES
Deleting student deletes advisor info
Insert a new advisor ndash need a student
Update ndash inconsistencies
Chapter 12 56
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Note no single attribute is a candidate key
Primary key can be student_idmajor or student_idadvisor
To reduce the St_Maj_Adv relation to BCNF create two new tables
1 St_Adv (Student_idAdvisor)
2 Adv_Maj (AdvisorMajor)
St_Adv
Adv_Maj
57
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
1233 BCNF Example 2Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Client Interview Table
Fd1 ndash ClientNo InterviewDate ndashgt interviewTime staffNo roomNo (PK)Fd2 ndash staffNo InterviewDate InterviewTime ndashgt ClientNO (CK)Fd3 ndash roomNo InterviewDate InterviewTime ndashgt StaffNo clientNo (CK)Fd4 ndash staffNo interviewDate ndashgt roomNoA relation is in BCNF if and only if every determinant is a candidate key We need to
create a table that incorporates the first 3 Fds and another table for the 4thFd
Interview
StaffRoom
Chapter 12 58
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Fig 122 Source httpdbgrussellorgsection008html
12331 Normalization and Database Design
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Normalization should be part of design process Make sure that proposed entitiesmeet required normal form before table structures are created Many real-worlddatabases have been improperly designed or burdened with anomalies if improperlymodified during the course of time You may be asked to redesign and modify existingdatabases This could be a really big endeavor if the tables are not properlynormalized
Use an Entity Relation Diagram (ERD) to provide a big picture The ERD shows a macroview of an organizationrsquos data requirements and operations The ER Diagram iscreated through an iterative process which involves identifying relevant entities theirattributes and their relationships
Normalization procedure focuses on characteristics of specific entities and itrepresents the micro view of entities within ERD
It is difficult to separate normalization process from ER modeling process The twotechniques should be used concurrently
59
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 13 Database DevelopmentProcess
Available under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A core aspect of software engineering is the subdivision of the development processinto a series of phases or steps each of which focuses on one aspect of thedevelopment The collection of these steps is sometimes referred to as a developmentlife cycle The software product moves through this life cycle (sometimes repeatedly asit is refined or redeveloped) until it is finally retired from use Ideally each phase in thelife cycle can be checked for correctness before moving on to the next phase
Fig 131 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
131 SDLC ndash WaterfallAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The waterfall figure above illustrates a general waterfall model which could apply toany computer system development It shows the process as a strict sequence of stepswhere the output of one step is the input to the next and all of one step has to becompleted before moving onto the next
We can use the waterfall process as a means of identifying the tasks that are requiredtogether with the input and output for each activity What is important is the scope ofthe activities which can be summarised as follows
bull Establishing requirements involves consultation with and agreement amongstakeholders as to what they want of a system expressed as a statement ofrequirements
Chapter 13 60
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
bull Analysis starts by considering the statement of requirements and finishes byproducing a system specification The specification is a formal representation ofwhat a system should do expressed in terms that are independent of how it maybe realized
bull Design begins with a system specification and produces design documents andprovides a detailed description of how a system should be constructed
bull Implementation is the construction of a computer system according to a givendesign document and taking account of the environment in which the system willbe operating (for example specific hardware or software available for thedevelopment) Implementation may be staged usually with an initial system thancan be validated and tested before a final system is released for use
bull Testing compares the implemented system against the design documents andrequirements specification and produces an acceptance report or more usually alist of errors and bugs that require a review of the analysis design andimplementation processes to correct (testing is usually the task that leads to thewaterfall model iterating through the life cycle)
bull Maintenance involves dealing with changes in the requirements or theimplementation environment bug fixing or porting of the system to newenvironments (for example migrating a system from a standalone PC to a UNIXworkstation or a networked environment) Since maintenance involves theanalysis of the changes required design of a solution implementation and testingof that solution over the lifetime of a maintained software system the waterfalllife cycle will be repeatedly revisited
httpcnxorgcontentm28139latest
132 Database Life CycleAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
We can use the waterfall cycle as the basis for a model of database developmentwhich incorporates three assumptions
bull We can separate the development of a database ndash that is specification andcreation of a schema to define data in a database ndash from the user processes thatmake use of the database
bull We can use the three-schema architecture as a basis for distinguishing theactivities associated with a schema
bull We can represent the constraints to enforce the semantics of the data oncewithin a database rather than within every user process that uses the data
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
61
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Fig 132 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Using these assumptions the diagram above represents a model of the activities andtheir outputs for database development It is applicable to any class of DBMS(database management system) not just a relational approach
httpcnxorgcontentm28139latest
Database Application Development is the process of obtaining real-worldrequirements analyzing requirements designing the data and functions of the systemand then implementing the operations in the system
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
133 Requirements gatheringAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
The first step is Requirements Gathering During this step the database designershave to interview the customers (database users) to understand the proposed systemobtain and document the data and functional requirements The result of this step is adocument including the detail requirements provided by the users
Establishing requirements involves consultation with and agreement among all theusers as to what persistent data they want to store along with an agreement as to themeaning and interpretation of the data elements The data administrator plays a keyrole in this process as they overview the business legal and ethical issues within theorganization that impact on the data requirements
The data requirements document is used to confirm the understanding ofrequirements with users To make sure that it is easily understood it should not beoverly formal or highly encoded The document should give a concise summary of allusersrsquo requirements ndash not just a collection of individualsrsquo requirements ndash as theintention is to develop a single shared database
Chapter 13 62
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
The requirements should not describe how the data is to be processed but ratherwhat the data items are what attributes they have what constraints apply and therelationships that hold between the data items
134 AnalysisAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Data analysis begins with the statement of data requirements and then produces aconceptual data model The aim of analysis is to obtain a detailed description of thedata that will suit user requirements so that both high and low level properties of dataand their use are dealt with These include properties such as the possible range ofvalues that can be permitted for attributes such as in the School Database examplefor instance the Student course code course title and credit points
The conceptual data model provides a shared formal representation of what is beingcommunicated between clients and developers during database development ndash it isfocused on the data in a database irrespective of the eventual use of that data in userprocesses or implementation of the data in specific computer environmentsTherefore a conceptual data model is concerned with the meaning and structure ofdata but not with the details affecting how they are implemented
The conceptual data model then is a formal representation of what data a databaseshould contain and the constraints the data must satisfy This should be expressed interms that are independent of how the model may be implemented As a resultanalysis focuses on lsquoWhat is requiredrsquo not lsquoHow is it achieved
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
135 Logical DesignAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Database design starts with a conceptual data model and produces a specification of alogical schema this will determine the specific type of database system (networkrelational object-oriented) that is required The relational representation is stillindependent of any specific DBMS it is another conceptual data mod el
We can use a relational representation of the conceptual data model as input to thelogical design process The output of this stage is a detailed relational specificationthe logical schema of all the tables and constraints needed to satisfy the descriptionof the data in the conceptual data model It is during this design activity that choicesare made as to which tables are most appropriate for representing the data in adatabase These choices must take into account various design criteria including forexample flexibility for change control of duplication and how best to represent theconstraints It is the tables defined by the logical schema that determine what data arestored and how they may be manipulated in the database
63
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Database designers familiar with relational databases and SQL might be tempted togo directly to implementation after they have produced a conceptual data modelHowever such a direct transformation of the relational representation to SQL tablesdoes not necessarily result in a database that has all the desirable propertiescompleteness integrity flexibility efficiency and usability A good conceptual datamodel is an essential first step towards a database with these properties but that doesnot mean that the direct transformation to SQL tables automatically produces a gooddatabase This first step will accurately represent the tables and constraints needed tosatisfy the conceptual data model description and so satisfies the completeness andintegrity requirements but it may be inflexible or offer poor usability The first designis then flexed to improve the quality of the database design Flexing is a term that isintended to capture the simultaneous ideas of bending something for a differentpurpose and weakening aspects of it as it is bent
The figure below summarizes the iterative (repeated) steps involved in databasedesign based on the overview given Its main purpose is to distinguish the generalissue of what tables should be used from the detailed definition of the constituentparts of each table ndash these tables are considered one at a time although they are notindependent of each other Each iteration that involves a revision of the tables wouldlead to a new design collectively they are usually referred to as second-cut designseven if the process iterates for more than a single loop
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
Fig 133 Source httpwwwoercommonsorgcoursesthe-database-development-life-cycle view
First for a given conceptual data model it is not necessary that all the userrequirements it represents have to be satisfied by a single database There can bevarious reasons for the development of more than one database such as the need forindependent operation in different locations or departmental control over lsquotheirrsquo dataHowever if the collection of databases contains duplicated data and users need toaccess data in more than one database then there are possible reasons that the onedatabase can satisfy multiple requirements or issues related to data replication anddistribution need to be examined
Chapter 13 64
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Second one of the assumptions about database development is that we can separatethe development of a database from the development of user processes that makeuse of it This is based on the expectation that once a database has beenimplemented all data required by currently identified user processes have beendefined and can be accessed but we also require flexibility to allow us to meet futurerequirements changes In developing a database for some applications it may bepossible to predict the common requests that will be presented to the database andso we can optimize our design for the most common requests
Third at a detailed level many aspects of database design and implementationdepend on the particular DBMS being used If the choice of DBMS is fixed or madeprior to the design task that choice can be used to determine design criteria ratherthan waiting until implementation That is it is possible to incorporate designdecisions for a specific DBMS rather than produce a generic design and then tailor itto the DBMS during implementation
It is not uncommon to find that a single design cannot simultaneously satisfy all theproperties of a good database So it is important that the designer has prioritizedthese properties (usually using information from the requirements specification) forexample to decide if integrity is more important than efficiency and whether usabilityis more important than flexibility in a given development
At the end of our design stage the logical schema will be specified by SQL datadefinition language (DDL) statements which describe the database that needs to beimplemented to meet the user requirements
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
136 ImplementationAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
Implementation involves the construction of a database according to the specificationof a logical schema This will include the specification of an appropriate storageschema security enforcement external schema and so on Implementation is heavilyinfluenced by the choice of available DBMS database tools and operatingenvironment There are additional tasks beyond simply creating a database schemaand implementing the constraints ndash data must be entered into the tables issuesrelating to the users and user processes need to be addressed and the managementactivities associated with wider aspects of corporate data management need to besupported In keeping with the DBMS approach we want as many of these concerns aspossible to be addressed within the DBMS We look at some of these concerns brieflynow
In practice implementation of the logical schema in a given DBMS requires a verydetailed knowledge of the specific features and facilities that the DBMS has to offer Inan ideal world and in keeping with good software engineering practice the first stageof implementation would involve matching the design requirements with the bestavailable implementing tools and then using those tools for the implementation In
65
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
database terms this might involve choosing vendor products whorsquos DBMS and SQLvariants are most suited to the database we need to implement However we donrsquotlive in an ideal world and more often than not hardware choice and decisionsregarding the DBMS will have been made well in advance of consideration of thedatabase design Consequently implementation can involve additional flexing of thedesign to overcome any software or hardware limitations
137 Realising the designAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After the Logical Design has been created we need our database to be createdaccording to the definitions we have produced For an implementation with arelational DBMS this will probably involve the use of SQL to create tables andconstraints that satisfy the logical schema description and the choice of appropriatestorage schema (if the DBMS permits that level of control)
One way to achieve this is to write the appropriate SQL DDL statements into a file thatcan be executed by a DBMS so that there is an independent record a text file of theSQL statements defining the database Another method is to work interactively using adatabase tool like SQL Server Management Studio or Microsoft Access Whatevermechanism is used to implement the logical schema the result is that a databasewith tables and constraints is defined but will contain no data for the user processes
138 Populating the databaseAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
After a database has been created there are two ways of populating the tables ndasheither from existing data or through the use of the user applications developed forthe database
For some tables there may be existing data from another database or data files Forexample in establishing a database for a hospital you would expect that there arealready some records of all the staff that have to be included in the database Datamight also be bought in from an outside agency (address lists are frequently broughtin from external companies) or produced during a large data entry task (convertinghard-copy manual records into computer files can be done by a data entry agency) Insuch situations the simplest approach to populate the database is to use the importand export facilities found in the DBMS
Facilities to import and export data in various standard formats are usually available(these functions are also known in some systems as loading and unloading data)Importing enables a file of data to be copied directly into a table When data are heldin a file format that is not appropriate for using the import function then it isnecessary to prepare an application program that reads in the old data transformsthem as necessary and then inserts them into the database using SQL codespecifically produced for that purpose The transfer of large quantities of existing data
Chapter 13 66
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
into a database is referred to as a bulk load Bulk loading of data may involve verylarge quantities of data being loaded one table at a time so you may find that thereare DBMS facilities to postpone constraint checking until the end of the bulk loading
httpwwwoercommonsorgcoursesthe-database-development-life-cycleview
139 Guidelines For Developing An ER DiagramAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
NOTE These are general guidelines which will assist in developing a strong basis forthe actual database design (the logical model)
bull Document all entities discovered during the information gathering stage
bull Document all attributes that belong to each entity Select candidate and primarykeys Ensure that all non-key attributes for each entity are full-functionallydependent on the primary key
bull Develop an initial E-R diagram and review with appropriate personnel(Remember that this is an iterative process)
bull Create new entities (tables) for multi-valued attributes and repeating groupsIncorporate these new entities (tables) in the E-R diagram Review withappropriate personnel
bull Verify E-R modeling by normalizing tables
67
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Chapter 14 Database Users
141 End usersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
These are people whose jobs require access to a database for querying updating andgenerating reports
An end user might be one of the following
The application user uses the existing application programs to perform their dailytasks
Sophisticated users are those who have their own way of accessing the database Thismean they do not use the application program provided in the system Instead theymight define their own application or describe their need directly using querylanguages Specialized users maintain their personal database by using ready ndashmadeprogram packages that provide easy-to-use menu driven commands ndash such as MSAccess
142 Application ProgrammersAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
People implement specific application programs to access to the stored data This kindof user needs to be familiar with the DBMSs to accomplish their task
143 Database AdministratorsAvailable under Creative Commons-ShareAlike 40 International License (http
creativecommonsorglicensesby-sa40)
A person or a group of people in the organization who is responsible for authorizingthe access to the database monitoring its use and managing all the resource tosupport the use of the whole database system
Chapter 14 68
Recommended