8/9/2019 MELJUN CORTES DATABASE System Instructional Manual
1/27
DATABASE FUNDAMENTAL
2ND
Semester 2014-2015 MELJUN P. CORTES
MELJUN P. CORTES, mba,mpa,bscs,mscs in progress 1
PRELIM PERIOD
Lecture no. 1: DATABASE SYSTEMS
1.1 Introduction to Database Systems
Database (DB) - An integrated collection of related data.
By related data we mean that the data represents logically coherent facts about some aspects of the real world that are required by an application.
Universe of discourse or mini-world - The part of the real world that a database is designed to model within a computer.
By integrated we mean that the data for multiple applications is stored together and manipulated in a uniform way on secondary storage such as a magnetic or an optical disk. The primary goal of integration is to support information sharing across multiple applications.
A Database System consists of 1) an application-specific database, 2) the DBMS that maintains that database, and 3) the application software that manipulates the database.
Database Systems and Database Management Systems
A Database Management System (DBMS) is a collection of programs that controls a database. Specifically, it provides us with an interface to create, maintain, and manipulate multiple databases.
A DBMS is a general-purpose software system that we can use not only to create and maintain multiple databases but also to implement database systems for different applications. As opposed to a DBMS, which is general-purpose, a database system is developed to support the operations of a specific organization or a specific set of applications.
THE DATABASE APPROACHES (Ways of Handling Databases)
1. Manual – manual manipulation of data
Ex. Manual card catalog
2. Computerized – electronic data handling
Traditional File Processing System (TFPS)
Database Management System (DBMS)
DBMS vs TFPS
TFPS - application programs access the data files directly (filenames and data definitions are embedded in each program).
- if data are integrated in a single, shared data file, all application programs that share the data file must be aware of all the data in the file, including those data items that they do not make use of or need to know
- The problem gets worse when a new field is added to a data file
Disadvantages of TFPS
1. Uncontrolled Redundancy
2. Inconsistent Data
3. Inflexibility
4. Limited Data Sharing
5. Poor Enforcement of Standards
6. Low Programmer Productivity
7. Excessive Program Maintenance
DBMS - stores the structure of the data as part of the description of the database in the system catalog, separately from the application programs
Characteristics of DBMS
Data Abstraction
DBMSs allow data to be structured in ways that make it more understandable and meaningful to the applications than the ways data are physically stored on disks. They provide users with high-level, conceptual representations of the data (a table in relational DBMSs, or an object in object-oriented DBMSs, to give two examples) while they hide storage details that are not of interest to most database users.
program-data independence - the physical organization of data can be changed without affecting the application programs
program-operation independence - the implementation of abstract operations can be changed without affecting the code of the application programs, as long as their calling interface stays the same
Data abstraction and, in particular, data independence are what facilitate data sharing and integration. This is the main advantage of a DBMS over traditional file processing, whose application programs depend on the low-level structure of the data or storage organization, and in which each program stores its data in a separate data file.
Reliability
DBMSs provide high reliability by 1) enforcing integrity constraints and 2) ensuring data consistency despite hardware or software failures.
Integrity constraints reflect the meaning (or the semantics) of the data and of the application (ex. data type).
Constraints – conditions, restrictions
Data consistency means that interrupted update operations do not corrupt the database with values that violate the integrity constraints, and that no data in the database is lost.
After a failure, a DBMS automatically recovers, restoring the database to the consistent state in which it existed just prior to the interruption. This consistent state is constructed as follows. During recovery, a DBMS rolls back all interrupted transactions, obliterating their updates from the database, and re-executes successfully terminated transactions as necessary, restoring their updates in the database.
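The rollback idea can be tried on a small scale. The following is only a sketch, assuming Python's built-in sqlite3 module and a made-up ACCOUNT table (neither is part of this manual); it shows a transfer that is interrupted by a constraint violation and then rolled back, so the database returns to its last consistent state.

```python
import sqlite3

# Hypothetical in-memory database; the table and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT ("
    " AcctNo integer PRIMARY KEY,"
    " Balance integer CHECK (Balance >= 0))"  # semantic constraint: no negative balance
)
conn.execute("INSERT INTO ACCOUNT VALUES (1, 100)")
conn.execute("INSERT INTO ACCOUNT VALUES (2, 50)")
conn.commit()  # the database is now in a consistent state


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; undo every step if any step fails."""
    try:
        conn.execute(
            "UPDATE ACCOUNT SET Balance = Balance + ? WHERE AcctNo = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE ACCOUNT SET Balance = Balance - ? WHERE AcctNo = ?", (amount, src)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # discard the half-finished update
        return False


ok = transfer(conn, 1, 2, 500)  # would drive account 1 below zero
balances = conn.execute(
    "SELECT AcctNo, Balance FROM ACCOUNT ORDER BY AcctNo"
).fetchall()
print(ok, balances)
```

The first UPDATE succeeds and the second one fails, yet neither change survives: the rollback removes the partial update, which is the behavior the recovery description relies on.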
Efficiency
DBMSs support both efficient space utilization and efficient access to data. By making use of the data description in the catalog, DBMSs are able to minimize data redundancy, which in turn saves both space, by storing each data item only once, and processing time, by eliminating the need for multiple updates to keep the replicas consistent and up-to-date.
DBMSs enhance the performance of queries by means of optimizations and the use of access methods to data based on their values. Optimizations simplify the queries so that they can execute faster, and access methods allow direct access to locations where relevant data are stored, in a way similar to the access provided by the index in the back of a book.
DBMSs decrease response time of transactions by allowing multiple users to access the database concurrently.
1.2 Relational Databases
Relational Database Schema
A relational database schema is a set of table schemas and a set of integrity constraints. Integrity constraints can be sorted into two kinds:
structural (model-specific) integrity constraints that are imposed by the model as discussed below, and
semantic (application-specific) integrity constraints imposed by the application, such as the constraint that the balance of a savings account cannot be negative.
Keys - Keys are columns whose values are sufficient to uniquely identify a row
TYPES OF KEYS
1. Primary Key – uniquely identifies a record
2. Secondary Key – used to access a group of records with common attributes
3. Alternate Key – a candidate to be the Primary Key
4. Composite Key – composed of two or more columns to access a unique record
5. Foreign Key - a non-key attribute (ordinary column) in one table, but a primary key in another.
- establishes associations (relationships) among tables within one database (in a relational database schema)
DDL (Data Definition Language)
The command to create a table in SQL is the CREATE TABLE command. SQL supports all the basic data types found in most programming languages: integer, float, character, and character string. SQL commands are not case sensitive.
CREATE TABLE MEMBER (
    MemNo integer,
    DriverLic integer,
    Fname char(10),
    MI char,
    Lname char(15),
    PhoneNumber char(14),
    PRIMARY KEY (MemNo),
    UNIQUE (DriverLic)
);
The primary key is specified using the PRIMARY KEY directive, and alternate keys using the UNIQUE directive.
DML (Data Manipulation Language)
Update Operations
Relational DML allows us to insert and delete rows in a table as well as to update the values of one or more columns in a row.
In its basic form, the SQL INSERT statement inserts one row at a time, by specifying the values of each column, as in the following example:
INSERT INTO MEMBER
VALUES (101, 6876588, 'Susan', 'W', 'Jones', '412-376-8888');
This statement inserts a new row for Susan W. Jones in the MEMBER table. In SQL, strings are enclosed within single quotes.
Delete and update can be applied to multiple rows that satisfy a selection condition. In SQL, a selection condition in a deletion is specified by a WHERE clause. In the simplest case, a row is selected by specifying the value of its primary key. For example, the statement
DELETE FROM MEMBER
WHERE MemNo = 102;
deletes the row with member number 102 from the MEMBER table. The following statement changes the middle initial of the member 101 in the MEMBER table.
UPDATE MEMBER
SET MI = 'S'
WHERE MemNo = 101;
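The three update operations can be run end to end as follows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for whatever DBMS the class uses; the row for member 102 is made up so that the DELETE has something to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("""
    CREATE TABLE MEMBER (
        MemNo integer, DriverLic integer, Fname char(10), MI char,
        Lname char(15), PhoneNumber char(14),
        PRIMARY KEY (MemNo), UNIQUE (DriverLic)
    )""")

# INSERT: one row per statement, one value per column.
conn.execute(
    "INSERT INTO MEMBER VALUES (101, 6876588, 'Susan', 'W', 'Jones', '412-376-8888')"
)
# Hypothetical second member, added only for this example.
conn.execute(
    "INSERT INTO MEMBER VALUES (102, 1234567, 'John', 'A', 'Smith', '412-555-0100')"
)

# DELETE and UPDATE: the WHERE clause selects the affected rows by primary key.
conn.execute("DELETE FROM MEMBER WHERE MemNo = 102")
conn.execute("UPDATE MEMBER SET MI = 'S' WHERE MemNo = 101")

rows = conn.execute("SELECT MemNo, Fname, MI, Lname FROM MEMBER").fetchall()
print(rows)
```

After the three statements only member 101 remains, with the middle initial changed from W to S.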
An update operation succeeds if it does not violate any integrity constraints. For example, an insert operation will not succeed if it attempts to insert a row whose keys, primary and alternate, conflict with existing keys. That is, if the row were to be inserted, the property that keys should be unique would be violated. On the other hand, deleting a row never violates a key constraint, unless the deleted row is referenced by a foreign key. In that case, deleting a row might violate a referential integrity constraint.
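These rejections can be observed directly. The sketch below assumes Python's sqlite3 module and cut-down MEMBER and BOOK tables (the column choices and the helper name attempt are illustrative, not taken from this manual). Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for referential integrity
conn.execute(
    "CREATE TABLE MEMBER (MemNo integer PRIMARY KEY,"
    " DriverLic integer UNIQUE, Lname char(15))"
)
conn.execute(
    "CREATE TABLE BOOK (Book_id integer PRIMARY KEY,"
    " BorrowerMemNo integer REFERENCES MEMBER (MemNo))"
)
conn.execute("INSERT INTO MEMBER VALUES (101, 6876588, 'Jones')")
conn.execute("INSERT INTO BOOK VALUES (1, 101)")  # book 1 is borrowed by member 101
conn.commit()


def attempt(sql):
    """Run one update operation and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        conn.commit()
        return "ok"
    except sqlite3.IntegrityError:
        conn.rollback()
        return "rejected"


dup_primary = attempt("INSERT INTO MEMBER VALUES (101, 1111111, 'Smith')")    # same MemNo
dup_alternate = attempt("INSERT INTO MEMBER VALUES (103, 6876588, 'Smith')")  # same DriverLic
fresh = attempt("INSERT INTO MEMBER VALUES (104, 2222222, 'Lee')")            # no conflict
delete_referenced = attempt("DELETE FROM MEMBER WHERE MemNo = 101")           # still referenced
print(dup_primary, dup_alternate, fresh, delete_referenced)
```

Only the insert with fresh key values goes through; the two conflicting inserts violate key constraints, and the delete violates referential integrity because BOOK still points at member 101.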
TOOLS FOR QUERIES
1) QBE (Query By Example)
Query By Example (QBE) is a visual query language developed by IBM [Zloof, 1977] to simplify an average end-user's task of retrieving data from a database. QBE saves the user from having to remember the names of tables and columns, and the precise syntax of a query language. The basic idea is to retrieve data with the help of query templates.
QBE works as follows:
the system provides the user with a skeleton or query template of the tables in the database, and
the user fills in the tables with examples of what is to be retrieved or updated.
A skeleton of a table is basically a copy of the table without any rows, i.e. an empty table. For simple selection conditions, the examples can be constant values, such as Susan and 100, or comparisons with constant values such as >100, specified under a column.
Projection in QBE
Projection is specified by selecting the show button associated with each field, which we denote in our example with "P.". To print all columns of retrieved tuples, we only need to put one "P." under the name of the table.
QBE1: Display MemNo, Lname, and PhoneNumber from MEMBER.
MEMBER | MemNo | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P.    |           |       |    | P.    |         | P.          |
The result of a query is displayed in a result table, which subsequently can be either stored or manipulated further. In Microsoft Access, the resulting table is called a datasheet.
Selection in QBE
QBE2: Retrieve all members whose first name is John.
MEMBER | MemNo | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
P.     |       |           | John  |    |       |         |             |
Placing P. under the table name retrieves and displays the data in all the columns.
QBE3: Retrieve the name and member number of all the members whose member number is greater than 100.
MEMBER | MemNo  | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P.>100 |           | P.    |    | P.    |         |             |
A comparison with a constant value (in the above example the constant value is 100) is placed in the appropriate column. The resulting table will have the following columns:
Result | MemNo | Fname | Lname |
In QBE, a disjunction (OR) is expressed by using different examples in different rows of the skeleton.
QBE4: Retrieve the name and member number of all the members whose first name is John or Susan.
MEMBER | MemNo | DriverLic | Fname   | MI | Lname | Address | PhoneNumber |
       | P.    |           | P.John  |    | P.    |         |             |
       | P.    |           | P.Susan |    | P.    |         |             |
A conjunction (AND), on the other hand, is expressed in the same row.
QBE5: Retrieve the name and member number of all the members whose first name is Susan and whose member number is greater than 100.
MEMBER | MemNo  | DriverLic | Fname   | MI | Lname | Address | PhoneNumber |
       | P.>100 |           | P.Susan |    | P.    |         |             |
If the conjunction is a condition involving a single column, the condition can be specified using the AND operator, as in SQL. For example, if the MemNo should be greater than 100 and less than 150, this is specified under the MemNo column as: ( _x > 100 ) AND ( _x < 150 )
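For comparison, the QBE selections above can be written as SQL queries. This is a sketch under stated assumptions: it uses Python's sqlite3 module, a MEMBER table reduced to three columns, and four made-up rows; the helper name run is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MEMBER (MemNo integer PRIMARY KEY, Fname char(10), Lname char(15))"
)
# Sample rows invented for this example.
conn.executemany("INSERT INTO MEMBER VALUES (?, ?, ?)", [
    (100, "John", "Adams"),
    (101, "Susan", "Jones"),
    (120, "John", "Baker"),
    (160, "Mary", "Cruz"),
])


def run(sql):
    """Execute a query and return its rows in member-number order."""
    return conn.execute(sql + " ORDER BY MemNo").fetchall()


# QBE3: a comparison with a constant in one column.
qbe3 = run("SELECT MemNo, Fname, Lname FROM MEMBER WHERE MemNo > 100")
# QBE4: two example rows in the skeleton correspond to OR.
qbe4 = run("SELECT MemNo, Fname, Lname FROM MEMBER"
           " WHERE Fname = 'John' OR Fname = 'Susan'")
# QBE5: two conditions in the same row correspond to AND.
qbe5 = run("SELECT MemNo, Fname, Lname FROM MEMBER"
           " WHERE Fname = 'Susan' AND MemNo > 100")
# Range on a single column: ( _x > 100 ) AND ( _x < 150 ).
in_range = run("SELECT MemNo FROM MEMBER WHERE MemNo > 100 AND MemNo < 150")
print(qbe3, qbe4, qbe5, in_range, sep="\n")
```

The columns marked with P. become the SELECT list, and the examples typed into the skeleton become the WHERE clause.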
Join in QBE
Joins can be expressed by using common example variables in multiple tables in the columns to be joined.
QBE6: List the member number and last name of all the members who currently have a borrowed book.
MEMBER | MemNo   | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P._join |           |       |    | P.    |         |             |
BOOK | Book_id | CallNumber | Edition | BorrowerMemNo | BorrowDueDate |
     |         |            |         | _join         |               |
To express multiple joins you can use multiple example variables at the same time.
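The shared example variable _join plays the role of an equality condition between the two tables. A rough SQL counterpart of QBE6 is sketched below, assuming Python's sqlite3 module, trimmed-down MEMBER and BOOK tables, and invented rows; a book that is not borrowed is given a NULL BorrowerMemNo here, which is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEMBER (MemNo integer PRIMARY KEY, Lname char(15))")
conn.execute("CREATE TABLE BOOK (Book_id integer PRIMARY KEY, BorrowerMemNo integer)")
conn.executemany("INSERT INTO MEMBER VALUES (?, ?)",
                 [(101, "Jones"), (102, "Smith"), (103, "Cruz")])
# Book 2 is on the shelf (no borrower); member 101 holds two books.
conn.executemany("INSERT INTO BOOK VALUES (?, ?)",
                 [(1, 101), (2, None), (3, 103), (4, 101)])

# The join condition matches MemNo with BorrowerMemNo, like _join in both skeletons.
borrowers = conn.execute("""
    SELECT DISTINCT MEMBER.MemNo, MEMBER.Lname
    FROM MEMBER, BOOK
    WHERE MEMBER.MemNo = BOOK.BorrowerMemNo
    ORDER BY MEMBER.MemNo
""").fetchall()
print(borrowers)
```

DISTINCT keeps a member who has borrowed several books from being listed more than once.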
SEATWORK:
1. What is a Relational Database?
2. Enumerate the different types of Keys and give an example
QUIZ # 1
Lecture no. 2: Complete SQL
2.1 SQL
Structured Query Language
SQL – is the de facto standard query language for relational DBMSs.
- is a comprehensive language providing statements for both data definition and data manipulation.
SQL DDL – (Data Definition Language)
- Provides basic commands for defining the conceptual schema of a database.
SQL provides 3 numeric data types:
1.) Exact Number – These are integers or whole numbers, which may be positive, negative, or zero.
SQL supports 2 integer types:
1.) INTEGER (INT)
2.) SMALLINT
2.) Approximate Number – these are numbers that cannot be represented exactly, such as real numbers and fractional types.
3.) Formatted Number – these are numbers stored in decimal notation.
Formatted numbers can be defined using the following:
1.) DECIMAL (i,j)
2.) DEC (i,j)
3.) NUMERIC (i,j)
Where: i = the precision, or the total number of digits excluding the decimal point.
j = the scale, or the number of fractional digits.
Default scale is zero (0)
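To make precision and scale concrete, the sketch below uses Python's standard decimal module to count digits the same way. The helper names precision_and_scale and fits are made up for this illustration; a real DBMS applies its own rules (rounding, for instance) when a value does not fit.

```python
from decimal import Decimal


def precision_and_scale(text):
    """Return (i, j): total number of digits and number of fractional digits."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    scale = max(0, -exponent)            # digits to the right of the decimal point
    precision = max(len(digits), scale)  # leading zeros such as 0.05 count via the scale
    return precision, scale


def fits(text, i, j=0):
    """Check whether a literal fits DECIMAL(i, j); the default scale is zero."""
    precision, scale = precision_and_scale(text)
    return scale <= j and (precision - scale) <= (i - j)


print(precision_and_scale("123.45"))  # five digits in total, two of them fractional
print(fits("123.45", 5, 2))
print(fits("1234.5", 5, 2))           # four integer digits, but only three are allowed
print(fits("42", 3))
```

With DECIMAL(5,2), three digits are available to the left of the decimal point and two to the right, which is why 1234.5 does not fit even though it has only five digits.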
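Syntax in creating a database in SQL Query Analyzer
1.) CREATE DATABASE database_name
    USE database_name
2.) CREATE DATABASE database_name ON (NAME = data_file_name, FILENAME = 'file_path')
3.) CREATE TABLE table_name ( column_name data_type(size) PRIMARY KEY, column_name data_type(size) );
4.) INSERT [INTO] table_name [(column_list)] VALUES (data_values)
5.) SELECT * FROM table_name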
SPECIFIC RELATIONAL OPERATIONS
1.) Projection Operation (Π)
- Selects the attributes in an attribute list from a table r, while discarding the rest.
2.) Selection Operation (σ)
- Selects the rows of a table r that satisfy a selection condition (alias predicate).
3.) Join Operation
- Combines two tables into one, thereby allowing us to obtain more information.
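The three operations can be sketched in a few lines of plain Python, treating a table as a list of rows and each row as a dictionary. The function names and the sample data are assumptions made for this illustration; a DBMS implements these operations far more efficiently.

```python
def project(r, attributes):
    """Projection: keep only the listed attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in r:
        kept = tuple((a, row[a]) for a in attributes)
        if kept not in seen:
            seen.add(kept)
            result.append(dict(kept))
    return result


def select(r, predicate):
    """Selection: keep the rows of r that satisfy the predicate."""
    return [row for row in r if predicate(row)]


def join(r, s, r_attr, s_attr):
    """Join: pair each row of r with the rows of s whose s_attr equals its r_attr."""
    return [{**a, **b} for a in r for b in s if a[r_attr] == b[s_attr]]


# Made-up sample tables.
MEMBER = [{"MemNo": 101, "Lname": "Jones"}, {"MemNo": 102, "Lname": "Smith"}]
BOOK = [{"Book_id": 1, "BorrowerMemNo": 101}, {"Book_id": 2, "BorrowerMemNo": 101}]

names = project(MEMBER, ["Lname"])
after_101 = select(MEMBER, lambda row: row["MemNo"] > 101)
loans = join(MEMBER, BOOK, "MemNo", "BorrowerMemNo")
print(names, after_101, loans, sep="\n")
```

Projection works on columns, selection works on rows, and the join produces wider rows that carry information from both tables.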
SEATWORK:
1.) What is SQL and what are its capabilities?
2.) Create a database, insert values, and view the data inserted.
QUIZ # 2
PRELIM EXAMINATION
MIDTERM PERIOD
Lecture no. 1: DATABASE DESIGN
1.1 Database System Design
COST OF DATABASE APPROACH
If you are to implement a DBMS in an organization, you need to consider these 4 things:
1. New specialized personnel
- the organization should have or train individuals to:
a. maintain the new database software
b. develop and enforce new programming standards
c. design databases
d. manage the staff of new people and train the new employees
- these personnel may increase productivity (skills should not be minimized)
2. Need for explicit backup
- provide backup copies of data because:
a. they are helpful in restoring damaged data files
b. they provide validity checks on crucial data
3. Interference with shared data
- concurrent access to shared data via several application programs can lead to problems:
a. when 2 concurrent users both want to change the same or related data, inaccurate results can occur if access to the data is not properly synchronized.
b. when data are used exclusively for updating, different users can obtain control of different segments of the database and lock up any use of the data.
4. Organizational Conflict
- a shared database requires a consensus on data definitions
a. conflicts arise over how to define data (length and coding), over rights to update shared data, and over associated issues.
TYPES OF DATABASE
1. Operational Database
- contains business transactions and the history of daily business activities
- used to support the ongoing daily activities of the organization
- used in the “Transaction Processing System”. Ex. customer orders, purchases, accounting, shipments and payments
2. Managerial Database
- used by middle managers for planning and control; contains summaries of the operational database
- summary of operations
- used in the “Management Information System”
3. Strategic Database
- used by senior managers to develop corporate strategies and seek competitive advantage
- contains information on competitors and economic factors as well as corporate information
- used in the “Decision Support System”
GENERIC TYPES OF DATABASE APPLICATION
1. Data Capture
- captures transaction data, populates databases, and maintains the currency of data (gathers data)
2. Data Transfer
- moves / transfers data from one database to another
- Ex. From operational to managerial
3. Data Distribution
- application resulting from data analysis
- converts data into readily useful information and presents it to management in a readily understandable form
- Ex. Reports, summaries and graphs
COMPONENTS OF DATABASE ENVIRONMENT
1. CASE Tools
- Computer – Aided Software Engineering ( CASE ) tools
- Automated tools used to design databases and application programs
2. Repository
- centralized knowledge base containing all data definitions, screen and report formats, and definitions of other organizational and system components
3. DBMS
- commercial software system used to provide access to the database and repository
4. Database
- an integrated collection of data, organized to meet the information needs of multiple users in an organization
- contains occurrences of data ( the values themselves )
5. Application Programs
- computer programs that are used to create and maintain the database and provide information to users
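[Figure: components of the database environment, showing data administrators, system developers, and end users; the application programs, user interface, and CASE tools; the repository and DBMS; and the database]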
6. User Interface
- languages, menus, and other facilities through which users interact with the system (front-end support)
- use of menu-driven systems, the mouse, and voice recognition systems to promote end-user computing: users who are not experts can define their own reports, displays, and applications
7. Data Administrators
- persons who are responsible for designing databases and for developing policies regarding database security and integrity
- they use CASE tools to improve the productivity of database planning and design.
8. System Developers
- persons such as system analysts and programmers who design new application programs.
- they use CASE tools for system requirements analysis and program design.
9. End Users
- persons throughout the organization who add, edit, delete, and receive information
- encoders
Lecture no. 2: ENTITY – RELATIONSHIP MODELS
2.1 Entity – Relationship models
ENTITY – RELATIONSHIP MODELS
Relationship between two or more entities.
CATEGORIES OF ASSOCIATION
I. ASSOCIATION BETWEEN DATA ITEMS
Represents the relationship between data items, or shows how each data item is related to another. Each type of data item is represented by an ellipse or bubble with the data item name enclosed.
An association between data items is represented by an arrow connecting the data item bubbles.
Example of data items that have no meaningful association:
STUD #     EMPLOY #
Example of data items that have a meaningful association:
STUD # ---> STUDNME
Types of Association
1. One – association - means that at any point in time, a given value of A has one and only one value of B associated with it. Given the value of A, the value of B is implicitly known. Implicitly known means that it can be understood though not plainly expressed. We represent a one – association with a single – headed arrow.
A ---> B
Ex. EMPLOYEE ---> ADDRESS
2. Many – association - means that at any point in time, a given value of A has one or many values of B associated with it. We represent a many – association with a double – headed arrow.
A --->> B
Ex. STUD # --->> SUBJECTS
MULTIVALUED ATTRIBUTE - occurs potentially multiple times for each value of A
3. Conditional Association - with this, for a given value of data item A there are two possibilities: either there is no value of data item B, or there is one ( or many ) value(s) of data item B. A conditional association is represented by a zero recorded on the arrow near the conditional item.
A ---0> B
Ex. BED ---0> PATIENT ( PATIENT is the conditional item )
CARDINALITY - term used by analysts for what is represented by the arrowheads and zeros on the arrows, which can be thought of as having minimum and maximum values.
Reverse Association
If there is an association from data item A to data item B, there is also a reverse association from B to A.
Types of Reverse Association
1. One – to – one association
Means that at any point in time, each value of data item A is associated with zero or exactly one value of data item B. Conversely, each value of B is associated with one value of A.
A <---> B
Ex. STUD # <---> STUDNME
2. One – to – many association
Means that at any point in time, each value of data item A is associated with zero, one or many values of data item B. However, each value of B is associated with exactly one value of A. The mapping from B to A is said to be many – to – one, since there may be many values of B associated with one value of A.
A <--->> B
Ex. STUD # <--->> EXAM
3. Many – to – many association
Means that at any point in time, each value of data item A is associated with zero, one or many values of data item B. Also, each value of B is associated with zero, one or many values of A.
A <<--->> B
Ex. STUD # <<--->> COURSE
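In a relational database, a one – to – many association is commonly stored as a foreign key on the "many" side, and a many – to – many association as a separate table holding one row per pair. The sketch below assumes Python's sqlite3 module; the table names, column names, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (StudNo integer PRIMARY KEY, StudName char(20));

    -- one-to-many: each exam belongs to exactly one student
    CREATE TABLE EXAM (
        ExamNo integer PRIMARY KEY,
        StudNo integer NOT NULL REFERENCES STUDENT (StudNo)
    );

    CREATE TABLE COURSE (CourseNo integer PRIMARY KEY, Title char(20));

    -- many-to-many: one row per (student, course) pair
    CREATE TABLE ENROLLMENT (
        StudNo integer REFERENCES STUDENT (StudNo),
        CourseNo integer REFERENCES COURSE (CourseNo),
        PRIMARY KEY (StudNo, CourseNo)
    );

    INSERT INTO STUDENT VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO EXAM VALUES (10, 1), (11, 1);
    INSERT INTO COURSE VALUES (100, 'Database'), (200, 'Networks');
    INSERT INTO ENROLLMENT VALUES (1, 100), (1, 200), (2, 100);
""")

# A student may have zero, one, or many exams.
exams_per_student = conn.execute("""
    SELECT STUDENT.StudNo, COUNT(EXAM.ExamNo)
    FROM STUDENT LEFT JOIN EXAM ON EXAM.StudNo = STUDENT.StudNo
    GROUP BY STUDENT.StudNo ORDER BY STUDENT.StudNo
""").fetchall()

# Many-to-many reads in both directions from the same ENROLLMENT table.
courses_per_student = conn.execute(
    "SELECT StudNo, COUNT(*) FROM ENROLLMENT GROUP BY StudNo ORDER BY StudNo"
).fetchall()
students_per_course = conn.execute(
    "SELECT CourseNo, COUNT(*) FROM ENROLLMENT GROUP BY CourseNo ORDER BY CourseNo"
).fetchall()
print(exams_per_student, courses_per_student, students_per_course, sep="\n")
```

Student 2 has no exams, which matches the "zero, one or many" wording, while the ENROLLMENT rows give many courses per student and many students per course.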
II. ASSOCIATION BETWEEN RECORDS
Shows the relationship between records.
Crow’s Foot - used to distinguish one and many associations between entities and records.
Crow’s Foot Notation - used to represent the association between records.
Types of Association
1. One Association - no crow’s foot ( one – to – one )
Ex. HUSBAND ----- WIFE
Ex. STUDENT ----- GRADE
2. Many Association - represented by a crow’s foot
Ex. EMPLOYEE -----< BENEFICIARY
Ex. STUDENT -----< COURSE
DATA MODELS
Representation of the data about entities, events, activities and their associations within the organizations.
CATEGORIES / GROUPS OF DATA MODELS
I. SEMANTIC DATA MODEL
Used to capture all the meaning of data and to embed this as integrity and structural clauses in the database definitions. Such concepts as class, subclass, aggregation, dynamic properties and structures, and the handling of objects of different types ( images, voice prints, as well as text and data ) are included in the SDM and other semantically rich data models.
II. RELATIONAL DATA MODEL
The relational data model uses the concept of a relation to represent what we have previously called a file; that is, a relation represents an entity class. A relation is viewed as a two-dimensional table.
The relational data model is the choice of many database builders and users. It differs from other models not only in architecture but also in the following ways:
1. Implementation Independence - it logically represents all relationships implicitly, and hence one does not know which associations are or are not physically represented by an efficient method. The relational model shares this property with the ER – D.
2. Terminology - it uses its own terminology, most of which has equivalent terms in other data models.
3. Logical Key Pointers - it uses primary and secondary keys in records to represent the association between 2 records, whereas the ER – D uses an arc between entity boxes.
4. Normalization Theory - properties of a database that make it free of certain maintenance problems have been developed within the context of the relational data model ( although these properties can also be designed into an ER – D or a network data model ).
5. High Level Programming Languages - programming languages have been developed specifically to access databases defined via the relational data model; these languages permit data to be manipulated as groups or files rather than procedurally, one record at a time.
III. HIERARCHICAL DATA MODEL
Organizations are usually viewed as a hierarchy of positions and authority. Computer programs can be viewed as a hierarchy of control and operating modules, and various taxonomies of animals and plants view elements in a hierarchical set of relationships. The hierarchical data model represents data as a set of nested one-to-many relationships. The hierarchical data model is used exclusively with hierarchical database management systems; such systems are, in general, being phased out.
IV. NETWORK DATA MODEL
The network data model permits as much or as little structure as is desired. We can even create a hierarchy ( a special case of a network ) if that is what is needed. As in the hierarchical data model, if a certain relationship is not explicitly included in the database definition, then it cannot be used by a DBMS in processing a database.
V. ENTITY RELATIONSHIP DATA MODEL ( ER – DIAGRAM )
It is based on the perception of a real world that consists of a set of basic objects called entities and relationships among those objects. It is a graphical notation that uses special symbols to indicate relationships among entities, intended primarily for the database design process.
Basic Symbols
[Figure: the symbols for Entity, Relationship, Data Item, Primary Key, and ISA ( stands for “is a”; class - sub - class )]
Degree
The number of entities that participate in a relationship.
Most Typical Degrees for Relationship
1. Unary Relationship - relationship between instances of the same entity class.
Ex. EMPLOYEE; PERSON
2. Binary Relationship - relationship between instances of two entity classes.
Ex. PARENT and CHILD; CUSTOMER and ORDER
3. Ternary Relationship - relationship among instances of three entity classes.
Ex. PRODUCT, VENDOR and WAREHOUSE
SEATWORK:
1.) Give two examples using Unary, Binary and Ternary Relationships
QUIZ # 3
Normalizing a Database
Normalization
- is a process of reducing redundancies of data in a database.
- is a technique that is used when designing and redesigning a database.
- is a process or set of guidelines used to optimally design a database to reduce redundant data.
The Raw Database
A database that is not normalized may include data that is contained in one or more different tables for no apparent
reason. This could be bad for security reasons, disk space usage, speed of queries, efficiency of database updates, and,
maybe most importantly, data integrity. A database before normalization is one that has not been broken down
logically into smaller, more manageable tables.
COMPANY_DATABASE
Emp_id         cust_id
Last_name      cust_name
First_name     cust_address
Middle_name    cust_city
Address        cust_state
City           cust_zip
State          cust_phone
Zip            cust_fax
Phone          ord_num
Pager          qty
Position       ord_date
Date_hire      prod_id
Logical Database Design
Any database should be designed with the end user in mind. Logical database design, also referred to as the logical
model, is the process of arranging data into logical, organized groups of objects that can easily be maintained. The
logical design of a database should reduce data repetition or go so far as to completely eliminate it. After all, why
store the same data twice? Naming conventions used in a database should also be standard and logical.
What are the End User’s Needs?
The needs of the end user should be one of the top considerations when designing a database. Remember that the end
user is the person who ultimately uses the database. There should be ease of use through the user’s front-end tool (a
client program that allows a user access to a database), but this, along with optimal performance, cannot be achieved if
the user’s needs are not taken into consideration.
Some user-related design considerations include the following:
What data should be stored in the database?
How will the user access the database?
What privileges does the user require?
How should the data be grouped in the database?
What data is the most commonly accessed?
How is all data related in the database?
What measures should be taken to ensure accurate data?
Data redundancy
Data should not be redundant, which means that the duplication of data should be kept to a minimum for several
reasons. For example, it is unnecessary to store an employee’s home address in more than one table. With duplicate
data, unnecessary space is used. Confusion is always a threat when, for instance, an address for an employee in one
table does not match the address of the same employee in another table.
Which table is correct? Do you have documentation to verify the employee’s current address? As if data management
were not difficult enough, redundancy of data could prove to be a disaster.
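The address mismatch described above is easy to reproduce. The following sketch assumes Python's sqlite3 module and two hypothetical tables, HR_EMPLOYEE and PAYROLL_EMPLOYEE, that both store an employee's address; it updates only one copy and then looks for disagreements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two made-up tables that redundantly store the same address.
    CREATE TABLE HR_EMPLOYEE (emp_id integer PRIMARY KEY, address char(30));
    CREATE TABLE PAYROLL_EMPLOYEE (emp_id integer PRIMARY KEY, address char(30));
    INSERT INTO HR_EMPLOYEE VALUES (1, '12 Oak St');
    INSERT INTO PAYROLL_EMPLOYEE VALUES (1, '12 Oak St');
""")

# The employee moves, but only one of the two copies is updated.
conn.execute("UPDATE HR_EMPLOYEE SET address = '99 Elm Ave' WHERE emp_id = 1")

# Find employees whose two stored addresses no longer match.
conflicts = conn.execute("""
    SELECT h.emp_id, h.address, p.address
    FROM HR_EMPLOYEE h JOIN PAYROLL_EMPLOYEE p ON p.emp_id = h.emp_id
    WHERE h.address <> p.address
""").fetchall()
print(conflicts)
```

Nothing in the database says which of the two addresses is current, which is exactly the confusion that storing the address once would avoid.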
The Normal Forms
Normal form is a way of measuring the level, or depth, to which a database has been normalized. A database’s level of normalization is determined by the normal form.
The following are the three most common normal forms in the normalization process :
The first normal form
The second normal form
The third normal form
Of the three normal forms, each subsequent normal form depends on normalization steps taken in the
previous normal form. For example, to normalize a database using the second normal form, the database
must first be in the first normal form.
The First Normal Form
The objective of the first normal form is to divide the base data into logical units called tables. When each
table has been designed, a primary key is assigned to most or all tables.
COMPANY_DATABASE ( before normalization )
emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager, position, position_desc, date_hire, pay_rate, bonus, date_last_raise, cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax, ord_num, qty, ord_date, prod_id, prod_desc, cost

EMPLOYEE_TBL
emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager, position, position_desc, date_hire, pay_rate, bonus, date_last_raise

CUSTOMER_TBL
cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax, ord_num, qty, ord_date

PRODUCTS_TBL
prod_id, prod_desc, cost
You can see that to achieve the first normal form, data had to be broken into logical units of related
information, each having a primary key and ensuring that there are no repeated groups in any of the tables.
Instead of the large table, there are now smaller, more manageable tables: EMPLOYEE_TBL,
CUSTOMER_TBL and PRODUCTS_TBL. The primary keys are normally the first columns listed in a table,
in this case: EMP_ID, CUST_ID and PROD_ID.
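The split into tables with primary keys can be sketched with Python's built-in sqlite3 module. This is only an illustration: the column types, the shortened column lists and the sample product row are assumptions, not part of the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE EMPLOYEE_TBL (
    emp_id     INTEGER PRIMARY KEY,
    last_name  TEXT,
    first_name TEXT,
    position   TEXT,
    pay_rate   REAL
);
CREATE TABLE CUSTOMER_TBL (
    cust_id   INTEGER PRIMARY KEY,
    cust_name TEXT,
    cust_city TEXT
);
CREATE TABLE PRODUCTS_TBL (
    prod_id   INTEGER PRIMARY KEY,
    prod_desc TEXT,
    cost      REAL
);
""")

conn.execute("INSERT INTO PRODUCTS_TBL VALUES (1, 'sample product', 9.99)")

def insert_duplicate_product():
    """Try to reuse prod_id 1; the primary key should reject it."""
    try:
        conn.execute("INSERT INTO PRODUCTS_TBL VALUES (1, 'another product', 5.00)")
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = insert_duplicate_product()
table_names = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```

Because each table has a primary key, a second row with the same PROD_ID is rejected, which is what keeps every row uniquely identifiable.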
The Second Normal Form
The objective of the second normal form is to take data that is only partly dependent on the primary key
and enter that data into another table.
FIRST NORMAL FORM
EMPLOYEE_TBL
    emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager,
    position, position_desc, date_hire, pay_rate, bonus, date_last_raise
CUSTOMER_TBL
    cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax,
    ord_num, prod_id, qty, ord_date
SECOND NORMAL FORM
EMPLOYEE_TBL
    emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager
EMPLOYEE_PAY_TBL
    emp_id, position, position_desc, date_hire, pay_rate, bonus, date_last_raise
CUSTOMER_TBL
    cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax
ORDERS_TBL
    ord_num, prod_id, qty, ord_date
According to the figure, the second normal form is derived from the first normal form by further breaking
two tables down into more specific units.
EMPLOYEE_TBL was split into two tables called EMPLOYEE_TBL and EMPLOYEE_PAY_TBL. Personal
employee information is dependent on the primary key
( EMP_ID ), so that information remained in EMPLOYEE_TBL ( EMP_ID, LAST_NAME,
FIRST_NAME, MIDDLE_NAME, ADDRESS, CITY, STATE, ZIP, PHONE and PAGER ). On the other
hand, the information that is only partly dependent on EMP_ID ( each individual employee ) is used to
populate EMPLOYEE_PAY_TBL ( EMP_ID, POSITION, POSITION_DESC, DATE_HIRE, PAY_RATE, BONUS
and DATE_LAST_RAISE ). Notice that both tables contain the column EMP_ID. This is the primary key
of each table and is used to match corresponding data between the two tables.
CUSTOMER_TBL was split into two tables called CUSTOMER_TBL and ORDERS_TBL. What took place is
similar to what occurred in EMPLOYEE_TBL. Columns that were partly dependent on the primary key
were directed to another table. The order information for a customer is dependent on each CUST_ID, but
does not directly depend on the general customer information in the original table.
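As a minimal sketch of the employee split, the two tables can be created and matched on EMP_ID with Python's sqlite3 module. The shortened column lists, the column types and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE_TBL (
    emp_id    INTEGER PRIMARY KEY,
    last_name TEXT,
    city      TEXT
);
CREATE TABLE EMPLOYEE_PAY_TBL (
    emp_id   INTEGER PRIMARY KEY REFERENCES EMPLOYEE_TBL(emp_id),
    position TEXT,
    pay_rate REAL
);
INSERT INTO EMPLOYEE_TBL VALUES (1, 'Santos', 'Manila'), (2, 'Reyes', 'Cebu');
INSERT INTO EMPLOYEE_PAY_TBL VALUES (1, 'Clerk', 15.0), (2, 'Analyst', 22.5);
""")

# EMP_ID appears in both tables and is used to match corresponding rows.
rows = conn.execute("""
    SELECT e.last_name, p.position, p.pay_rate
    FROM EMPLOYEE_TBL AS e
    JOIN EMPLOYEE_PAY_TBL AS p ON p.emp_id = e.emp_id
    ORDER BY e.emp_id
""").fetchall()
```

The join is the price of the split: personal and pay data now live apart, so a query that needs both must match them on the shared key.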
The Third Normal Form
The third normal form’s objective is to remove data in a table that is not dependent on the primary key.
Another table was created to display the use of the third normal form. EMPLOYEE_PAY_TBL is split into
two tables, one table containing the actual employee pay information and the other containing the position
descriptions, which really do not need to reside in EMPLOYEE_PAY_TBL. The POSITION_DESC column
does not depend directly on the primary key, EMP_ID; it depends on POSITION.
Before:
EMPLOYEE_PAY_TBL
    emp_id, position, position_desc, date_hire, pay_rate, bonus, date_last_raise
After (third normal form):
EMPLOYEE_PAY_TBL
    emp_id, position, date_hire, pay_rate, bonus, date_last_raise
POSITIONS_TBL
    position, position_desc
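One practical effect of moving POSITION_DESC into POSITIONS_TBL is that a description is stored once and changed once. The sketch below uses Python's sqlite3 module. Treating POSITION as the primary key of POSITIONS_TBL, the column types and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE POSITIONS_TBL (
    position      TEXT PRIMARY KEY,
    position_desc TEXT
);
CREATE TABLE EMPLOYEE_PAY_TBL (
    emp_id   INTEGER PRIMARY KEY,
    position TEXT REFERENCES POSITIONS_TBL(position),
    pay_rate REAL
);
INSERT INTO POSITIONS_TBL VALUES ('CLERK', 'Handles records');
INSERT INTO EMPLOYEE_PAY_TBL VALUES (1, 'CLERK', 15.0), (2, 'CLERK', 16.0);
""")

# One UPDATE changes the description for every employee holding the position.
conn.execute("UPDATE POSITIONS_TBL SET position_desc = 'Maintains office records' "
             "WHERE position = 'CLERK'")

descriptions = conn.execute("""
    SELECT p.emp_id, d.position_desc
    FROM EMPLOYEE_PAY_TBL AS p
    JOIN POSITIONS_TBL AS d ON d.position = p.position
    ORDER BY p.emp_id
""").fetchall()
```

Had the description stayed in EMPLOYEE_PAY_TBL, the same change would have required updating one row per employee, with the risk of leaving some rows inconsistent.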
Benefits of Normalization
Normalization provides numerous benefits to a database. Some of the major benefits include the following :
- Greater overall database organization
- Reduction of redundant data
- Data consistency within the database design
- A much more flexible database design
- A better handle on database security
Drawbacks of Normalization
Although most successful databases are normalized to some degree, there is one substantial drawback of a
normalized database: reduced database performance. To see why, consider that when a query or transaction
request is sent to the database, resources such as CPU, memory and input/output (I/O) are consumed. In
short, a normalized database typically requires more CPU, memory and I/O to process transactions and
queries than a denormalized database does, because it must locate the requested tables and then join the
data from those tables, either to return the requested information or to process the desired data. A more
in-depth discussion concerning database performance occurs in Hour 18, “Managing Database Users.”
QUIZ # 4
MIDTERM EXAMINATION
FINALS PERIOD
Lecture no. 1: TRANSACTION MANAGEMENT
1.1 Transaction Support
Transaction – an action or series of actions, carried out by a single user or application
program, which accesses or changes the contents of the database.
- is a logical unit of work on the database. It may be an entire program, part of a program, or a single command.
Properties of Transactions
There are four basic properties that all transactions should possess, the so-called ACID properties:
Atomicity – the “all or nothing” property. A transaction is an indivisible unit that is either performed in its entirety or not performed at all.
Consistency – a transaction must transform the database from one consistent state to another consistent state.
Isolation – transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions.
Durability – the effects of a successfully completed (committed) transaction are permanently recorded in the database and must not be lost because of a subsequent failure.
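Atomicity can be seen in a small sketch using Python's sqlite3 module. The accounts table, the balances and the transfer function are hypothetical; the point is only that a failed transaction leaves no partial effect behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount):
    """Move `amount` from A to B as one transaction; True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? "
                         "WHERE name = 'B'", (amount,))
            conn.execute("UPDATE accounts SET balance = balance - ? "
                         "WHERE name = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok_small = transfer(30)    # both updates are applied and committed
ok_large = transfer(500)   # the debit violates the CHECK, so the credit is undone
final_balances = balances()
```

The second transfer credits B before the debit on A fails, yet the rollback removes the credit as well, so the database moves only between consistent states.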
THE DBMS TRANSACTION MANAGEMENT
Transaction Manager – coordinates transactions on behalf of application programs.
Scheduler – the module responsible for implementing a particular strategy for concurrency control.
Sometimes referred to as the “Lock Manager” when the concurrency control strategy is based on locking.
Recovery Manager – ensures that, if a failure occurs during a transaction, the database is restored to the
state it was in before the start of the transaction, and therefore to a consistent state.
Buffer Manager – is responsible for the transfer of data between disk storage and main memory.
Figure 1: The DBMS Transaction Subsystem (components shown: Transaction Manager, Scheduler, Buffer Manager, Recovery Manager, Access Manager, Systems Manager, File Manager, and the Database and System Catalog)
1.2 Concurrency Control
Concurrency Control – the process of managing simultaneous operations on the
database without having them interfere with one another.
A major objective in developing a database is to enable many users to access shared data concurrently.
Three examples of potential problems caused by concurrency:
1.) The Lost Update Problem – an apparently successfully completed update operation by one user can be overwritten by another user.
2.) The Uncommitted Dependency Problem – occurs when one transaction is allowed to see the
intermediate results of another transaction before it has committed.
3.) The Inconsistent Analysis Problem – occurs when a transaction reads several values from the database while a second transaction updates some of them during the execution of the first.
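The lost update problem can be reproduced without any real concurrency by writing out the interleaving by hand. In this sketch the dictionary standing in for the database, the balance item and the two transactions T1 and T2 are all hypothetical.

```python
# Shared "database" with a single data item (hypothetical balance).
database = {"balance": 100}

def read(item):
    return database[item]

def write(item, value):
    database[item] = value

# Interleaved schedule with no concurrency control:
t1_local = read("balance")        # T1 reads 100
t2_local = read("balance")        # T2 reads 100 before T1 writes
write("balance", t1_local + 50)   # T1 deposits 50, balance becomes 150
write("balance", t2_local - 30)   # T2 withdraws 30 from its stale copy
interleaved_result = database["balance"]

# Serial schedule: T2 starts only after T1 has finished.
database["balance"] = 100
write("balance", read("balance") + 50)   # T1
write("balance", read("balance") - 30)   # T2
serial_result = database["balance"]
```

In the interleaved schedule T1's deposit disappears, because T2 computes its result from a value read before T1 wrote. Concurrency control exists to rule out such schedules.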
1.3 Database Recovery
Database Recovery – the process of restoring the database to a correct state in the event of a failure.
Four different types of storage media with an increasing degree of reliability:
1.) Main Memory – volatile storage that usually does not survive system crashes.
2.) Magnetic Disks – provide online non-volatile storage. Compared with main memory, disks are more reliable and much cheaper, but slower by three to four orders of magnitude.
3.) Magnetic Tape – an offline non-volatile storage medium, which is far more reliable than disk and fairly inexpensive, but slower, providing only sequential access.
4.) Optical Disks – more reliable than tape, generally cheaper and faster, and provide random access.
Additional Facts:
- Main memory is also called Primary Storage.
- Disks and tape are known as Secondary Storage.
- Stable storage represents information that has been replicated in several non-volatile storage media (usually disk) with independent failure modes.
Among the causes of failure are:
1.) System crashes – due to hardware or software errors, resulting in loss of main memory.
2.) Media failures – such as head crashes or unreadable media, resulting in the loss of parts of secondary storage.
3.) Application software errors – such as logical errors in the program that is accessing the database, which cause one or more transactions to fail.
4.) Natural physical disasters – such as fire, floods, earthquakes, or power failures.
5.) Sabotage – the intentional corruption or destruction of data, hardware or software facilities.
Two principal effects that we need to consider:
1.) The loss of main memory, including the database buffers.
2.) The loss of the disk copy of the database.
A DBMS should provide the following facilities to assist with recovery:
A backup mechanism, which makes periodic backup copies of the database.
Logging facilities, which keep track of the current state of transactions and database changes.
A checkpoint facility, which enables updates to the database that are in progress to be made permanent.
A recovery manager, which allows the system to restore the database to a consistent state following a
failure.
Log File
To keep track of database transactions, the DBMS maintains a special file called a log
(or journal) that contains information about all updates to the database.
The log may contain the following data:
1.) Transaction Records, containing:
Transaction identifier.
Type of log record.
Identifier of data item affected by the database action.
2.) Checkpoint records
Checkpoint – the point of synchronization between the database and the
transaction log file. All buffers are force-written to secondary storage.
Checkpoints are scheduled at predetermined intervals and involve the following
operations:
Writing all log records in main memory to secondary storage.
Writing the modified blocks in the database buffers to secondary storage.
Writing a checkpoint record to the log file. This record contains the identifiers of all transactions that
are active at the time of the checkpoint.
Recovery Techniques
1.) Recovery techniques using deferred update
When a transaction starts, write a transaction start record to the log.
When any write operation is performed, write a log record containing all the data specified previously (excluding the before-image of the update). Do not actually write the update to the database buffers
or the database itself.
When a transaction is about to commit, write a transaction commit log record, write all the log records for the transaction to disk and then commit the transaction. Use the log records to perform the
actual updates to the database.
If a transaction aborts, ignore the log records for the transaction and do not perform the writes.
After a failure, any transaction with transaction start and transaction commit log records should be redone.
For any transactions with transaction start and transaction abort log records, we do nothing, since no actual writing was done to the database, so these transactions do not have to be undone.
2.) Recovery techniques using immediate update
When a transaction starts, write a transaction start record to the log.
When a write operation is performed, write a record containing the necessary data to the log file.
Once the log record is written, write the update to the database buffers.
The updates to the database itself are written when the buffers are next flushed to secondary storage.
When a transaction commits, write a transaction commit record to the log.
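The two techniques can be compared in a rough sketch that replays a log after a crash. The log record layout, the transaction names and the disk states are assumptions made for illustration. The notes do not spell out the recovery steps for immediate update, so the undo-then-redo order used here is the standard approach for logs that hold before-images and after-images, not something taken from the text above.

```python
def recover_deferred(on_disk, log):
    """Deferred update: redo the writes of committed transactions only."""
    db = dict(on_disk)
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:                       # forward order
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[-1]          # last field is the after-image
    return db


def recover_immediate(on_disk, log):
    """Immediate update: undo uncommitted writes, then redo committed ones."""
    db = dict(on_disk)
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):             # undo in reverse order
        if rec[0] == "write" and rec[1] not in committed:
            db[rec[2]] = rec[3]           # restore the before-image
    for rec in log:                       # redo in forward order
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[4]           # reapply the after-image
    return db


# Deferred update log records: ("write", transaction, item, after_image).
deferred_log = [
    ("start", "T1"), ("write", "T1", "x", 10), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "y", 99), ("abort", "T2"),
    ("start", "T3"), ("write", "T3", "x", 7),   # T3 still active at the crash
]
deferred_result = recover_deferred({"x": 1, "y": 2}, deferred_log)

# Immediate update log records: ("write", transaction, item, before, after).
immediate_log = [
    ("start", "T1"), ("write", "T1", "x", 1, 10),
    ("start", "T2"), ("write", "T2", "y", 2, 50),
    ("commit", "T1"),                           # T2 never committed
]
# Assumed disk state at the crash: T2's write was flushed, T1's was not.
immediate_result = recover_immediate({"x": 1, "y": 50}, immediate_log)
```

With deferred update nothing ever needs undoing, since uncommitted writes never reach the database. With immediate update the before-image is what makes it possible to reverse T2's flushed write.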
Lecture no. 2: IMPROVING QUERY PERFORMANCE
2.1 Hash Files
Hash function – calculates the address of the page in which the record is to be stored based on one or more of the fields in the record.
Hash field – the field on which the hash function is based; also called the base field.
Hash key – the hash field, if that field is also a key field of the file.
Collision – when the same address is generated for two or more records.
There are several techniques that can be used to manage collisions:
1.) Open addressing
2.) Unchained overflow
3.) Chained overflow
4.) Multiple hashing
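A toy version of a hash file with chained overflow is sketched below. The class name, the page count and capacity, and the division-remainder hash function are all assumptions chosen to keep the example small.

```python
class HashFile:
    """Toy hash file: fixed-size pages plus one overflow chain per page."""

    def __init__(self, num_pages=4, page_capacity=2):
        self.num_pages = num_pages
        self.page_capacity = page_capacity
        self.pages = [[] for _ in range(num_pages)]
        self.overflow = [[] for _ in range(num_pages)]

    def address(self, key):
        # Division-remainder hash function on an integer hash key.
        return key % self.num_pages

    def insert(self, key, record):
        page = self.address(key)
        if len(self.pages[page]) < self.page_capacity:
            self.pages[page].append((key, record))
        else:
            self.overflow[page].append((key, record))  # page is full

    def lookup(self, key):
        page = self.address(key)
        for k, record in self.pages[page] + self.overflow[page]:
            if k == key:
                return record
        return None


hash_file = HashFile()
for emp_id in (1, 5, 9, 2):          # 1, 5 and 9 all hash to page 1
    hash_file.insert(emp_id, f"employee {emp_id}")
```

Keys 1, 5 and 9 collide on page 1. The first two fit in the page and the third goes to that page's overflow chain, so a lookup searches the page first and then its chain.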
2.2 Indexes
Index – a data structure that allows the DBMS to locate particular records in a file more quickly, and
thereby speed response to user queries.
Data file – the file containing the logical records.
Index file – the file containing the index records.
Primary index – an index on a data file that is sequentially ordered by a key field of the file; the
indexing field is that key field, so it is guaranteed to have a unique value in each record.
Clustering index – an index on a data file that is sequentially ordered on a non-key field; the indexing field is that non-key field, so there can be more than one record corresponding to a value of the indexing field.
Secondary index – an index that is defined on a non-ordering field of the data file.
Indexed sequential file – a sorted data file with a primary index.
An Indexed sequential file is a more versatile structure, which normally has:
A primary storage area.
A separate index or indexes.
An overflow area.
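The difference between a primary and a secondary index can be sketched with in-memory lists. The employee records, the use of list positions as record addresses and the function names are assumptions.

```python
from bisect import bisect_left

# Data file: records kept in order of the key field emp_id (hypothetical data).
data_file = [
    (101, "Cruz", "Manila"),
    (104, "Reyes", "Cebu"),
    (107, "Santos", "Manila"),
    (110, "Garcia", "Davao"),
]

# Primary index: one (key, position) entry per record, sorted by key.
primary_index = [(rec[0], pos) for pos, rec in enumerate(data_file)]
index_keys = [key for key, _ in primary_index]

def find_by_emp_id(emp_id):
    """Binary-search the index instead of scanning the data file."""
    i = bisect_left(index_keys, emp_id)
    if i < len(index_keys) and index_keys[i] == emp_id:
        return data_file[primary_index[i][1]]
    return None

# Secondary index on city, a non-ordering field: value -> record positions.
secondary_index = {}
for pos, rec in enumerate(data_file):
    secondary_index.setdefault(rec[2], []).append(pos)

def find_by_city(city):
    """Return every record whose city matches, using the secondary index."""
    return [data_file[pos] for pos in secondary_index.get(city, [])]
```

The primary index returns at most one record because EMP_ID is unique, while one entry in the secondary index on city can point to several records.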
SEATWORK:
1.) Define Transaction
2.) Give other Database recovery techniques
QUIZ #5
Lecture no. 3: DATA WAREHOUSING, OLAP and DATA MINING
3.1 Data Warehousing
Data Warehouse – a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.
Benefits of Data Warehousing
Potential high returns on investment
Competitive advantage
Increased productivity of corporate decision-makers
Problems of Data Warehousing
Underestimation of resources for data loading
Hidden problems with source systems
Required data not captured
Increased end-user demands
Data homogenization
High demand for resources
Data ownership
High maintenance
Long duration projects
Complexity of integration
The Major components of a Data Warehouse
1. Operational Data
The source of data for the data warehouse is supplied from:
Mainframe operational data held in first-generation hierarchical and network databases.
Departmental data held in proprietary file systems such as VSAM and RMS, and in relational DBMSs such as Informix and Oracle.
Private data held on workstations and private servers.
External systems such as the Internet, commercially available databases, or databases associated with an organization’s suppliers or customers.
2. Load Manager – (also called the front-end component) performs all the operations associated with the extraction and loading of data into the warehouse.
3. Warehouse Manager – performs all the operations associated with the management of the data in the warehouse.
4. Query Manager – (also called the back-end component) performs all the operations associated with the management of user queries.
5. End-user access tools – provide information to business users for strategic decision making.
These tools can be categorized into five main groups:
Reporting and query tools.
Application development tools.
Executive information system (EIS) tools.
Online analytical processing (OLAP) tools.
Data mining tools.
3.2 Online Analytical Processing (OLAP) – the dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.
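Consolidation, often called roll-up, can be sketched as summing a measure over the dimensions that are dropped. The sales rows, the three dimension names and the roll_up function are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount).
sales = [
    ("North", "Widget", "Q1", 100),
    ("North", "Widget", "Q2", 150),
    ("North", "Gadget", "Q1", 80),
    ("South", "Widget", "Q1", 120),
    ("South", "Gadget", "Q2", 60),
]
DIMENSIONS = ("region", "product", "quarter")

def roll_up(rows, keep):
    """Consolidate sales over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[p] for p in positions)] += row[3]
    return dict(totals)

by_region = roll_up(sales, ["region"])
by_region_quarter = roll_up(sales, ["region", "quarter"])
```

The same rows answer questions at different levels of detail, depending only on which dimensions are kept. That is the multi-dimensional view in miniature.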
Rules for OLAP Systems
Multi-dimensional conceptual view
Transparency
Accessibility
Consistent reporting performance
Client-server architecture
Generic dimensionality
Dynamic sparse matrix handling
Multi-user support
Unrestricted cross-dimensional operations
Intuitive data manipulation
Flexible reporting
Unlimited dimensions and aggregations
Categories of OLAP Tools
1. Multi-dimensional OLAP ( MOLAP or MD-OLAP ) – uses specialized data structures and Multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze
data.
2. Relational OLAP ( ROLAP ) – is the fastest-growing style of OLAP technology. ROLAP supports RDBMS products through the use of a meta-data layer, thus avoiding the
requirement to create a static multi-dimensional data structure.
3. Managed Query Environment (MQE) – provides limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.
3.3 Data Mining
Data Mining – the process of extracting valid, previously unknown, comprehensible, and actionable
information from large databases and using it to make crucial business decisions.
Four Main Operations Associated with Data Mining Techniques:
1. Predictive modeling
2. Database segmentation
3. Link analysis
4. Deviation detection
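As one simple illustration of deviation detection, values far from the mean can be flagged. The z-score rule, the threshold of two standard deviations and the sample figures are assumptions. Real data mining tools use many other techniques.

```python
from statistics import mean, pstdev

def find_deviations(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

daily_orders = [20, 22, 19, 21, 20, 23, 18, 95]   # hypothetical order counts
outliers = find_deviations(daily_orders)
```

Here only the value 95 is reported, since it lies well outside the range of the other daily counts.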
QUIZ #6
FINAL EXAMINATION