MELJUN CORTES DATABASE System Instructional Manual

8/9/2019 MELJUN CORTES DATABASE System Instructional Manual


DATABASE FUNDAMENTAL

2ND Semester 2014-2015

MELJUN P. CORTES, MBA, MPA, BSCS, MSCS in progress

PRELIM PERIOD

Lecture no. 1: DATABASE SYSTEMS

1.1 Introduction to Database Systems

Database (DB) - An integrated collection of related data.

By related data we mean that the data represents logically coherent facts about some aspects of the real world that are required by an application.

Universe of discourse or mini-world - The part of the real world that a database is designed to model within a computer.

By integrated we mean that the data for multiple applications is stored together and manipulated in a uniform way on secondary storage such as a magnetic or an optical disk. The primary goal of integration is to support information sharing across multiple applications.

A Database System consists of 1) an application-specific database, 2) the DBMS that maintains that database, and 3) the application software that manipulates the database.

Database Systems and Database Management Systems

A Database Management System (DBMS) is a collection of programs that controls a database. Specifically, it provides us with an interface to create, maintain, and manipulate multiple databases.

A DBMS is a general-purpose software system that we can use not only to create and maintain multiple databases but also to implement database systems for different applications. As opposed to a DBMS, which is general-purpose, a database system is developed to support the operations of a specific organization or a specific set of applications.

THE DATABASE APPROACHES (Ways of Handling Databases)

1. Manual – manual manipulation of data
Ex. Manual card catalog

2. Computerized – electronic data handling
- Traditional File Processing System (TFPS)
- Database Management System (DBMS)

DBMS vs TFPS

TFPS
- application programs access the data files directly (file names and data definitions are embedded in each program)
- if data are integrated in a single, shared data file, all application programs that share the data file must be aware of all the data in the file, including those data items that they do not make use of or need to know
- the problem gets worse when a new field is added to a data file





Disadvantages of TFPS

1. Uncontrolled Redundancy
2. Inconsistent Data
3. Inflexibility
4. Limited Data Sharing
5. Poor Enforcement of Standards
6. Low Programmer Productivity
7. Excessive Program Maintenance

DBMS - stores the structure of the data as part of the description of the database in the system catalog, separately from the application programs

Characteristics of DBMS

Data Abstraction

DBMSs allow data to be structured in ways that make it more understandable and meaningful to the applications than the ways data are physically stored on disks. They provide users with high-level, conceptual representations of the data (a table in relational DBMSs, or an object in object-oriented DBMSs, to give two examples) while they hide storage details that are not of interest to most database users.

program-data independence - the physical organization of data can be changed without affecting the application programs

program-operation independence - the implementation of abstract operations can be changed without affecting the code of the application programs, as long as their calling interface stays the same

Data abstraction and, in particular, data independence are what facilitate data sharing and integration. This is the main advantage of a DBMS over traditional file processing, in which application programs depend on the low-level structure of the data or storage organization and each program stores its data in a separate data file.

Reliability

DBMSs provide high reliability by 1) enforcing integrity constraints and 2) ensuring data consistency despite hardware or software failures.

Integrity constraints reflect the meaning (or the semantics) of the data and of the application (ex. data type).

Constraints – conditions, restrictions

Data consistency means that interrupted update operations do not corrupt the database with values that violate the integrity constraints, and that no data in the database is lost.

After a failure, a DBMS automatically recovers, restoring the database to the consistent state in which it existed just prior to the interruption. This consistent state is constructed as follows. During recovery, a DBMS rolls back all interrupted transactions, obliterating their updates from the database, and re-executes successfully terminated transactions as necessary, restoring their updates in the database.
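The rollback of an interrupted transaction can be sketched with Python's built-in sqlite3 module. This is only a small illustration under stated assumptions, not how any particular DBMS implements recovery: the ACCOUNT table, its columns, the sample balances, and the transfer function are all hypothetical names made up for the example. The CHECK clause plays the role of an integrity constraint, and a violation in the middle of the transaction causes every update of that transaction to be undone.

```python
import sqlite3

# Hypothetical ACCOUNT table; the CHECK clause is a semantic integrity constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT ("
    " AcctNo INTEGER PRIMARY KEY,"
    " Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.execute("INSERT INTO ACCOUNT VALUES (1, 100)")
conn.execute("INSERT INTO ACCOUNT VALUES (2, 50)")
conn.commit()


def transfer(connection, source, target, amount):
    """Move amount between accounts as one transaction; undo everything on failure."""
    try:
        # Credit first, so a failed debit leaves a partial update to roll back.
        connection.execute(
            "UPDATE ACCOUNT SET Balance = Balance + ? WHERE AcctNo = ?",
            (amount, target),
        )
        connection.execute(
            "UPDATE ACCOUNT SET Balance = Balance - ? WHERE AcctNo = ?",
            (amount, source),
        )
        connection.commit()
        return True
    except sqlite3.IntegrityError:
        connection.rollback()  # discard the interrupted transaction's updates
        return False


def balances(connection):
    """Return {AcctNo: Balance} for all accounts."""
    return dict(connection.execute("SELECT AcctNo, Balance FROM ACCOUNT"))


ok_large = transfer(conn, 1, 2, 500)   # would make account 1 negative
after_failed = balances(conn)
ok_small = transfer(conn, 1, 2, 30)    # a valid transfer
after_success = balances(conn)
print(ok_large, after_failed, ok_small, after_success)
```

After the failed transfer both balances are unchanged, even though the credit to account 2 had already been applied before the constraint was violated.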

Efficiency

DBMSs support both efficient space utilization and efficient access to data. By making use of the data description in the catalog, DBMSs are able to minimize data redundancy, which in turn saves both space, by storing each data item only once, and processing time, by eliminating the need for multiple updates to keep the replicas consistent and up-to-date.

DBMSs enhance the performance of queries by means of optimizations and the use of access methods to data based on their values. Optimizations simplify the queries so that they can execute faster, and access methods allow direct access to locations where relevant data are stored, in a way similar to the access provided by the index in the back of a book.

DBMSs decrease the response time of transactions by allowing multiple users to access the database concurrently.





1.2 Relational Databases

Relational Database Schema

A relational database schema is a set of table schemas and a set of integrity constraints. Integrity constraints can be sorted into two kinds:

- structural (model-specific) integrity constraints that are imposed by the model as discussed below, and

- semantic (application-specific) integrity constraints imposed by the application, such as the constraint that the balance of a savings account cannot be negative.

Keys - Keys are columns whose values are sufficient to uniquely identify a row.

TYPES OF KEYS
1. Primary Key – uniquely identifies a record
2. Secondary Key – used to access a group of records with common attributes
3. Alternate Key – candidate to be Primary Key
4. Composite Key – composed of two or more columns to access a unique record
5. Foreign Key – a non-key attribute (ordinary column) in one table, but a primary key in another
   - establishes associations (relationships) among tables within one database (in a relational database schema)

DDL (Data Definition Language)

The command to create a table in SQL is the CREATE TABLE command. SQL supports the basic data types found in most programming languages: integer, float, character, and character string. SQL keywords are not case sensitive.

CREATE TABLE MEMBER (
    MemNo       integer(4),
    DriverLic   integer,
    Fname       char(10),
    MI          char,
    Lname       char(15),
    PhoneNumber char(14),
    PRIMARY KEY (MemNo),
    UNIQUE (DriverLic)
);

The primary key is specified using the PRIMARY KEY directive, and alternate keys using the UNIQUE directive.

DML (Data Manipulation Language)

Update Operations

Relational DML allows us to insert and delete rows in a table as well as to update the values of one or more columns in a row.

In its basic form, the SQL INSERT statement adds one row at a time by specifying the values of each column, as in the following example:

INSERT INTO MEMBER
VALUES (101, 6876588, 'Susan', 'W', 'Jones', '412-376-8888');

This statement inserts a new row for Susan W. Jones in the MEMBER table. In SQL, strings are enclosed within single quotes.

Delete and update can be applied to multiple rows that satisfy a selection condition. In SQL, the selection condition is specified by a WHERE clause. In the simplest case, a row is selected by specifying the value of its primary key. For example, the statement

DELETE FROM MEMBER
WHERE MemNo = 102;

deletes the row with member number 102 from the MEMBER table. The following statement changes the middle initial of member 101 in the MEMBER table.

UPDATE MEMBER
SET MI = 'S'
WHERE MemNo = 101;
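The DDL and DML statements of this section can be tried end to end with Python's built-in sqlite3 module, as in the sketch below. SQLite is used here only because it needs no installation, and it is an assumption of this example rather than the DBMS used in the course; it accepts the type names from the text but ignores the width in integer(4). The row for member 102 is made-up data, added so that the DELETE statement has something to remove.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: the MEMBER table from the text.
conn.execute("""
    CREATE TABLE MEMBER (
        MemNo       integer(4),
        DriverLic   integer,
        Fname       char(10),
        MI          char,
        Lname       char(15),
        PhoneNumber char(14),
        PRIMARY KEY (MemNo),
        UNIQUE (DriverLic)
    )
""")

# DML: insert, delete, and update.
conn.execute(
    "INSERT INTO MEMBER VALUES (101, 6876588, 'Susan', 'W', 'Jones', '412-376-8888')"
)
# Made-up second member, so that the DELETE below removes a real row.
conn.execute(
    "INSERT INTO MEMBER VALUES (102, 1234567, 'John', 'A', 'Smith', '412-555-0100')"
)
conn.execute("DELETE FROM MEMBER WHERE MemNo = 102")
conn.execute("UPDATE MEMBER SET MI = 'S' WHERE MemNo = 101")
conn.commit()

rows = conn.execute("SELECT MemNo, Fname, MI, Lname FROM MEMBER").fetchall()
print(rows)
```

Only member 101 remains, with the middle initial changed from W to S.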





An update operation succeeds if it does not violate any integrity constraints. For example, an insert operation will not succeed if it attempts to insert a row whose keys, primary and alternate, conflict with existing keys. That is, if the row were to be inserted, the property that keys should be unique would be violated. On the other hand, deleting a row never violates a key constraint, unless the deleted row is referenced by a foreign key. In that case, deleting a row might violate a referential integrity constraint.
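The cases described above (a conflicting primary key, a conflicting alternate key, and the deletion of a referenced row) can be observed with the following sketch, again using Python's sqlite3 module as an assumed stand-in for a DBMS. The tables are reduced to a few columns, the sample rows and the attempt helper are hypothetical, and SQLite checks foreign keys only after the PRAGMA shown in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE MEMBER ("
    " MemNo INTEGER PRIMARY KEY, DriverLic INTEGER UNIQUE, Lname TEXT)"
)
conn.execute(
    "CREATE TABLE BOOK ("
    " Book_id INTEGER PRIMARY KEY,"
    " BorrowerMemNo INTEGER REFERENCES MEMBER (MemNo))"
)
# Made-up sample data: member 101 has borrowed book 1, member 102 has no books.
conn.execute("INSERT INTO MEMBER VALUES (101, 6876588, 'Jones')")
conn.execute("INSERT INTO MEMBER VALUES (102, 1234567, 'Smith')")
conn.execute("INSERT INTO BOOK VALUES (1, 101)")
conn.commit()


def attempt(sql):
    """Run one update operation; report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql)
        conn.commit()
        return "ok"
    except sqlite3.IntegrityError:
        conn.rollback()
        return "rejected"


results = {
    "duplicate primary key": attempt("INSERT INTO MEMBER VALUES (101, 9999999, 'Lee')"),
    "duplicate alternate key": attempt("INSERT INTO MEMBER VALUES (103, 6876588, 'Lee')"),
    "delete unreferenced row": attempt("DELETE FROM MEMBER WHERE MemNo = 102"),
    "delete referenced row": attempt("DELETE FROM MEMBER WHERE MemNo = 101"),
}
print(results)
```

Only the deletion of member 102 succeeds; the other three operations are rejected because they would violate a key or referential integrity constraint.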

TOOLS FOR QUERIES

1) QBE (Query By Example)

Query By Example (QBE) is a visual query language developed by IBM [Zloof, 1977] to simplify an average end-user's task of retrieving data from a database. QBE saves the user from having to remember the names of tables and columns, and the precise syntax of a query language. The basic idea is to retrieve data with the help of query templates.

QBE works as follows:

- the system provides the user with a skeleton or query template of the tables in the database, and

- the user fills in the tables with examples of what is to be retrieved or updated.

A skeleton of a table is basically a copy of the table without any rows, i.e. an empty table. For simple selection conditions, the examples can be constant values, such as Susan and 100, or comparisons with constant values, such as >100, specified under a column.

Projection in QBE

Projection is specified by selecting the show button associated with each field, which we denote in our example with "P.". To print all columns of retrieved tuples, we only need to put one "P." under the name of the table.

QBE1: Display MemNo, Lname, and PhoneNumber from MEMBER.

MEMBER | MemNo | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P.    |           |       |    | P.    |         | P.          |

The result of a query is displayed in a result table, which subsequently can be either stored or manipulated further. In Microsoft Access, the resulting table is called a datasheet.

Selection in QBE

QBE2: Retrieve all members whose first name is John.

MEMBER | MemNo | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
P.     |       |           | John  |    |       |         |             |

Placing P. under the table name retrieves and displays the data in all the columns.

QBE3: Retrieve the name and member number of all the members whose member number is greater than 100.

MEMBER | MemNo  | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P.>100 |           | P.    |    | P.    |         |             |

A comparison with a constant value (in the above example the constant value is 100) is placed in the appropriate column. The resulting table will have the following columns:

Result | MemNo | Fname | Lname |

In QBE, a disjunction (OR) is expressed by using different examples in different rows of the skeleton.

QBE4: Retrieve the name and member number of all the members whose first name is John or Susan.

MEMBER | MemNo | DriverLic | Fname   | MI | Lname | Address | PhoneNumber |
       | P.    |           | P.John  |    | P.    |         |             |
       | P.    |           | P.Susan |    | P.    |         |             |

A conjunction (AND), on the other hand, is expressed in the same row.

QBE5: Retrieve the name and member number of all the members whose first name is Susan and whose member number is greater than 100.

MEMBER | MemNo  | DriverLic | Fname   | MI | Lname | Address | PhoneNumber |
       | P.>100 |           | P.Susan |    | P.    |         |             |

If the conjunction is a condition involving a single column, the condition can be specified using the AND operator, as in SQL. For example, if the MemNo should be greater than 100 and less than 150, this is specified under the MemNo column as: ( _x > 100 ) AND ( _x < 150 )
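For comparison, the selection conditions of QBE3 to QBE5 correspond to SQL WHERE clauses. The sketch below runs them through Python's sqlite3 module; the reduced MEMBER table, the sample rows, and the member_numbers helper are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEMBER (MemNo INTEGER PRIMARY KEY, Fname TEXT, Lname TEXT)")
# Made-up sample rows, used only to exercise the queries.
conn.executemany(
    "INSERT INTO MEMBER VALUES (?, ?, ?)",
    [
        (99, "John", "Adams"),
        (101, "Susan", "Jones"),
        (120, "John", "Smith"),
        (130, "Mary", "Cruz"),
        (160, "Susan", "Lee"),
    ],
)


def member_numbers(where_clause):
    """Return the MemNo values of the rows that satisfy the given condition."""
    sql = (
        "SELECT MemNo, Fname, Lname FROM MEMBER WHERE "
        + where_clause
        + " ORDER BY MemNo"
    )
    return [row[0] for row in conn.execute(sql)]


qbe3 = member_numbers("MemNo > 100")                         # comparison with a constant
qbe4 = member_numbers("Fname = 'John' OR Fname = 'Susan'")   # disjunction: two rows in QBE
qbe5 = member_numbers("Fname = 'Susan' AND MemNo > 100")     # conjunction: one row in QBE
in_range = member_numbers("MemNo > 100 AND MemNo < 150")     # conjunction on a single column
print(qbe3, qbe4, qbe5, in_range)
```

The columns marked with P. in the QBE skeleton become the SELECT list, and the example values become the WHERE clause.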





Join in QBE

Joins can be expressed by using common example variables in multiple tables in the columns to be joined.

QBE6: List the member number and last name of all the members who currently have a borrowed book.

MEMBER | MemNo   | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P._join |           |       |    | P.    |         |             |

BOOK | Book_id | CallNumber | Edition | BorrowerMemNo | BorrowDueDate |
     |         |            |         | _join         |               |

To express multiple joins you can use multiple example variables at the same time.
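The shared example variable _join in QBE6 plays the same role as a join condition in SQL. A possible SQL counterpart, run through Python's sqlite3 module, is sketched below. The tables are cut down to the columns the query needs, the rows are made-up sample data, and a NULL in BorrowerMemNo is assumed to mean that the book is not currently borrowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEMBER (MemNo INTEGER PRIMARY KEY, Lname TEXT)")
conn.execute("CREATE TABLE BOOK (Book_id INTEGER PRIMARY KEY, BorrowerMemNo INTEGER)")
# Made-up sample data: member 101 holds two books, 103 holds one, book 3 is on the shelf.
conn.executemany(
    "INSERT INTO MEMBER VALUES (?, ?)",
    [(101, "Jones"), (102, "Smith"), (103, "Lee")],
)
conn.executemany(
    "INSERT INTO BOOK VALUES (?, ?)",
    [(1, 101), (2, 101), (3, None), (4, 103)],
)

# The join condition matches MEMBER.MemNo with BOOK.BorrowerMemNo, like _join in QBE6.
borrowers = conn.execute(
    """
    SELECT DISTINCT M.MemNo, M.Lname
    FROM MEMBER AS M
    JOIN BOOK AS B ON M.MemNo = B.BorrowerMemNo
    ORDER BY M.MemNo
    """
).fetchall()
print(borrowers)
```

DISTINCT keeps member 101 from appearing twice, once for each borrowed book.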

SEATWORK:
1. What is a Relational Database?
2. Enumerate the different types of Keys and give an example.

QUIZ # 1

Lecture no. 2: Complete SQL

2.1 SQL

Structured Query Language

SQL
- is the de facto standard query language for relational DBMSs.
- is a comprehensive language providing statements for both data definition and data manipulation.

SQL DDL (Data Definition Language)
- provides basic commands for defining the conceptual schema of a database.

SQL provides 3 numeric data types:

1.) Exact number – integers or whole numbers, which may be positive, negative, or zero.
    SQL supports 2 integer types:
    1.) INTEGER (INT)
    2.) SMALLINT

2.) Approximate number – numbers that cannot be represented exactly, such as real numbers and fractional types.

3.) Formatted number – numbers stored in decimal notation.
    Formatted numbers can be defined using the following:
    1.) DECIMAL(i, j)
    2.) DEC(i, j)
    3.) NUMERIC(i, j)
    Where: i = the precision, or the total number of digits excluding the decimal point.
           j = the scale, or the number of fractional digits. The default scale is zero (0).
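The meaning of precision and scale can be illustrated with a small sketch based on Python's decimal module. The function to_formatted_number is a hypothetical helper written for this example, not part of SQL, and rounding half up is an assumption: an actual DBMS may round differently, truncate, or reject a value with too many fractional digits.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_formatted_number(text, precision, scale=0):
    """Fit a decimal string into a hypothetical DECIMAL(precision, scale) column."""
    step = Decimal(1).scaleb(-scale)  # scale=2 gives a step of 0.01
    value = Decimal(text).quantize(step, rounding=ROUND_HALF_UP)
    digit_count = len(value.as_tuple().digits)  # total digits, decimal point excluded
    if digit_count > precision:
        raise ValueError(f"{text} needs {digit_count} digits, only {precision} allowed")
    return value


a = to_formatted_number("123.456", 5, 2)   # 5 digits in total, 2 of them fractional
b = to_formatted_number("12.7", 4)         # default scale 0: no fractional digits kept
try:
    to_formatted_number("1234.5", 5, 2)    # 1234.50 would need 6 digits
    overflow = False
except ValueError:
    overflow = True
print(a, b, overflow)
```

With DECIMAL(5, 2) the largest value that fits is 999.99, since 2 of the 5 digits are reserved for the fraction.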

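Syntax for creating a database in SQL Query Analyzer (names in angle brackets are placeholders to be supplied by the user)

1.) CREATE DATABASE <database_name>
    USE <database_name>

2.) CREATE DATABASE <database_name>
    ON (NAME = <logical_file_name>, FILENAME = '<os_file_name>')

3.) CREATE TABLE <table_name> ( <column_name> <data_type>(<size>) PRIMARY KEY, <column_name> <data_type>(<size>) );

4.) INSERT [INTO] <table_name> [(column_list)]
    VALUES (data_values)

5.) SELECT * FROM <table_name>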





SPECIFIC RELATIONAL OPERATIONS

1.) Projection Operation (π)
- Selects an attribute or a list of attributes from a table r, while discarding the rest.

2.) Selection Operation (σ)
- Selects the rows in a table r that satisfy a selection condition (also called a predicate).

3.) Join Operation
- Combines two tables into one, thereby allowing us to obtain more information.
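The three operations can be sketched in plain Python by treating a table as a list of rows, where each row is a dictionary from column name to value. The functions project, select, and join, as well as the sample rows, are illustrative names and data chosen for this example; the join shown is a simple equality join on one pair of columns.

```python
def project(table, columns):
    """Projection: keep only the listed columns, dropping duplicate rows."""
    result = []
    for row in table:
        reduced = {column: row[column] for column in columns}
        if reduced not in result:
            result.append(reduced)
    return result


def select(table, predicate):
    """Selection: keep only the rows for which the predicate is true."""
    return [row for row in table if predicate(row)]


def join(left, right, left_column, right_column):
    """Join: combine each pair of rows whose join columns hold equal values."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if l_row[left_column] == r_row[right_column]
    ]


# Made-up sample tables.
member = [
    {"MemNo": 101, "Fname": "Susan", "Lname": "Jones"},
    {"MemNo": 102, "Fname": "John", "Lname": "Smith"},
]
book = [{"Book_id": 1, "BorrowerMemNo": 101}]

last_names = project(member, ["Lname"])
above_101 = select(member, lambda row: row["MemNo"] > 101)
borrowed = join(member, book, "MemNo", "BorrowerMemNo")
print(last_names, above_101, borrowed)
```

In SQL terms, projection corresponds to the column list after SELECT, selection to the WHERE clause, and join to combining tables in the FROM clause.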

SEATWORK:
1.) What is SQL and what are its capabilities?
2.) Create a database, insert values, and view the data inserted.

QUIZ # 2

PRELIM EXAMINATION

MIDTERM PERIOD

Lecture no. 1: DATABASE DESIGN

1.1 Database System Design

COST OF DATABASE APPROACH

If you are to implement a DBMS in an organization, you need to consider these 4 things:

1. New specialized personnel
- the organization should hire or train individuals to:
  a. maintain the new database software
  b. develop and enforce new programming standards
  c. design databases
  d. manage the staff of new people and train the new employees
- these personnel may increase productivity (the need for their skills should not be minimized)

2. Need for explicit backup
- provide backup copies of data because:
  a. they are helpful in restoring damaged data files
  b. they provide validity checks on crucial data

3. Interference with shared data
- concurrent access to shared data by several application programs can lead to problems:
  a. when 2 concurrent users both want to change the same or related data, inaccurate results can occur if access to the data is not properly synchronized.
  b. when data are locked for exclusive use during updating, different users can obtain control of different segments of the database and lock up any use of the data.





4. Organizational conflict
- a shared database requires a consensus on data definitions
  a. conflicts can arise over how to define data, data length and coding, rights to update shared data, and associated issues.

TYPES OF DATABASE

1. Operational Database
- contains business transactions and the history of daily business activities
- used to support the ongoing daily activities of the organization
- used in a “Transaction Processing System”
- Ex. customer orders, purchases, accounting, shipments and payments

2. Managerial Database
- used by middle managers for planning and control
- contains summaries of the operational database
- used in a “Management Information System”

3. Strategic Database
- used by senior managers to develop corporate strategies and seek competitive advantage
- contains information on competitors and economic factors as well as corporate information
- used in a “Decision Support System”

GENERIC TYPES OF DATABASE APPLICATION

1. Data Capture

- captures transaction data, populates databases, maintains the currency of data, and gathers data

2. Data Transfer

- moves / transfers data from one database to another

- Ex. From operational to managerial





3. Data Distribution

- applications resulting from data analysis
- converts data into readily useful information and presents it to management in a readily understandable form
- Ex. reports, summaries and graphs

COMPONENTS OF DATABASE ENVIRONMENT

1. CASE Tools

- Computer – Aided Software Engineering ( CASE ) tools

- Automated tools used to design databases and application program

2. Repository
- centralized knowledge base containing all data definitions, screen and report formats, and definitions of other organizational and system components, including definitions of data formats

3. DBMS
- commercial software system used to provide access to the database and repository

4. Database
- an integrated collection of data, organized to meet the information needs of multiple users in an organization
- contains occurrences of data (the values themselves)

5. Application Programs
- computer programs that are used to create and maintain the database and provide information to users

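[Figure: components of the database environment, showing data administrators, system developers, and end users, together with the user interface, application programs, CASE tools, repository, DBMS, and database]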





6. User Interface
- languages, menus, and other facilities through which users interact with the various system components
- the use of menu-driven systems, the mouse, and voice recognition systems promotes end-user computing: users who are not experts can define their own reports, displays, and applications

7. Data Administrators
- persons who are responsible for designing databases and for developing policies regarding database security and integrity
- they use CASE tools to improve the productivity of database planning and design

8. System Developers
- persons such as system analysts and programmers who design new application programs
- they use CASE tools for system requirements analysis and program design

9. End Users
- persons throughout the organization who add, edit, delete, and receive information
- Ex. encoders

Lecture no. 2: ENTITY – RELATIONSHIP MODELS

2.1 Entity – Relationship models

An entity-relationship model describes the relationships between two or more entities.

CATEGORIES OF ASSOCIATION

I. ASSOCIATION BETWEEN DATA ITEMS

Represents the relationship between data items, or shows how each data item is related to another. Each type of data item is represented by an ellipse or bubble with the data item name enclosed. An association between data items is represented by an arrow connecting the data item bubbles.

Example of data items that have no meaningful association:

STUD #     EMPLOY #

Example of data items that have a meaningful association:

STUD # ---> STUDNME





Types of Association

1. One-association - means that at any point in time, a given value of A has one and only one value of B associated with it. If we know the value of A, then the value of B is implicitly known. Implicitly known means that it can be understood though not plainly expressed. We represent a one-association with a single-headed arrow.

A ---> B

Ex. EMPLOYEE ---> ADDRESS

2. Many-association - means that at any point in time, a given value of A has one or many values of B associated with it. We represent a many-association with a double-headed arrow.

A --->> B

Ex. STUD # --->> SUBJECTS

MULTIVALUED ATTRIBUTE - a data item B that potentially occurs multiple times for each value of A

3. Conditional association - for a given value of data item A there are two possibilities: either there is no value of data item B, or there is one (or many) value(s) of data item B. A conditional association is represented by a zero recorded on the arrow near the conditional item.

A ---0> B

Ex. BED ---0> PATIENT (PATIENT is the conditional item)

CARDINALITY - term used by analysts for what the arrowheads and zeros on the arrows represent, which can be thought of as minimum and maximum values.

Reverse Association

If there is an association from data item A to data item B, there is also a reverse association from B to A.





Types of Reverse Association

1. One-to-one association

Means that at any point in time, each value of data item A is associated with zero or exactly one value of data item B. Conversely, each value of B is associated with one value of A.

A <---> B

Ex. STUD # <---> STUDNME

2. One-to-many association

Means that at any point in time, each value of data item A is associated with zero, one, or many values of data item B. However, each value of B is associated with exactly one value of A. The mapping from B to A is said to be many-to-one, since there may be many values of B associated with one value of A.

A <--->> B

Ex. STUD # <--->> EXAM

3. Many-to-many association

Means that at any point in time, each value of data item A is associated with zero, one, or many values of data item B. Also, each value of B is associated with zero, one, or many values of A.

A <<--->> B

Ex. STUD # <<--->> COURSE

II. ASSOCIATION BETWEEN RECORDS

Shows the relationship between records.

Crow’s Foot - used to distinguish one and many associations between entities and records.

Crow’s Foot Notation - used to represent the association between records.

Types of Association

1. One Association - no crow’s foot (one-to-one)

Ex. HUSBAND ----- WIFE
    STUDENT ----- GRADE





2. Many Association - represented by a crow’s foot

Ex. EMPLOYEE -----< BENEFICIARY
    STUDENT  -----< COURSE

DATA MODELS

Representation of the data about entities, events, activities and their associations within the organizations.

CATEGORIES / GROUPS OF DATA MODELS

I. SEMANTIC DATA MODEL

Used to capture the full meaning of data and to embed it as integrity and structural clauses in the database definitions. Concepts such as class, subclass, aggregation, dynamic properties and structures, and the handling of objects of different types (images, voice prints, as well as text and data) are included in the SDM and other semantically rich data models.

II. RELATIONAL DATA MODEL

The relational data model uses the concept of a relation to represent what we have previously called a file; that is, a relation represents an entity class. A relation is viewed as a two-dimensional table.

The relational data model is the choice of many database builders and users. It differs from other models not only in architecture but also in the following ways:

1. Implementation Independence - it logically represents all relationships implicitly, and hence one does not know which associations are or are not physically represented by an efficient method. The relational model shares this property with the ER-D.

2. Terminology - it uses its own terminology, most of which has equivalent terms in other data models.

3. Logical Key Pointers - it uses primary and secondary keys in records to represent the association between 2 records, whereas the ER-D uses arcs between entity boxes.

4. Normalization Theory - properties of a database that make it free of certain maintenance problems have been developed within the context of the relational data model (although these properties can also be designed into an ER-D or a network data model).





5. High Level Programming Languages - programming languages have been developed specifically to access databases defined via the relational data model; these languages permit data to be manipulated as groups or files rather than procedurally one record at a time.

III. HIERARCHICAL DATA MODEL

Organizations are usually viewed as a hierarchy of positions and authority. Computer programs can be viewed as a hierarchy of control and operating modules, and various taxonomies of animals and plants view elements in hierarchical sets of relationships. The hierarchical data model represents data as a set of nested one-to-many relationships. It is used exclusively with hierarchical database management systems, and such systems are, in general, being phased out.

IV. NETWORK DATA MODEL

The network data model permits as much or as little structure as is desired. We can even create a hierarchy (a special case of a network) if that is what is needed. As in the hierarchical data model, if a certain relationship is not explicitly included in the database definition, then it cannot be used by a DBMS in processing a database.

V. ENTITY RELATIONSHIP DATA MODEL (ER - DIAGRAM)

It is based on a perception of the real world as consisting of a set of basic objects called entities and relationships among those entities. It is a graphical notation that uses special symbols to indicate relationships among entities, intended primarily for the database design process.

Basic Symbols

[Figure: ER diagram symbols for Entity, Relationship, Data Item, Primary Key, and Class - sub-class (ISA, which stands for “is a”)]

Degree

The number of entities that participate in a relationship.





Most Typical Degrees for Relationship

1. Unary Relationship - relationship between instances of the same entity class.

Ex. EMPLOYEE, PERSON

2. Binary Relationship - relationship between instances of two entity classes.

Ex. PARENT - CHILD, CUSTOMER - ORDER

3. Ternary Relationship - relationship among instances of three entity classes.

Ex. PRODUCT - VENDOR - WAREHOUSE





SEATWORK:

1.) Give two examples using Unary, Binary and Ternary Relationships

QUIZ # 3

Normalizing a Database

Normalization

- is a process of reducing redundancies of data in a database.

- is a technique that is used when designing and redesigning a database.

- is a process or set of guidelines used to optimally design a database to reduce redundant data.

The Raw Database

A database that is not normalized may include data that is contained in one or more different tables for no apparent

reason. This could be bad for security reasons, disk space usage, speed of queries, efficiency of database updates, and,

maybe most importantly, data integrity. A database before normalization is one that has not been broken down

logically into smaller, more manageable tables.

COMPANY_DATABASE

Emp_id        cust_id
Last_name     cust_name
First_name    cust_address
Middle_name   cust_city
Address       cust_state
City          cust_zip
State         cust_phone
Zip           cust_fax
Phone         ord_num
Pager         qty
Position      ord_date
Date_hire     prod_id





Logical Database Design

Any database should be designed with the end user in mind. Logical database design, also referred to as the logical

model, is the process of arranging data into logical, organized groups of objects that can easily be maintained. The

logical design of a database should reduce data repetition or go so far as to completely eliminate it. After all, why

store the same data twice? Naming conventions used in a database should also be standard and logical.

What are the End User’s Needs?

The needs of the end user should be one of the top considerations when designing a database. Remember that the end

user is the person who ultimately uses the database. There should be ease of use through the user’s front-end tool (a

client program that allows a user access to a database), but this, along with optimal performance, cannot be achieved if

the user’s needs are not taken into consideration.

Some user-related design considerations include the following:

What data should be stored in the database?

How will the user access the database?

What privileges does the user require?

How should the data be grouped in the database?

What data is the most commonly accessed?

How is all data related in the database?

What measures should be taken to ensure accurate data?

Data redundancy

Data should not be redundant, which means that the duplication of data should be kept to a minimum for several

reasons. For example, it is unnecessary to store an employee’s home address in more than one table. With duplicate

data, unnecessary space is used. Confusion is always a threat when, for instance, an address for an employee in one

table does not match the address of the same employee in another table.

Which table is correct? Do you have documentation to verify the employee’s current address? As if data management

were not difficult enough, redundancy of data could prove to be a disaster.





The Normal Forms

A normal form is a way of measuring the level, or depth, to which a database has been normalized. A database’s level of normalization is determined by the normal form.

The following are the three most common normal forms in the normalization process :

The first normal form

The second normal form

The third normal form

Of the three normal forms, each subsequent normal form depends on normalization steps taken in the

previous normal form. For example, to normalize a database using the second normal form, the database

must first be in the first normal form.

The First Normal Form

The objective of the first normal form is to divide the base data into logical units called tables. When each

table has been designed, a primary key is assigned to most or all tables.

COMPANY_DATABASE is divided into the following tables:

EMPLOYEE_TBL       CUSTOMER_TBL     PRODUCTS_TBL
emp_id             cust_id          prod_id
last_name          cust_name        prod_desc
first_name         cust_address     cost
middle_name        cust_city
address            cust_state
city               cust_zip
state              cust_phone
zip                cust_fax
phone              ord_num
pager              qty
position           ord_date
position_desc
date_hire
pay_rate
bonus
date_last_raise

You can see that to achieve the first normal form, data had to be broken into logical units of related information, each having a primary key and ensuring that there are no repeated groups in any of the tables. Instead of one large table, there are now smaller, more manageable tables: EMPLOYEE_TBL, CUSTOMER_TBL and PRODUCTS_TBL. The primary keys are normally the first columns listed in a table, in this case: EMP_ID, CUST_ID and PROD_ID.





The Second Normal Form

The objective of the second normal form is to take data that is only partly dependent on the primary key

and enter that data into another table.

FIRST NORMAL FORM        SECOND NORMAL FORM

EMPLOYEE_TBL             EMPLOYEE_TBL       EMPLOYEE_PAY_TBL
emp_id                   emp_id             emp_id
last_name                last_name          position
first_name               first_name         position_desc
middle_name              middle_name        date_hire
address                  address            pay_rate
city                     city               bonus
state                    state              date_last_raise
zip                      zip
phone                    phone
pager                    pager
position
position_desc
date_hire
pay_rate
bonus
date_last_raise

CUSTOMER_TBL             CUSTOMER_TBL       ORDERS_TBL
cust_id                  cust_id            ord_num
cust_name                cust_name          prod_id
cust_address             cust_address       qty
cust_city                cust_city          ord_date
cust_state               cust_state
cust_zip                 cust_zip
cust_phone               cust_phone
cust_fax                 cust_fax
ord_num
qty
ord_date

According to the figure, the second normal form is derived from the first normal form by further breaking two tables down into more specific units.

EMPLOYEE_TBL was split into two tables called EMPLOYEE_TBL and EMPLOYEE_PAY_TBL. Personal employee information is dependent on the primary key (EMP_ID), so that information remained in EMPLOYEE_TBL (EMP_ID, LAST_NAME, FIRST_NAME, MIDDLE_NAME, ADDRESS, CITY, STATE, ZIP, PHONE and PAGER). On the other hand, the information that is only partly dependent on the EMP_ID (each individual employee) is used to populate EMPLOYEE_PAY_TBL (EMP_ID, POSITION, POSITION_DESC, DATE_HIRE, PAY_RATE, BONUS and DATE_LAST_RAISE). Notice that both tables contain the column EMP_ID. This is the primary key of each table and is used to match corresponding data between the two tables.

CUSTOMER_TBL was split into two tables called CUSTOMER_TBL and ORDERS_TBL. What took place is similar to what occurred in the EMPLOYEE_TBL. Columns that were partly dependent on the primary key were directed to another table. The order information for a customer is dependent on each CUST_ID, but does not directly depend on the general customer information in the original table.

The Third Normal Form

The third normal form’s objective is to remove data in a table that is not dependent on the primary key.

Another table was created to show the use of the third normal form. EMPLOYEE_PAY_TBL is split into two tables, one containing the actual employee pay information and the other containing the position descriptions, which really do not need to reside in EMPLOYEE_PAY_TBL. The POSITION_DESC column depends on POSITION, not directly on the primary key, EMP_ID.

SECOND NORMAL FORM            THIRD NORMAL FORM

EMPLOYEE_PAY_TBL              EMPLOYEE_PAY_TBL    POSITIONS_TBL
emp_id                        emp_id              position
position                      position            position_desc
position_desc                 date_hire
date_hire                     pay_rate
pay_rate                      bonus
bonus                         date_last_raise
date_last_raise
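A minimal sketch of the third normal form split, again using Python's sqlite3 module. The sample rows, column types, and the reduced column list are assumptions made for illustration. It shows one practical effect of moving POSITION_DESC into POSITIONS_TBL: a description is stored once and changed in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.executescript("""
    CREATE TABLE POSITIONS_TBL (
        position      TEXT PRIMARY KEY,
        position_desc TEXT
    );
    CREATE TABLE EMPLOYEE_PAY_TBL (
        emp_id   INTEGER PRIMARY KEY,
        position TEXT REFERENCES POSITIONS_TBL(position),
        pay_rate REAL
    );
    INSERT INTO POSITIONS_TBL VALUES ('CLERK', 'Office clerk');
    INSERT INTO EMPLOYEE_PAY_TBL VALUES (1, 'CLERK', 95.5);
    INSERT INTO EMPLOYEE_PAY_TBL VALUES (2, 'CLERK', 98.0);
""")

# The description lives in a single row, so one UPDATE changes it for
# every employee who holds that position.
conn.execute(
    "UPDATE POSITIONS_TBL SET position_desc = 'Records clerk' "
    "WHERE position = 'CLERK'"
)

rows = conn.execute("""
    SELECT p.emp_id, d.position_desc
    FROM EMPLOYEE_PAY_TBL AS p
    JOIN POSITIONS_TBL AS d ON d.position = p.position
    ORDER BY p.emp_id
""").fetchall()
print(rows)
```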





Benefits of Normalization

Normalization provides numerous benefits to a database. Some of the major benefits include the following :

- Greater overall database organization

- Reduction of redundant data

- Data consistency within the database design

- A much more flexible database design

- A better handle on database security


Drawbacks of Normalization

Although most successful databases are normalized to some degree, there is one substantial drawback of a normalized database: reduced database performance. Accepting reduced performance requires understanding that when a query or transaction request is sent to the database, factors such as CPU usage, memory usage and input/output (I/O) are involved. To make a long story short, a normalized database generally requires more CPU, memory and I/O to process transactions and database queries than does a denormalized database. A normalized database must locate the requested tables and then join the data from the tables to either get the requested information or to process the desired data.

QUIZ # 4

MIDTERM EXAMINATION

FINALS PERIOD

Lecture no. 1: TRANSACTION MANAGEMENT

1.1 Transaction Support

Transaction – an action or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database. A transaction is a logical unit of work on the database. It may be an entire program, part of a program, or a single command.





Properties of Transactions

All transactions should possess four basic properties, the so-called ACID properties:

Atomicity – the “all or nothing” property. A transaction is an indivisible unit that is either performed in its entirety or not performed at all.

Consistency – a transaction must transform the database from one consistent state to another consistent state.

Isolation – transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions.

Durability – the effects of a successfully completed (committed) transaction are permanently recorded in the database and must not be lost because of a subsequent failure.
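Atomicity can be illustrated with a small sketch in Python's sqlite3 module. The account table, the balances, and the transfer function are hypothetical and exist only for this example. The connection's context manager commits when the block succeeds and rolls back when an exception escapes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database (assumption)
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(amount):
    """Move `amount` from A to B as one transaction (all or nothing)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                (amount,),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# This transfer would overdraw A, so the CHECK constraint rejects the
# second UPDATE and the credit already given to B is rolled back too.
ok = transfer(500)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)
```

Neither half of the failed transfer survives, which is the "all or nothing" behavior described above.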

THE DBMS TRANSACTION MANAGEMENT

Transaction Manager – coordinates transactions on behalf of application programs.

Scheduler – the module responsible for implementing a particular strategy for concurrency control. It is sometimes referred to as the “lock manager” when the strategy is based on locking.

Recovery Manager – ensures that, if a failure occurs during a transaction, the database is restored to the state it was in before the start of the transaction, and therefore to a consistent state.

Buffer Manager – is responsible for the transfer of data between disk storage and main memory.

Figure 1. The DBMS Transaction Subsystem (components shown: Transaction Manager, Scheduler, Buffer Manager, Recovery Manager, Access Manager, Systems Manager, File Manager, and the Database and System Catalog)





1.2 Concurrency Control

Concurrency Control – the process of managing simultaneous operations on the

database without having them interfere with one another.

A major objective in developing a database is to enable many users to access shared data concurrently.

Three examples of potential problems caused by concurrency:

1.) The Lost Update Problem – an apparently successfully completed update operation by one user can be overwritten by another user.

2.) The Uncommitted Dependency Problem – occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed.

3.) The Inconsistent Analysis Problem – occurs when a transaction reads several values from the database while a second transaction updates some of them during the execution of the first.
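The lost update problem can be reproduced with a few lines of plain Python. This is a hand-interleaved simulation, not real concurrent code: the dictionary standing in for the database, the starting balance, and the two transactions T1 and T2 are all assumptions made for the example.

```python
# Shared "database" holding one account balance (sample value).
db = {"balance": 100}

# Both transactions read the balance before either one writes it back.
t1_read = db["balance"]        # T1 reads 100
t2_read = db["balance"]        # T2 reads 100

db["balance"] = t1_read + 50   # T1 deposits 50 and writes 150
db["balance"] = t2_read - 30   # T2 withdraws 30 and writes 70; T1's deposit is lost

lost_update_result = db["balance"]

# Serial execution for comparison: T2 starts only after T1 has finished.
db = {"balance": 100}
db["balance"] = db["balance"] + 50
db["balance"] = db["balance"] - 30
serial_result = db["balance"]

print(lost_update_result, serial_result)  # 70 120
```

Concurrency control exists to make the interleaved schedule produce the same result as some serial one.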

1.3 Database Recovery

Database Recovery – the process of restoring the database to a correct state in the event of a failure.

Four different types of storage media, with an increasing degree of reliability:

1.) Main Memory – volatile storage that usually does not survive system crashes.

2.) Magnetic Disks – provide online non-volatile storage. Compared with main memory, disks are more reliable and much cheaper, but slower by three to four orders of magnitude.

3.) Magnetic Tape – an offline non-volatile storage medium, which is far more reliable than disk and fairly inexpensive, but slower, providing only sequential access.

4.) Optical Disks – more reliable than tape, generally cheaper and faster, and provide random access.

Additional Facts:

- Main memory is also called Primary Storage.

- Disks and tape are known as Secondary Storage.

- Stable storage represents information that has been replicated in several non-volatile storage media (usually disk) with independent failure modes.





Among the causes of failure are:

1.) System crashes – due to hardware or software errors, resulting in loss of main memory.

2.) Media failures – such as head crashes or unreadable media, resulting in the loss of parts of secondary storage.

3.) Application software errors – such as logical errors in the program that is accessing the database, which cause one or more transactions to fail.

4.) Natural physical disasters – such as fire, floods, earthquakes, or power failures.

5.) Sabotage – the intentional corruption or destruction of data, hardware or software facilities.

Two Principal Effects that we need to consider:

1.) The loss of main memory, including the database buffers.

2.) The loss of the disk copy of the database.

A DBMS should provide the following facilities to assist with recovery:

A backup mechanism, which makes periodic backup copies of the database.

Logging facilities, which keep track of the current state of transactions and database changes.

A checkpoint facility, which enables updates to the database that are in progress to be made permanent.

A recovery manager, which allows the system to restore the database to a consistent state following a

failure.

Log File

To keep track of database transactions, the DBMS maintains a special file called a log

(or journal) that contains information about all updates to the database.

The log may contain the following data:

1.) Transaction Records, containing:

Transaction identifier.

Type of log record.

Identifier of data item affected by the database action.

2.) Checkpoint records

Checkpoint – the point of synchronization between the database and the

transaction log file. All buffers are force-written to secondary storage.





Checkpoints are scheduled at predetermined intervals and involve the following operations:

Writing all log records in main memory to secondary storage.

Writing the modified blocks in the database buffers to secondary storage.

Writing a checkpoint record to the log file. This record contains the identifiers of all transactions that are active at the time of the checkpoint.

Recovery Techniques

1.) Recovery techniques using deferred update

When a transaction starts, write a transaction start record to the log.

When any write operation is performed, write a log record containing all the data specified previously (excluding the before-image of the update). Do not actually write the update to the database buffers or the database itself.

When a transaction is about to commit, write a transaction commit log record, write all the log records for the transaction to disk and then commit the transaction. Use the log records to perform the actual updates to the database.

If a transaction aborts, ignore the log records for the transaction and do not perform the writes.

After a failure, any transaction with transaction start and transaction commit log records should be redone.

For any transactions with transaction start and transaction abort log records, we do nothing, since no actual writing was done to the database, so these transactions do not have to be undone.
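The deferred update rules above can be sketched as a small recovery routine. The log record layout (type, transaction id, data item, after-image), the dictionary used as the disk copy, and the sample transactions are assumptions made for illustration. A real DBMS log holds more fields than this.

```python
def recover(log, disk_copy):
    """Rebuild the database from the log using the deferred-update rules."""
    database = dict(disk_copy)
    # Only transactions with a commit record are redone.
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, after_image = rec
            database[item] = after_image  # redo the write, in log order
    return database

# Hypothetical log at the moment of failure.
log = [
    ("start", "T1"),
    ("write", "T1", "x", 10),
    ("start", "T2"),
    ("write", "T2", "y", 99),
    ("commit", "T1"),
    ("start", "T3"),
    ("write", "T3", "x", 7),
    ("abort", "T3"),
    # T2 has no commit record: the failure happened before it finished.
]

state = recover(log, {"x": 1, "y": 2})
print(state)  # {'x': 10, 'y': 2}
```

T1 is redone, while the aborted T3 and the unfinished T2 are simply ignored, because under deferred update neither ever wrote to the database.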

2.) Recovery techniques using immediate update

When a transaction starts, write a transaction start record to the log.

When a write operation is performed, write a record containing the necessary data to the log file.

Once the log record is written, write the update to the database buffers.

The updates to the database itself are written when the buffers are next flushed to secondary storage.

When a transaction commits, write a transaction commit to the log.
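A rough sketch of the immediate update protocol follows. The steps above do not spell out what happens after a failure, so the undo routine here is an assumption based on the usual approach of restoring before-images for transactions that never committed. The record layout and sample values are likewise hypothetical.

```python
log = []                      # log file (kept in memory here, an assumption)
buffers = {"x": 1, "y": 2}    # database buffers with sample values

def write(txn, item, new_value):
    """Write the log record first, then update the database buffers."""
    log.append(("write", txn, item, buffers[item], new_value))
    buffers[item] = new_value

log.append(("start", "T1"))
write("T1", "x", 10)
log.append(("commit", "T1"))

log.append(("start", "T2"))
write("T2", "y", 99)
# Failure happens here, before T2 writes its commit record.

def undo_uncommitted(log, data):
    """Restore the before-images of transactions with no commit record."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):          # newest change is undone first
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, item, before_image, _after_image = rec
            data[item] = before_image
    return data

recovered = undo_uncommitted(log, dict(buffers))
print(recovered)  # {'x': 10, 'y': 2}
```

Because updates may reach the database before commit, the log record must be written before the buffer is changed. Otherwise there would be no before-image to fall back on.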

Lecture no. 2: IMPROVING QUERY PERFORMANCE

2.1 Hash Files

Hash function – calculates the address of the page in which the record is to be stored based on one or more of the fields in the record.





Hash field – the base field on which the hash function operates.

Hash key – the hash field, when that field is also a key field of the file.

Collision – when the same address is generated for two or more records.

There are several techniques that can be used to manage collisions:

1.) Open addressing

2.) Unchained overflow

3.) Chained overflow

4.) Multiple hashing
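As an illustration of hashing and collisions, here is a simplified sketch of a hash file with an overflow chain per page, loosely in the spirit of chained overflow. The page count, page capacity, division-remainder hash function, and the employee keys are all assumptions chosen to force a collision.

```python
NUM_PAGES = 4        # number of primary pages (assumption)
PAGE_CAPACITY = 2    # records that fit on one page (assumption)

def hash_address(key):
    """Division-remainder hash: map the hash field to a page address."""
    return key % NUM_PAGES

primary = [[] for _ in range(NUM_PAGES)]    # primary pages
overflow = [[] for _ in range(NUM_PAGES)]   # one overflow chain per page

def insert(key, record):
    page = hash_address(key)
    if len(primary[page]) < PAGE_CAPACITY:
        primary[page].append((key, record))
    else:
        # Collision on a full page: the record goes to the overflow chain.
        overflow[page].append((key, record))

def lookup(key):
    page = hash_address(key)
    for k, record in primary[page] + overflow[page]:
        if k == key:
            return record
    return None

# Keys 3, 7, 11 and 15 all hash to page 3; key 8 hashes to page 0.
for emp_id in (3, 7, 11, 15, 8):
    insert(emp_id, f"employee {emp_id}")

print(lookup(15), lookup(4))  # employee 15 None
```

A lookup touches only the page chosen by the hash function and, if needed, that page's overflow chain.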

2.2 Indexes

Index – a data structure that allows the DBMS to locate particular records in a file more quickly, and thereby speed response to user queries.

Data file – the file containing the logical records.

Index file – the file containing the index records.

Primary index – an index on a sequentially ordered data file whose indexing field is a key field of the file, that is, a field guaranteed to have a unique value in each record.

Clustering index – an index on a sequentially ordered data file whose indexing field is not a key field of the file, so that there can be more than one record corresponding to a value of the indexing field.

Secondary index – an index that is defined on a non-ordering field of the data file.

Indexed sequential file – a sorted data file with a primary index.

An Indexed sequential file is a more versatile structure, which normally has:

A primary storage area.

A separate index or indexes.

An overflow area.
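The idea of a sorted data file with a primary index can be sketched in Python using the standard bisect module. The page layout, the employee keys and names, and the choice of one index entry per page are assumptions for illustration. The overflow area is left out to keep the example short.

```python
from bisect import bisect_right

# Data file: pages of records sorted on the key field emp_id (sample data).
data_file = [
    [(101, "Abad"), (104, "Cruz")],
    [(110, "Diaz"), (115, "Lim")],
    [(120, "Reyes"), (131, "Tan")],
]

# Index file: one entry per page, holding the first key on that page.
index_file = [page[0][0] for page in data_file]   # [101, 110, 120]

def find(emp_id):
    """Use the index to pick a page, then scan only that page."""
    page_no = bisect_right(index_file, emp_id) - 1
    if page_no < 0:
        return None            # key is smaller than every key in the file
    for key, name in data_file[page_no]:
        if key == emp_id:
            return name
    return None

print(find(115), find(105))  # Lim None
```

The binary search over the small index file replaces a scan of the whole data file, which is how an index speeds up response to queries.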

SEATWORK:

1.) Define Transaction

2.) Give other Database recovery techniques





QUIZ #5

Lecture no. 3: DATA WAREHOUSING, OLAP and DATA MINING

3.1 Data Warehousing

Data Warehouse – a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.

Benefits of Data Warehousing

Potential high returns on investment

Competitive advantage

Increased productivity of corporate decision-makers

Problems of Data Warehousing

Underestimation of resources for data loading

Hidden problems with source systems

Required data not captured

Increased end-user demands

Data homogenization

High demand for resources

Data ownership

High maintenance

Long duration projects

Complexity of integration

The Major components of a Data Warehouse

1. Operational Data

The data for the data warehouse is supplied from:

Mainframe operational data held in first-generation hierarchical and network databases.

Departmental data held in proprietary file systems such as VSAM and RMS, and in relational DBMSs such as Informix and Oracle.

Private data held on workstations and private servers.

External systems such as the Internet, commercially available databases, or databases associated with an organization’s suppliers or customers.

2. Load Manager – (also called the front-end component) performs all the operations associated with the extraction and loading of data into the warehouse.

3. Warehouse Manager – performs all the operations associated with the management of the data in the warehouse.

4. Query Manager – (also called the back-end component) performs all the operations associated with the management of user queries.

5. End-user access tools – provide information to business users for strategic decision making.




End-user access tools can be categorized into five main groups:

Reporting and query tools.

Application development tools.

Executive information system (EIS) tools.

Online analytical processing (OLAP) tools.

Data mining tools.

3.2 Online Analytical Processing (OLAP) – the dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.
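To give a feel for consolidation over multi-dimensional data, here is a toy sketch in Python. The sales facts, the three dimensions (region, product, quarter), and the consolidate function are hypothetical. Real OLAP tools work on far larger volumes with specialized storage.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount).
facts = [
    ("North", "Pens",  "Q1", 120),
    ("North", "Pens",  "Q2", 80),
    ("North", "Paper", "Q1", 200),
    ("South", "Pens",  "Q1", 50),
    ("South", "Paper", "Q2", 70),
]
DIMENSIONS = ("region", "product", "quarter")

def consolidate(facts, keep):
    """Sum the amounts, keeping only the dimensions named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for *coords, amount in facts:
        totals[tuple(coords[i] for i in positions)] += amount
    return dict(totals)

by_region = consolidate(facts, ["region"])
by_product_quarter = consolidate(facts, ["product", "quarter"])
print(by_region)
print(by_product_quarter)
```

The same facts are summarized along different combinations of dimensions, which is the kind of view an analyst switches between in an OLAP tool.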

Rules for OLAP Systems

Multi-dimensional conceptual view

Transparency

Accessibility

Consistent reporting performance

Client-server architecture

Generic dimensionality

Dynamic sparse matrix handling

Multi-user support

Unrestricted cross-dimensional operations

Intuitive data manipulation

Flexible reporting

Unlimited dimensions and aggregations

Categories of OLAP Tools

1. Multi-dimensional OLAP (MOLAP or MD-OLAP) – uses specialized data structures and Multi-dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data.

2. Relational OLAP (ROLAP) – is the fastest-growing style of OLAP technology. ROLAP supports RDBMS products through the use of a meta-data layer, thus avoiding the requirement to create a static multi-dimensional data structure.

3. Managed Query Environment (MQE) – provides limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.

3.3 Data Mining

Data Mining – the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.

Four Main Operations Associated with Data Mining Techniques:

1. Predictive modeling

2. Database segmentation

3. Link analysis

4. Deviation detection

QUIZ #6

FINAL EXAMINATION