MELJUN CORTES DATABASE System Instructional Manual


DATABASE FUNDAMENTAL
2nd Semester 2014-2015
MELJUN P. CORTES, MBA, MPA, BSCS, MSCS (in progress)

PRELIM PERIOD

Lecture no. 1: DATABASE SYSTEMS

1.1 Introduction to Database Systems

Database (DB) - an integrated collection of related data.

By related data we mean that the data represent logically coherent facts about some aspects of the real world that are required by an application.

Universe of discourse (or mini-world) - the part of the real world that a database is designed to model within a computer.

By integrated we mean that the data for multiple applications are stored together and manipulated in a uniform way on secondary storage such as a magnetic or an optical disk. The primary goal of integration is to support information sharing across multiple applications.

A database system consists of 1) an application-specific database, 2) the DBMS that maintains that database, and 3) the application software that manipulates the database.

Database Systems and Database Management Systems

A Database Management System (DBMS) is a collection of programs that controls a database. Specifically, it provides us with an interface to create, maintain, and manipulate multiple databases.

A DBMS is a general-purpose software system that we can use not only to create and maintain multiple databases but also to implement database systems for different applications. As opposed to a DBMS, which is general-purpose, a database system is developed to support the operations of a specific organization or a specific set of applications.

THE DATABASE APPROACHES (Ways of Handling Databases)

1. Manual – manual manipulation of data
   Ex. a manual card catalog

2. Computerized – electronic data handling
   Traditional File Processing System (TFPS)
   Database Management System (DBMS)

DBMS vs TFPS

TFPS - data files are accessed by application programs directly (filenames and data definitions are embedded in each program).
- When data are integrated into a single, shared data file, all application programs that share the data file must be aware of all the data in the file, including those data items that they do not make use of or need to know.
- The problem gets worse when a new field is added to a data file.


    Disadvantages of TFPS

1. Uncontrolled Redundancy
2. Inconsistent Data
3. Inflexibility
4. Limited Data Sharing
5. Poor Enforcement of Standards
6. Low Programmer Productivity
7. Excessive Program Maintenance

DBMS - stores the structure of the data as part of the description of the database in the system catalog, separately from the application programs.

    Characteristics of DBMS

    Data Abstraction

DBMSs allow data to be structured in ways that make it more understandable and meaningful to the applications than the ways data are physically stored on disks. They provide users with high-level, conceptual representations of the data (a table in relational DBMSs, or an object in object-oriented DBMSs, to give two examples) while they hide storage details that are not of interest to most database users.

Program-data independence - the physical organization of data can be changed without affecting the application programs.

Program-operation independence - the implementation of abstract operations can be changed without affecting the code of the application programs, as long as their calling interface stays the same.

Data abstraction, and in particular data independence, is what facilitates data sharing and integration. This is the main advantage of a DBMS over traditional file processing, where application programs depend on the low-level structure of the data or storage organization and each program stores its data in a separate data file.

    Reliability

DBMSs provide high reliability by 1) enforcing integrity constraints and 2) ensuring data consistency despite hardware or software failures.

Integrity constraints reflect the meaning (the semantics) of the data and of the application (ex. data type).

Constraints – conditions, restrictions.

Data consistency - interrupted update operations do not corrupt the database with values that violate the integrity constraints, and no data in the database is lost.

After a failure, a DBMS automatically recovers, restoring the database to the consistent state in which it existed just prior to the interruption. This consistent state is constructed as follows. During recovery, a DBMS rolls back all interrupted transactions, obliterating their updates from the database, and re-executes successfully terminated transactions as necessary, restoring their updates in the database.

    Efficiency

DBMSs support both efficient space utilization and efficient access to data. By making use of the data description in the catalog, DBMSs are able to minimize data redundancy, which in turn saves both space, by storing each data item only once, and processing time, by eliminating the need for multiple updates to keep replicas consistent and up-to-date. DBMSs enhance the performance of queries by means of optimizations and the use of access methods to data based on their values. Optimizations simplify queries so that they can execute faster, and access methods allow direct access to the locations where relevant data are stored, in a way similar to the access provided by the index in the back of a book. DBMSs decrease the response time of transactions by allowing multiple users to access the database concurrently.


1.2 Relational Databases

Relational Database Schema

A relational database schema is a set of table schemas and a set of integrity constraints. Integrity constraints can be sorted into two kinds:

- structural (model-specific) integrity constraints that are imposed by the model, as discussed below, and
- semantic (application-specific) integrity constraints imposed by the application, such as the constraint, for example, that the balance of a savings account cannot be negative.

Keys - columns whose values are sufficient to uniquely identify a row.

TYPES OF KEYS
1. Primary Key – uniquely identifies a record
2. Secondary Key – used to access a group of records with common attributes
3. Alternate Key – a candidate to be the Primary Key
4. Composite Key – composed of two or more columns used to access a unique record
5. Foreign Key – a non-key attribute (an ordinary column) in one table, but a primary key in another; it establishes associations (relationships) among tables within one database (in a relational database schema). See the SQL sketch below.
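A minimal sketch of how a foreign key is declared in SQL, using the MEMBER and BOOK tables that appear in the QBE examples later in this manual; the column types shown here are assumptions for illustration only.

CREATE TABLE MEMBER (
    MemNo  INTEGER,
    Lname  CHAR(15),
    PRIMARY KEY (MemNo)
);

CREATE TABLE BOOK (
    Book_id        INTEGER,
    CallNumber     CHAR(20),
    Edition        INTEGER,
    BorrowerMemNo  INTEGER,        -- an ordinary column in BOOK ...
    BorrowDueDate  DATE,
    PRIMARY KEY (Book_id),
    FOREIGN KEY (BorrowerMemNo) REFERENCES MEMBER (MemNo)  -- ... but the primary key of MEMBER
);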

    DDL (Data Definition Language)

The command to create a table in SQL is the CREATE TABLE command. SQL supports all the basic data types found in most programming languages: integer, float, character, and character string. SQL commands are not case sensitive.

CREATE TABLE MEMBER (
    MemNo        INTEGER,
    DriverLic    INTEGER,
    Fname        CHAR(10),
    MI           CHAR,
    Lname        CHAR(15),
    PhoneNumber  CHAR(14),
    PRIMARY KEY (MemNo),
    UNIQUE (DriverLic)
);

The primary key is specified using the PRIMARY KEY directive, alternate keys using the UNIQUE directive.

DML (Data Manipulation Language)

    Update Operations

Relational DML allows us to insert and delete rows in a table as well as to update the values of one or more columns in a row. In SQL, only one row can be inserted at a time, by specifying the values of each column, as in the following example:

INSERT INTO MEMBER
VALUES (101, 6876588, 'Susan', 'W', 'Jones', '412-376-8888');

This statement inserts a new row for Susan W. Jones in the MEMBER table. In SQL, strings are enclosed within single quotes.

Delete and update can be applied to multiple rows that satisfy a selection condition. In SQL, a selection condition in a deletion is specified by a WHERE clause. In the simplest case, a row is selected by specifying the value of its primary key. For example, the statement

DELETE FROM MEMBER
WHERE MemNo = 102;

deletes the row with member number 102 from the MEMBER table. The following statement changes the middle initial of member 101 in the MEMBER table.

UPDATE MEMBER
SET MI = 'S'
WHERE MemNo = 101;


An update operation succeeds if it does not violate any integrity constraints. For example, an insert operation will not succeed if it attempts to insert a row whose keys, primary and alternate, conflict with existing keys. That is, if the row were to be inserted, the property that keys should be unique would be violated. On the other hand, deleting a row never violates a key constraint, unless the deleted row is referenced by a foreign key. In that case, deleting a row might violate a referential integrity constraint.

TOOLS FOR QUERIES

1) QBE (Query By Example)

Query By Example (QBE) is a visual query language developed by IBM [Zloof, 1977] to simplify an average end-user's task of retrieving data from a database. QBE saves the user from having to remember the names of tables and columns, and the precise syntax of a query language. The basic idea is to retrieve data with the help of query templates.

    QBE works as follows:

      the system provides the user with a skeleton or query template of the tables in the database, and

      the user fills in the tables with examples of what is to be retrieved or updated.

A skeleton of a table is basically a copy of the table without any rows, i.e. an empty table. For simple selection conditions, the examples can be constant values, such as Susan and 100, or comparisons with constant values, such as >100, specified under a column.

Projection in QBE

Projection is specified by selecting the show button associated with each field, which we denote in our examples with "P.". To print all columns of retrieved tuples, we only need to put one "P." under the name of the table.

QBE1: Display MemNo, Lname, and PhoneNumber from MEMBER.

MEMBER | MemNo | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P.    |           |       |    | P.    |         | P.          |

The result of a query is displayed in a result table, which subsequently can be either stored or manipulated further. In Microsoft Access, the resulting table is called a datasheet.

Selection in QBE

QBE2: Retrieve all members whose first name is John.

MEMBER | MemNo | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
P.     |       |           | John  |    |       |         |             |

By placing P. under the table name, this query retrieves and displays the data in all the columns.

QBE3: Retrieve the name and member number of all the members whose member number is greater than 100.

MEMBER | MemNo   | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P. >100 |           | P.    |    | P.    |         |             |

A comparison with a constant value (in the above example the constant value is 100) is placed in the appropriate column. The resulting table will have the following columns:

Result | MemNo | Fname | Lname |

In QBE, a disjunction (OR) is expressed by using different examples in different rows of the skeleton.

QBE4: Retrieve the name and member number of all the members whose first name is John or Susan.

MEMBER | MemNo | DriverLic | Fname    | MI | Lname | Address | PhoneNumber |
       | P.    |           | P. John  |    | P.    |         |             |
       | P.    |           | P. Susan |    | P.    |         |             |

A conjunction (AND), on the other hand, is expressed in the same row.

QBE5: Retrieve the name and member number of all the members whose first name is Susan and whose member number is greater than 100.

MEMBER | MemNo   | DriverLic | Fname    | MI | Lname | Address | PhoneNumber |
       | P. >100 |           | P. Susan |    | P.    |         |             |

If the conjunction is a condition involving a single column, the condition can be specified using the AND operator, as in SQL. For example, if MemNo should be greater than 100 and less than 150, this is specified under the MemNo column as: ( _x > 100 ) AND ( _x < 150 ).
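For comparison, a minimal SQL statement equivalent to QBE5, assuming the MEMBER table created earlier (this SQL form is a sketch added here for illustration, not part of the QBE notation):

SELECT MemNo, Fname, Lname
FROM   MEMBER
WHERE  Fname = 'Susan'
  AND  MemNo > 100;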


Join in QBE

Joins can be expressed by using common example variables in multiple tables in the columns to be joined.

QBE6: List the member number and last name of all the members who currently have a borrowed book.

MEMBER | MemNo    | DriverLic | Fname | MI | Lname | Address | PhoneNumber |
       | P. _join |           |       |    | P.    |         |             |

BOOK | Book_id | CallNumber | Edition | BorrowerMemNo | BorrowDueDate |
     |         |            |         | _join         |               |

To express multiple joins you can use multiple example variables at the same time.
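For comparison, a minimal SQL join equivalent to QBE6, assuming the MEMBER and BOOK tables used in the QBE skeletons:

SELECT M.MemNo, M.Lname
FROM   MEMBER M, BOOK B
WHERE  M.MemNo = B.BorrowerMemNo;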

SEATWORK:
1. What is a Relational Database?
2. Enumerate the different types of keys and give an example of each.

QUIZ # 1

Lecture no. 2: COMPLETE SQL

2.1 SQL (Structured Query Language)

SQL – the de facto standard query language for relational DBMSs.
- a comprehensive language providing statements for both data definition and data manipulation.

SQL DDL (Data Definition Language)
- provides basic commands for defining the conceptual schema of a database.

SQL provides 3 numeric data types:

1.) Exact Number – integers or whole numbers, which may be positive, negative, or zero.
    SQL supports 2 integer types:
    1.) INTEGER (INT)
    2.) SMALLINT

2.) Approximate Number – numbers that cannot be represented exactly, such as real numbers and fractional types.

3.) Formatted Number – numbers stored in decimal notation.
    Formatted numbers can be defined using the following:
    1.) DECIMAL(i,j)
    2.) DEC(i,j)
    3.) NUMERIC(i,j)
    where: i = the precision, the total number of digits excluding the decimal point;
           j = the scale, the number of fractional digits. The default scale is zero (0).
    (A short example using these types follows below.)
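A minimal sketch showing the three kinds of numeric types in a CREATE TABLE statement; the ITEM table and its columns are hypothetical, chosen only to illustrate the data types:

CREATE TABLE ITEM (
    ItemNo    INTEGER,        -- exact number
    StockQty  SMALLINT,       -- exact number with a smaller range
    Weight    FLOAT,          -- approximate number
    UnitPrice DECIMAL(7,2),   -- formatted number: precision 7, scale 2
    PRIMARY KEY (ItemNo)
);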

Syntax for creating a database in SQL Query Analyzer

1.) CREATE DATABASE <database name>
    USE <database name>

2.) CREATE DATABASE <database name>
    ON (NAME = <data file name>, FILENAME = '<physical file name>')

3.) CREATE TABLE <table name> (
        <column name> <data type>(<size>) PRIMARY KEY,
        <column name> <data type>(<size>)
    );

4.) INSERT [INTO] <table name> [(column_list)]
    VALUES (data_values)

5.) SELECT * FROM <table name>
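A concrete example following the template above, in the Transact-SQL style implied by the Query Analyzer reference; the database name, logical file name, and file path are hypothetical:

CREATE DATABASE LibraryDB
ON (NAME = LibraryDB_dat, FILENAME = 'C:\Data\LibraryDB.mdf');

USE LibraryDB;

CREATE TABLE MEMBER (
    MemNo INTEGER PRIMARY KEY,
    Lname CHAR(15)
);

INSERT INTO MEMBER (MemNo, Lname)
VALUES (101, 'Jones');

SELECT * FROM MEMBER;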


SPECIFIC RELATIONAL OPERATIONS

1.) Projection Operation (π)
    - selects the attributes on an attribute list from a table r, while discarding the rest.

2.) Selection Operation (σ)
    - selects some rows in a table r that satisfy a selection condition (also called a predicate).

3.) Join Operation
    - combines two tables into one, thereby allowing us to obtain more information. (SQL sketches of these three operations follow below.)
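Minimal SQL sketches of the three relational operations, using the MEMBER and BOOK tables from the earlier examples:

-- Projection: keep only the MemNo and Lname columns of MEMBER
SELECT MemNo, Lname FROM MEMBER;

-- Selection: keep only the rows of MEMBER that satisfy a condition
SELECT * FROM MEMBER WHERE MemNo > 100;

-- Join: combine MEMBER and BOOK on the member number
SELECT M.MemNo, M.Lname, B.Book_id
FROM   MEMBER M, BOOK B
WHERE  M.MemNo = B.BorrowerMemNo;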

SEATWORK:
1.) What is SQL and what are its capabilities?
2.) Create a database, insert values, and view the data inserted.

QUIZ # 2

PRELIM EXAMINATION

    MIDTERM PERIOD

Lecture no. 1: DATABASE DESIGN

1.1 Database System Design

COST OF THE DATABASE APPROACH

If you are to implement a DBMS in an organization, you need to consider these 4 things:

1. New, specialized personnel – the organization should have or train individuals to:
   a. maintain the new database software
   b. develop and enforce new programming standards
   c. design databases
   d. manage the staff of new people and train the new employees
   - these personnel will increase, or may increase, productivity (they should not minimize skills)

2. Need for explicit back-up
   - provide back-up copies of data because:
     a. it is helpful in restoring damaged data files
     b. it provides validity checks on crucial data

3. Interference with shared data
   - concurrent access to shared data via several application programs causes problems:
     a. when 2 concurrent users both want to change the same or related data, inaccurate results can occur if access to the data is not properly synchronized.
     b. when data are used exclusively for updating, different users can obtain control of different segments of the database and lock up any use of the data.


4. Organizational Conflict
   - a shared database requires a consensus on data definitions
     a. conflicts on how to define data length and coding, rights to update shared data, and associated issues.

TYPES OF DATABASES

1. Operational Database
   - contains business transactions and the history of daily business activities
   - used to support the on-going daily activities of the organization
   - used in the "Transaction Processing System". Ex. customer orders, purchases, accounting, shipments and payments

2. Managerial Database
   - used by middle managers for planning and control; summaries of the operational database
   - a summary of operations
   - used in the "Management Information System"

3. Strategic Database
   - used by senior managers to develop corporate strategies and seek competitive advantage
   - contains information on competitors and economic factors as well as corporate information
   - used in the "Decision Support System"

GENERIC TYPES OF DATABASE APPLICATIONS

1. Data Capture
   - captures transaction data, populates databases and maintains the currency of data; gathers data

2. Data Transfer
   - moves / transfers data from one database to another
   - Ex. from operational to managerial


3. Data Distribution
   - applications resulting from data analysis
   - converts data into readily useful information and presents it to management in a readily understandable form
   - Ex. reports, summaries and graphs

COMPONENTS OF THE DATABASE ENVIRONMENT

1. CASE Tools
   - Computer-Aided Software Engineering (CASE) tools
   - automated tools used to design databases and application programs

2. Repository
   - a centralized knowledge base containing all data definitions, screen and report formats, and definitions of other organizational and system components; it contains definitions of data formats

3. DBMS
   - commercial software system used to provide access to the database and repository

4. Database
   - an integrated collection of data, organized to meet the information needs of multiple users in an organization
   - contains occurrences of data (the values themselves)

5. Application Programs
   - computer programs used to create and maintain the database and provide information to users

[Figure: the components of the database environment - data administrators, system developers and end users work through the application / user interface and CASE tools, which connect to the repository, the DBMS and the database.]


6. User Interface
   - languages, menus and other facilities through which users interact with the system (front-end support)
   - the use of menu-driven systems, the mouse and voice recognition systems promotes end-user computing: users who are not experts can define their own reports, displays and applications

7. Data Administrators
   - persons who are responsible for designing databases and for developing policies regarding database security and integrity
   - they use CASE tools to improve the productivity of database planning and design

8. System Developers
   - persons such as system analysts and programmers who design new application programs
   - they use CASE tools for system requirements analysis and program design

9. End Users
   - persons throughout the organization who add, edit, delete and receive information
   - encoders

Lecture no. 2: ENTITY-RELATIONSHIP MODELS

2.1 Entity-Relationship Models

ENTITY-RELATIONSHIP MODELS

Relationships between two or more entities.

CATEGORIES OF ASSOCIATION

I. ASSOCIATION BETWEEN DATA ITEMS

Represents the relationship of data items, or shows how each data item is related to another. Each type of data item is represented by an ellipse or bubble with the data item enclosed. An association between data items is represented by an arrow connecting the data item bubbles.

Example of data items that have no meaningful association:

   STUD #      EMPLOY #

Example of data items that have a meaningful association:

   STUD # ───> STUDNME


Types of Association

1. One-association - means that at any point in time, a given value of A has one and only one value of B associated with it; given the value of A, the value of B is implicitly known. Implicitly known means that it can be understood though not plainly expressed. We represent a one-association with a single-headed arrow.

   A ───> B
   Ex.
   EMPLOYEE ───> ADDRESS

2. Many-association - means that at any point in time, a given value of A has one or many values of B associated with it. We represent a many-association with a double-headed arrow.

   A ──>> B
   Ex.
   STUD # ──>> SUBJECTS

   MULTIVALUED ATTRIBUTE - occurs potentially multiple times for each value of A.

3. Conditional Association - for a given value of data item A there are two possibilities: either there is no value of data item B, or there is one (or many) value(s) of data item B. A conditional association is represented by a zero recorded on the arrow near the conditional item.

   A ──0──> B
   Ex. (PATIENT is the conditional item)
   BED ──0──> PATIENT

CARDINALITY - the term used by analysts for what is represented by the arrowheads and zeros on the arrows, which can be thought of as having minimum and maximum values.

Reverse Association

If there is an association from data item A to data item B, there is also a reverse association from B to A.


Types of Reverse Association

1. One-to-one association
   Means that at any point in time, each value of data item A is associated with zero or exactly one value of data item B. Conversely, each value of B is associated with one value of A.

   A <───> B
   Ex.
   STUD # <───> STUDNME

2. One-to-many association
   Means that at any point in time, each value of data item A is associated with zero, one or many values of data item B. However, each value of B is associated with exactly one value of A. The mapping from B to A is said to be many-to-one, since there may be many values of B associated with one value of A.

   A <──>> B
   Ex.
   STUD # <──>> EXAM

3. Many-to-many association
   Means that at any point in time, each value of data item A is associated with zero, one or many values of data item B. Also, each value of B is associated with zero, one or many values of A.

   A <<──>> B
   Ex.
   STUD # <<──>> COURSE

II. ASSOCIATION BETWEEN RECORDS

Shows the relationship between records.

Crow's Foot - used to distinguish one and many associations between entities and records.

Crow's Foot Notation - used to represent the association between records.

Types of Association

1. One Association - no crow's foot (one-to-one)
   Ex.
   HUSBAND ───── WIFE
   STUDENT ───── GRADE


2. Many Association - represented by a crow's foot
   Ex.
   EMPLOYEE ────< BENEFICIARY
   STUDENT ────< COURSE

DATA MODELS

A representation of the data about entities, events, activities and their associations within an organization.

CATEGORIES / GROUPS OF DATA MODELS

I. SEMANTIC DATA MODEL

Used to capture all the meaning of the data and to embed this as integrity and structural clauses in the database definitions. Such concepts as class, subclass, aggregation, dynamic properties and structures, and the handling of objects of different types (images and voice prints, as well as text and data) are included in the SDM and other semantically rich data models.

II. RELATIONAL DATA MODEL

The relational data model uses the concept of a relation to represent what we have previously called a file; that is, a relation represents an entity class. A relation is viewed as a two-dimensional table.

The relational data model is the choice of many database builders and users. It is different from other models not only in architecture but also in the following ways:

1. Implementation Independence – it logically represents all relationships implicitly and, hence, one does not know which associations are or are not physically represented by an efficient method. The relational model shares this property with the ER-D.

2. Terminology – it uses its own terminology, most of which has equivalent terms in other data models.

3. Logical Key Pointers – it uses primary and secondary keys in records to represent the association between 2 records, whereas the ER-D uses arcs between entity boxes.

4. Normalization Theory – properties of a database that make it free of certain maintenance problems have been developed within the context of the relational data model (although these properties can also be designed into an ER-D or a network data model).


5. High-Level Programming Languages – programming languages have been developed specifically to access databases defined via the relational data model; these languages permit data to be manipulated as groups of records rather than procedurally, one record at a time.

III. HIERARCHICAL DATA MODEL

Organizations are usually viewed as a hierarchy of positions and authority. Computer programs can be viewed as a hierarchy of control and operating modules, and various taxonomies of animals and plants view elements in hierarchical sets of relationships. The hierarchical data model represents data as a set of nested one-to-many relationships. The hierarchical data model is used exclusively with hierarchical database management systems; such systems are, in general, being phased out.

IV. NETWORK DATA MODEL

The network data model permits as much or as little structure as is desired. We can even create a hierarchy (a special case of a network) if that is what is needed. As with the hierarchical data model, if a certain relationship is not explicitly included in the database definition, then it cannot be used by the DBMS in processing the database.

V. ENTITY-RELATIONSHIP DATA MODEL (ER-DIAGRAM)

It is based on the perception of a real world that consists of a set of basic objects called entities and relationships among those entities / objects. It is a graphical notation that uses special symbols to indicate relationships among entities, intended primarily for the database design process.

Basic Symbols

[Figure: legend of the ER-diagram symbols used for an Entity, a Relationship, a Data Item, the "ISA" ("is a", class - subclass) link, and the Primary Key.]

Degree

The number of entities that participate in a relationship.


Most Typical Degrees for Relationships

1. Unary Relationship - a relationship between instances of a single entity class.
   Ex. (see the figure below)

2. Binary Relationship - a relationship between instances of two entity classes.
   Ex. (see the figure below)

3. Ternary Relationship - a relationship among instances of three entity classes.
   Ex. (see the figure below)

[Figure: example ER diagrams - unary relationships on EMPLOYEE and on PERSON (PARENT / CHILD roles), a binary relationship between CUSTOMER and ORDER, and a ternary relationship among PRODUCT, VENDOR and WAREHOUSE.]


    SEATWORK:

    1.) Give two examples using Unary, Binary and Ternary Relationships

    QUIZ # 3

    Normalizing a Database

    Normalization

    - is a process of reducing redundancies of data in a database.

    - is a technique that is used when designing and redesigning a database.

    - is a process or set of guidelines used to optimally design a database to reduce redundant data.

    The Raw Database

A database that is not normalized may include data that is contained in one or more different tables for no apparent reason. This could be bad for security reasons, disk space usage, speed of queries, efficiency of database updates, and, maybe most importantly, data integrity. A database before normalization is one that has not been broken down logically into smaller, more manageable tables.

    COMPANY_DATABASE

emp_id          cust_id
last_name       cust_name
first_name      cust_address
middle_name     cust_city
address         cust_state
city            cust_zip
state           cust_phone
zip             cust_fax
phone           ord_num
pager           qty
position        ord_date
date_hire       prod_id


    Logical Database Design

    Any database should be designed with the end user in mind. Logical database design, also referred to as the logical

    model, is the process of arranging data into logical, organized groups of objects that can easily be maintained. The

    logical design of a database should reduce data repetition or go so far as to completely eliminate it. After all, why

    store the same data twice? Naming conventions used in a database should also be standard and logical.

    What are the End User’s Needs? 

The needs of the end user should be one of the top considerations when designing a database. Remember that the end user is the person who ultimately uses the database. There should be ease of use through the user's front-end tool (a client program that allows a user access to a database), but this, along with optimal performance, cannot be achieved if the user's needs are not taken into consideration.

    Some user-related design considerations include the following:

      What data should be stored in the database?

      How will the user access the database?

      What privileges does the user require?

      How should the data be grouped in the database?

      What data is the most commonly accessed?

      How is all data related in the database?

      What measures should be taken to ensure accurate data?

    Data redundancy

Data should not be redundant, which means that the duplication of data should be kept to a minimum, for several reasons. For example, it is unnecessary to store an employee's home address in more than one table. With duplicate data, unnecessary space is used. Confusion is always a threat when, for instance, an address for an employee in one table does not match the address of the same employee in another table. Which table is correct? Do you have documentation to verify the employee's current address? As if data management were not difficult enough, redundancy of data could prove to be a disaster.


The Normal Forms

Normal form is a way of measuring the level, or depth, to which a database has been normalized. A database's level of normalization is determined by the normal form.

The following are the three most common normal forms in the normalization process:

  The first normal form
  The second normal form
  The third normal form

Of the three normal forms, each subsequent normal form depends on normalization steps taken in the previous normal form. For example, to normalize a database using the second normal form, the database must first be in the first normal form.

    The First Normal Form

The objective of the first normal form is to divide the base data into logical units called tables. When each table has been designed, a primary key is assigned to most or all tables.

COMPANY_DATABASE → FIRST NORMAL FORM

  EMPLOYEE_TBL:  emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager, position, position_desc, date_hire, pay_rate, bonus, date_last_raise
  CUSTOMER_TBL:  cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax, ord_num, qty, ord_date, prod_id
  PRODUCTS_TBL:  prod_id, prod_desc, cost

You can see that to achieve the first normal form, data had to be broken into logical units of related information, each having a primary key and ensuring that there are no repeated groups in any of the tables. Instead of one large table, there are now smaller, more manageable tables: EMPLOYEE_TBL, CUSTOMER_TBL and PRODUCTS_TBL. The primary keys are normally the first columns listed in a table, in this case: EMP_ID, CUST_ID and PROD_ID.


    The Second Normal Form

The objective of the second normal form is to take data that is only partly dependent on the primary key and enter that data into another table.

FIRST NORMAL FORM → SECOND NORMAL FORM

EMPLOYEE_TBL (1NF: emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager, position, position_desc, date_hire, pay_rate, bonus, date_last_raise) is split into:

  EMPLOYEE_TBL:      emp_id, last_name, first_name, middle_name, address, city, state, zip, phone, pager
  EMPLOYEE_PAY_TBL:  emp_id, position, position_desc, date_hire, pay_rate, bonus, date_last_raise

CUSTOMER_TBL (1NF: cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax, ord_num, qty, ord_date, prod_id) is split into:

  CUSTOMER_TBL:      cust_id, cust_name, cust_address, cust_city, cust_state, cust_zip, cust_phone, cust_fax
  ORDERS_TBL:        ord_num, prod_id, qty, ord_date

According to the figure, the second normal form is derived from the first normal form by further breaking two tables down into more specific units.

EMPLOYEE_TBL split into two tables called EMPLOYEE_TBL and EMPLOYEE_PAY_TBL. Personal employee information is dependent on the primary key


(EMP_ID), so that the information remained in the EMPLOYEE_TBL (EMP_ID, LAST_NAME, FIRST_NAME, MIDDLE_NAME, ADDRESS, CITY, STATE, ZIP, PHONE and PAGER). On the other hand, the information that is only partly dependent on the EMP_ID (each individual employee) is used to populate EMPLOYEE_PAY_TBL (EMP_ID, POSITION, POSITION_DESC, DATE_HIRE, PAY_RATE and DATE_LAST_RAISE). Notice that both tables contain the column EMP_ID. This is the primary key of each table and is used to match corresponding data between the two tables.

CUSTOMER_TBL split into two tables called CUSTOMER_TBL and ORDERS_TBL. What took place is similar to what occurred in the EMPLOYEE_TBL. Columns that were only partly dependent on the primary key were directed to another table. The order information for a customer is dependent on each CUST_ID, but does not directly depend on the general customer information in the original table.

The Third Normal Form

The third normal form's objective is to remove data in a table that is not dependent on the primary key.

Another table was created to show the use of the third normal form. EMPLOYEE_PAY_TBL is split into two tables, one table containing the actual employee pay information and the other containing the position descriptions, which really do not need to reside in EMPLOYEE_PAY_TBL. The POSITION_DESC column is totally independent of the primary key, EMP_ID.

SECOND NORMAL FORM → THIRD NORMAL FORM

EMPLOYEE_PAY_TBL (2NF: emp_id, position, position_desc, date_hire, pay_rate, bonus, date_last_raise) is split into:

  EMPLOYEE_PAY_TBL:  emp_id, position, date_hire, pay_rate, bonus, date_last_raise
  POSITIONS_TBL:     position, position_desc
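A minimal SQL sketch of the third-normal-form split shown above; the column data types are assumptions for illustration only:

CREATE TABLE POSITIONS_TBL (
    position      CHAR(15),
    position_desc CHAR(50),
    PRIMARY KEY (position)
);

CREATE TABLE EMPLOYEE_PAY_TBL (
    emp_id          INTEGER,
    position        CHAR(15),
    date_hire       DATE,
    pay_rate        DECIMAL(8,2),
    bonus           DECIMAL(8,2),
    date_last_raise DATE,
    PRIMARY KEY (emp_id),
    FOREIGN KEY (position) REFERENCES POSITIONS_TBL (position)
);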


    Benefits of Normalization

     Normalization provides numerous benefits to a database. Some of the major benefits include the following :

    -  Greater overall database organization

    -  Reduction of redundant data

    -  Data consistency within the database design

    -  A much more flexible database design

    -  A better handle on database security


    Drawbacks of Normalization

Although most successful databases are normalized to some degree, there is one substantial drawback of a normalized database: reduced database performance. The acceptance of reduced performance requires the knowledge that when a query or transaction request is sent to the database, there are factors involved, such as CPU usage, memory usage and input/output (I/O). To make a long story short, a normalized database requires much more CPU, memory and I/O to process transactions and database queries than does a de-normalized database. A normalized database must locate the requested tables and then join the data from the tables to either get the requested information or to process the desired data. A more in-depth discussion concerning database performance occurs in Hour 18, "Managing Database Users."
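As an illustration of that cost, a minimal sketch of the join a normalized design needs in order to put employee pay information back together with position descriptions, using the tables from the normalization example (a de-normalized design could read the same data from a single table):

SELECT E.emp_id, E.pay_rate, P.position_desc
FROM   EMPLOYEE_PAY_TBL E, POSITIONS_TBL P
WHERE  E.position = P.position;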

    QUIZ # 4

    MIDTERM EXAMINATION 

    FINALS PERIOD

    Lecture no. 1: TRANSACTION MANAGEMENT

    1.1  Transaction Support

Transaction – an action or series of actions, carried out by a single user or application program, which accesses or changes the contents of the database.
- a logical unit of work on the database. It may be an entire program, part of a program, or a single command.


Properties of Transactions

There are four basic properties that all transactions should possess, the so-called ACID properties:

  Atomicity – the "all or nothing" property. A transaction is an indivisible unit that is either performed in its entirety or not performed at all.

  Consistency – a transaction must transform the database from one consistent state to another consistent state.

  Isolation – transactions execute independently of one another. In other words, the partial effects of incomplete transactions should not be visible to other transactions.

  Durability – the effects of a successfully completed (committed) transaction are permanently recorded in the database and must not be lost because of a subsequent failure.

(A minimal SQL transaction illustrating these properties follows below.)
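A minimal sketch of a transaction as a logical unit of work; the ACCOUNT table is hypothetical, and BEGIN TRANSACTION / COMMIT / ROLLBACK follow the Transact-SQL style referenced earlier (keywords differ slightly between DBMS products):

BEGIN TRANSACTION;
    UPDATE ACCOUNT SET Balance = Balance - 500 WHERE AccNo = 1;  -- debit one account
    UPDATE ACCOUNT SET Balance = Balance + 500 WHERE AccNo = 2;  -- credit another
COMMIT;
-- Atomicity and durability: either both updates are permanently recorded at COMMIT,
-- or a ROLLBACK (or a failure before COMMIT) leaves the database unchanged.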

    THE DBMS TRANSACTION MANAGEMENT

Transaction Manager – coordinates transactions on behalf of application programs.

Scheduler – the module responsible for implementing a particular strategy for concurrency control. Sometimes referred to as the "Lock Manager".

Recovery Manager – ensures that the database is restored to the state it was in before the start of the transaction, and therefore to a consistent state.

Buffer Manager – responsible for the transfer of data between disk storage and main memory.

Figure 1: The DBMS Transaction Subsystem

[Figure: the Transaction Manager, Scheduler, Recovery Manager and Buffer Manager, together with the Access Manager, System Manager and File Manager, operating over the database and system catalog.]


    1.2  Concurrency Control

Concurrency Control – the process of managing simultaneous operations on the database without having them interfere with one another.

A major objective in developing a database is "to enable many users to access shared data concurrently".

Three examples of potential problems caused by concurrency:

1.) The Lost Update Problem – an apparently successfully completed update operation by one user can be overridden by another user.

2.) The Uncommitted Dependency Problem – occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed.

3.) The Inconsistent Analysis Problem – occurs when a transaction reads several values from the database, but a second transaction updates some of them during the execution of the first.

    1.3  Database Recovery 

    Database Recovery –  the process of restoring the database to a correct state in the event of a failure.

Four different types of media, with an increasing degree of reliability:

1.) Main Memory – volatile storage that usually does not survive system crashes.

2.) Magnetic Disks – provide online non-volatile storage. Compared with main memory, disks are more reliable and much cheaper, but slower by three to four orders of magnitude.

3.) Magnetic Tape – an offline non-volatile storage medium, which is far more reliable than disk and fairly inexpensive, but slower, providing only sequential access.

4.) Optical Disks – more reliable than tape, generally cheaper, and faster, providing random access.

Additional Facts:

- Main memory is also called primary storage.
- Disks and tape are known as secondary storage.
- Stable storage represents information that has been replicated in several non-volatile storage media (usually disk) with independent failure modes.


Among the causes of failure are:

1.) System crashes – due to hardware or software errors, resulting in loss of main memory.

2.) Media failures – such as head crashes or unreadable media, resulting in the loss of parts of secondary storage.

3.) Application software errors – such as logical errors in the program that is accessing the database, which cause one or more transactions to fail.

4.) Natural physical disasters – such as fires, floods, earthquakes, or power failures.

5.) Sabotage – intentional corruption or destruction of data, hardware or software facilities.

Two principal effects that we need to consider:

1.) The loss of main memory, including the database buffers.
2.) The loss of the disk copy of the database.

A DBMS should provide the following facilities to assist with recovery:

  A backup mechanism, which makes periodic backup copies of the database.
  Logging facilities, which keep track of the current state of transactions and database changes.
  A checkpoint facility, which enables updates to the database that are in progress to be made permanent.
  A recovery manager, which allows the system to restore the database to a consistent state following a failure.

Log File

To keep track of database transactions, the DBMS maintains a special file called a log (or journal) that contains information about all updates to the database.

The log may contain the following data:

1.) Transaction records, containing:
      Transaction identifier.
      Type of log record.
      Identifier of the data item affected by the database action.

2.) Checkpoint records.

Checkpoint – the point of synchronization between the database and the transaction log file. All buffers are force-written to secondary storage.


Checkpoints are scheduled at predetermined intervals and involve the following operations:

  Writing all log records in main memory to secondary storage.
  Writing the modified blocks in the database buffers to secondary storage.
  Writing a checkpoint record to the log file. This record contains the identifiers of all transactions that are active at the time of the checkpoint.

    Recovery Techniques

1.) Recovery techniques using deferred update

  When a transaction starts, write a transaction start record to the log.
  When any write operation is performed, write a log record containing all the data specified previously (excluding the before-image of the update). Do not actually write the update to the database buffers or the database itself.
  When a transaction is about to commit, write a transaction commit log record, write all the log records for the transaction to disk and then commit the transaction. Use the log records to perform the actual updates to the database.
  If a transaction aborts, ignore the log records for the transaction and do not perform the writes.
  Any transaction with transaction start and transaction commit log records should be redone.
  For any transactions with transaction start and transaction abort log records, we do nothing, since no actual writing was done to the database, so these transactions do not have to be undone.

2.) Recovery techniques using immediate update

  When a transaction starts, write a transaction start record to the log.
  When a write operation is performed, write a record containing the necessary data to the log file.
  Once the log record is written, write the update to the database buffers.
  The updates to the database itself are written when the buffers are next flushed to secondary storage.
  When a transaction commits, write a transaction commit record to the log.

    Lecture no. 2: IMPROVING QUERY PERFORMANCE

    2.1 Hash Files

    Hash function –  calculates the address of the page in which the record is to be stored based on one or more of the fields in the record.


  Hash field – also called the base field.
  Hash key – the hash field, if the field is also a key field of the file.

Collision – occurs when the same address is generated for two or more records.

There are several techniques that can be used to manage collisions:

1.) Open addressing
2.) Unchained overflow
3.) Chained overflow
4.) Multiple hashing

    2.2 Indexes

Indexes – a data structure that allows the DBMS to locate particular records in a file more quickly, and thereby speed the response to user queries.

  Data file – the file containing the logical records.
  Index file – the file containing the index records.
  Primary index – an index in which the data file is sequentially ordered and the indexing field is a key field of the file, so it is guaranteed to have a unique value in each record.
  Clustering index – an index in which the indexing field is not a key field of the file, so that there can be more than one record corresponding to a value of the indexing field.
  Secondary index – an index that is defined on a non-ordering field of the data file.

    Indexed sequential file –  a sorted data file with a primary index.

    An Indexed sequential file is a more versatile structure, which normally has:

      A primary storage area.

      A separate index or indexes.

      An overflow area.
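A minimal SQL sketch of defining an index, using the widely supported CREATE INDEX statement; the index name is hypothetical:

-- a secondary index to speed up look-ups of members by last name
CREATE INDEX idx_member_lname ON MEMBER (Lname);

-- a query such as this can then locate the relevant records through the index
SELECT MemNo, Fname, Lname FROM MEMBER WHERE Lname = 'Jones';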

    SEATWORK:

1.) Define Transaction.
2.) Give other database recovery techniques.


    QUIZ #5

    Lecture no. 3: DATA WAREHOUSING, OLAP and DATA MINING 

    3.1 Data Warehousing 

Data Warehouse - a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.

    Benefits of Data Warehousing

      Potential high returns on investment

      Competitive advantage

      Increased productivity of corporate decision-makers

    Problems of Data Warehousing

      Underestimation of resources for data loading

      Hidden problems with source systems

      Required data not captured

      Increased end-user demands

      Data homogenization

      High demand for resources

      Data ownership

      High maintenance

  Long-duration projects
  Complexity of integration

    The Major components of a Data Warehouse

1. Operational Data

   The source of data for the data warehouse is supplied from:
     Mainframe operational data held in first-generation hierarchical and network databases.
     Departmental data held in proprietary file systems such as VSAM, RMS, and relational DBMSs such as Informix and Oracle.
     Private data held on workstations and private servers.
     External systems such as the Internet, commercially available databases, or databases associated with an organization's suppliers or customers.

2. Load Manager – (also called the front-end component) performs all the operations associated with the extraction and loading of data into the warehouse.

3. Warehouse Manager – performs all the operations associated with the management of the data in the warehouse.

4. Query Manager – (also called the back-end component) performs all the operations associated with the management of user queries.

5. End-user access tools – provide information to business users for strategic decision making.


    Can be categorized into five main groups:

      Reporting and query tools.

      Application development tools.

      Executive information system (EIS) tools.

      Online analytical processing (OLAP) tools.

      Data mining tools.

3.2 Online Analytical Processing (OLAP) – the dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.

    Rules for OLAP Systems

      Multi-dimensional conceptual view

      Transparency

      Accessibility

      Consistent reporting performance

      Client-server architecture

      Generic dimensionality

      Dynamic sparse matrix handling

      Multi-user support

      Unrestricted cross-dimensional operations

      Intuitive data manipulation

      Flexible reporting

      Unlimited dimensions and aggregations

    Categories of OLAP Tools

1. Multi-dimensional OLAP (MOLAP or MD-OLAP) – uses specialized data structures and multi-dimensional database management systems (MDDBMSs) to organize, navigate, and analyze data.

2. Relational OLAP (ROLAP) – the fastest-growing style of OLAP technology. ROLAP supports RDBMS products through the use of a meta-data layer, thus avoiding the requirement to create a static multi-dimensional data structure.

3. Managed Query Environment (MQE) – MQE tools provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server.

    3.3  Data Mining

Data Mining – the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.

Four Main Operations Associated with Data Mining Techniques:

1. Predictive modeling
2. Database segmentation
3. Link analysis
4. Deviation detection

    QUIZ #6

    FINAL EXAMINATION