Information Technology for the Internet Age Databases and Data Warehouses Building Business Intelligence BUAD 358 - Information Systems II Ali A. Nazemi, Ph.D. Roanoke College


Information Technology for the Internet Age

Databases and Data Warehouses

Building Business Intelligence

BUAD 358 - Information Systems II

Ali A. Nazemi, Ph.D.

Roanoke College

Data Base Management Systems

(Topics)

Definition & Hierarchy of Data

Components of a DBMS

Advantages and Disadvantages of DBMS

Data Base Types in Organizations

Queries and SQL

Data Base Design

Data Administration and Planning

Database Structures

Database Management Systems

Database

DBMS

Programs

Data files

Without a DBMS: Hard to find and share data.

With a DBMS: Integrated data with shared access.

[Figure: Strategy, Tactics, and Operations levels shown for Company A and Company B]

DBMS & People

Data

Database Management System

Program Program

Business Operations

Ad Hoc Queries and Reports

Programmer Analyst

Database Administrator

Business Needs

Programs & Revisions

Managers/End users

(Standards, Design & Control)

Data Collection and Transaction Processing

Business Intelligence

is knowledge about your:

Customers

Competitors

Partners

Competitive environment

Internal operations


Database Management System Tools

Database Management Systems

Database management system (DBMS) – helps you specify the logical organization for a database and access and use the information within a database.

A DBMS contains the following important software components:

DBMS engine

Data definition subsystem

Data manipulation subsystem

Report & Form Generation Subsystem

Application generation subsystem

Data administration subsystem

The Relational Database Model

DBMS engine - accepts logical requests from the various other DBMS subsystems, converts them into their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.

Physical view - deals with how information is physically arranged, stored, and accessed on some type of storage device such as a hard disk.

Logical view - focuses on how you arrange and access information to meet your particular business needs.

Database Engine

Creating Logical Structures

Data dictionary - contains the logical structure for the information.

Primary key - a field (or group of fields in some cases) that uniquely describes each record.

Foreign key - a primary key of one file that appears in another file.

Integrity constraints - rules that help ensure the quality of the information.
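The definitions above can be tried out in a few lines. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the DBMS used in the course. The Customers and Orders table names are borrowed from later slides, while the column list and the "balance may not be negative" rule are hypothetical, chosen only to show one integrity constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Primary key: CID uniquely describes each customer record.
# Integrity constraint (hypothetical rule): a balance may not be negative.
conn.execute("""
    CREATE TABLE Customers (
        CID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        AccountBalance REAL CHECK (AccountBalance >= 0)
    )""")

# Foreign key: Orders.CID is the primary key of Customers appearing in Orders.
conn.execute("""
    CREATE TABLE Orders (
        OID INTEGER PRIMARY KEY,
        CID INTEGER NOT NULL REFERENCES Customers(CID),
        Amount REAL
    )""")

conn.execute("INSERT INTO Customers VALUES (44453, 'Kolke', 863.39)")
conn.execute("INSERT INTO Orders VALUES (178, 44453, 154.89)")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("INSERT INTO Customers VALUES (44453, 'Copy', 0)")
orphan_order = rejected("INSERT INTO Orders VALUES (999, 11111, 10.0)")
negative_balance = rejected("INSERT INTO Customers VALUES (2, 'Test', -5)")
```

Each of the three rejected statements breaks a different rule: a repeated primary key, an order pointing at a customer that does not exist, and a value that violates the check rule.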

Created Tables

Part Number is the primary key, as indicated by the key icon beside it.

For Percentage Markup, we defined its Format as “Percent” and its number of decimal places as 2.

Logical Ties Among Tables

Relational Databases

Tables

Rows

Columns

Primary keys

Data types

Text

Dates & times

Numbers

Yes/No

Objects

Phone Name Address City

312-555-1234 Jones 123 Main Chicago

502-555-8876 Smith 456 Oak Glasgow

602-555-9987 Juarez 887 Ribera Phoenix

612-555-4325 Olsen 465 Thor Minneapolis

Customer Table

Customer Date Salesperson Total_sale

502-555-8876 3/3/95 2223 157.92

602-555-9987 4/4/94 8876 295.53

612-555-4325 4/9/96 8876 132.94

502-555-8876 5/7/95 3345 183.67

Orders Table

Data Manipulation Subsystem

Click here to enter a new record.

Find information using the binoculars.

DBMS Tools

Data Manipulation Subsystem

Report generator - helps you quickly define formats of reports and what information you want to see in a report.

DBMS Tools

Data Manipulation Subsystem

By following a series of simple screens, you can easily create the report below.

DBMS Input Screen

Text/Labels

Data Variables

Scrolling Region/Subform

Command

Buttons

Record Selectors

- Subform

- Main

DBMS Report Writer

Report header

Page header

Break/Group header

Detail

Footers

Sample Report with Groups

Designing Menus for Users

1. Setup Choices

2. Data Input

3. Print Reports

4. DOS Utilities

5. Backups

Main Menu

Daily Sales Reports

Friday Sales Meeting

Monthly Customer Letters

Quit

Customer Information

As a secretary, which menu is easier to understand?

Data Manipulation Subsystem

Queries

The QBE grid

Our selection criteria

Database Queries

Single Table

Computations

Joining Tables

Summary

1) What output do you want to see?

2) What tables are involved?

3) What do you already know? (constraints)

4) How are the tables joined?

Four questions to create a query

Single Table Query Introduction

CID Name Phone City AccountBalance

28764 Adamz 602-999-2539 Phoenix 197.54

87535 James 305-777-2235 Miami 255.93

44453 Kolke 303-888-8876 Denver 863.39

29587 Smitz 206-676-7763 Seattle 353.76

Sample Data

Access Query Screen (QBE)

Query: Which customers have balances greater than $200?

File: C05E15a.mdb

“AND” Conditions & Sorting

Sample Data

Access Query Screen (QBE)

Query: Which Denver customers have balances greater than $200?

CID Name Phone City AccountBalance

28764 Adamz 602-999-2539 Phoenix 197.54

87535 James 305-777-2235 Miami 255.93

44453 Kolke 303-888-8876 Denver 863.39

29587 Smitz 206-676-7763 Seattle 353.76

Structured Query Language

Structured query language (SQL) - a standardized fourth-generation query language found in most DBMSs.

The SQL below creates the same report as Figure 3.7 on page 139.

SELECT Part.[Part Number], Part.Cost, Employee.[Employee Name], Employee.[Employee Number]
FROM Part, Employee
WHERE (((Part.Cost)>10));

SQL Examples

Query: Which customers have balances greater than $200?

SQL: SELECT CID, Name, Phone, City, AccountBalance

FROM Customers

WHERE AccountBalance > 200 ;

Query: Which Denver customers have balances greater than $200?

SQL: SELECT CID, City, AccountBalance

FROM Customers

WHERE AccountBalance > 200 AND City = "Denver"

ORDER BY Name ASC ;

CID Name Phone City AccountBalance

28764 Adamz 602-999-2539 Phoenix 197.54

87535 James 305-777-2235 Miami 255.93

44453 Kolke 303-888-8876 Denver 863.39

29587 Smitz 206-676-7763 Seattle 353.76
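The two queries above can be run against the four sample rows. This is a minimal sketch, assuming Python's built-in sqlite3 module rather than Access; the column types are assumptions, and the first query selects only Name to keep the result short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "CID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT, City TEXT, AccountBalance REAL)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
    (28764, "Adamz", "602-999-2539", "Phoenix", 197.54),
    (87535, "James", "305-777-2235", "Miami", 255.93),
    (44453, "Kolke", "303-888-8876", "Denver", 863.39),
    (29587, "Smitz", "206-676-7763", "Seattle", 353.76),
])

# Which customers have balances greater than $200?
over_200 = conn.execute(
    "SELECT Name FROM Customers WHERE AccountBalance > 200 ORDER BY Name ASC"
).fetchall()

# Which Denver customers have balances greater than $200?
# Both conditions must hold, so they are combined with AND.
denver = conn.execute(
    "SELECT CID, City, AccountBalance FROM Customers "
    "WHERE AccountBalance > ? AND City = ? ORDER BY Name ASC",
    (200, "Denver"),
).fetchall()

print(over_200)
print(denver)
```

The `?` placeholders pass the constraint values separately from the SQL text, which is the usual way to supply "what you already know" from a program.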

Useful WHERE Conditions

Comparisons: <, =, >, <>, BETWEEN, LIKE

Numbers: AccountBalance > 200

Text

Common: Name > "Jones"

LIKE

Match all: Name LIKE "J*"

Match one: Name LIKE "?m*"

Dates: Odate BETWEEN #8/15/95# AND #8/31/95#

Yes/No: Discontinued = Yes

Missing data: City IS NULL

NOT: Name IS NOT NULL

Use with QBE or SQL
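The conditions listed above can be checked one at a time. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, where the LIKE wildcards are `%` and `_` instead of the Access `*` and `?`, and it adds one hypothetical customer (Jones, with no city) so that the missing-data test has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    "CID INTEGER PRIMARY KEY, Name TEXT, City TEXT, AccountBalance REAL)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)", [
    (28764, "Adamz", "Phoenix", 197.54),
    (87535, "James", "Miami", 255.93),
    (44453, "Kolke", "Denver", 863.39),
    (29587, "Smitz", "Seattle", 353.76),
    (99999, "Jones", None, 0.0),  # hypothetical row with a missing city
])

def names(where):
    """Return the customer names that satisfy a WHERE condition, sorted by name."""
    sql = "SELECT Name FROM Customers WHERE " + where + " ORDER BY Name"
    return [row[0] for row in conn.execute(sql)]

starts_with_j = names("Name LIKE 'J%'")       # Access form: LIKE "J*"
second_letter_m = names("Name LIKE '_m%'")    # Access form: LIKE "?m*"
after_jones = names("Name > 'Jones'")         # text compares alphabetically
mid_balance = names("AccountBalance BETWEEN 200 AND 400")
missing_city = names("City IS NULL")
has_city = names("City IS NOT NULL")
```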

SQL General Form

SELECT columns

FROM tables

JOIN link columns

WHERE conditions

GROUP BY column

ORDER BY column (ASC | DESC)

Computations

Sum

Avg

Min

Max

Count

StDev

Var

QBE

SELECT Count(CID), AVG(AccountBalance)
FROM Customers ;

SQL

Groups or Subtotals

QBE

SELECT City, AVG(AccountBalance)
FROM Customers
GROUP BY City ;

SQL

City AVG(AccountBalance)

Chicago 197.54

Denver 863.39

Miami 255.93

Phoenix 526.76

Seattle 353.76

Sample Output
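The computation and grouping queries can be reproduced on the five city balances shown in the sample output. This is a minimal sketch, assuming Python's built-in sqlite3 module; the table is cut down to the columns the queries need, and City is added to the SELECT list so each average is labeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CID INTEGER PRIMARY KEY, City TEXT, AccountBalance REAL)"
)
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    (12345, "Chicago", 197.54),
    (28764, "Phoenix", 526.76),
    (29587, "Seattle", 353.76),
    (44453, "Denver", 863.39),
    (87535, "Miami", 255.93),
])

# One summary row for the whole table.
count, average = conn.execute(
    "SELECT COUNT(CID), AVG(AccountBalance) FROM Customers"
).fetchone()

# One summary row per city (subtotals).
by_city = conn.execute(
    "SELECT City, AVG(AccountBalance) FROM Customers GROUP BY City ORDER BY City"
).fetchall()

for city, avg_balance in by_city:
    print(city, round(avg_balance, 2))
```

With only one customer per city the group averages equal the individual balances; adding a second customer in any city would make the grouping visible.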

Multiple Tables

C# Name Phone City AccountBalance

12345 Jones 312-555-1234 Chicago $197.54

28764 Adamz 602-999-2539 Phoenix $526.76

29587 Smitz 206-656-7763 Seattle $353.76

44453 Kolke 303-888-8876 Denver $863.39

87535 James 305-777-2235 Miami $255.93

Customers

S# Name DateHired Phone Commission

255 West 5/23/75 213-333-2345 5

452 Zeke 8/15/94 213-343-5553 3

554 Jabbar 7/15/91 213-534-8876 4

663 Bird 9/12/93 213-225-3335 4

887 Johnson 2/2/92 213-887-6635 4

Item# Description Price

1154 Corn Broom $1.00

2254 Blue Jeans $12.00

3342 Paper Towels--3 rolls $1.00

7653 Laundry Detergent $2.00

8763 Men's Boots $15.00

9987 Candy Popcorn $0.50

O# C# S# Odate Amount

117 12345 887 3/3/96 $57.92

125 87535 663 4/4/96 $123.54

157 12345 554 4/9/96 $297.89

169 29587 255 5/5/96 $89.93

178 44453 663 5/1/96 $154.89

188 29587 554 5/8/96 $325.46

201 12345 887 5/28/96 $193.58

211 44453 255 6/9/96 $201.39

213 44453 255 6/9/96 $154.15

215 87535 887 6/9/96 $563.27

280 28764 663 6/27/96 $255.32

O# Item# Quantity

117 1154 2

117 3342 1

117 7653 4

125 1154 4

125 8763 3

157 7653 2

169 3342 1

169 9987 5

178 2254 1

Salespeople

Items

Orders

ItemsSold

Linking Tables

The Orders to ItemsSold relationship enforces referential integrity.

One Order can list many ItemsSold.

Query Example

Which customers (CID) have placed orders since June 1, 1996?

QBE

SELECT CID, Odate
FROM Orders
WHERE Odate >= #6/1/96# ;

SQL

Results

CID Odate

44453 6/9/96

44453 6/9/96

87535 6/9/96

28764 6/27/96

Query Example

What are the names of the customers who placed orders since June 1, 2004?

QBE

SELECT DISTINCT Name, Odate
FROM Orders
INNER JOIN Customers ON Orders.CID = Customers.CID
WHERE Odate >= #6/1/2004# ;

SQL

Results

Name Odate

Adamz 6/27/2004

James 6/9/2004

Kolke 6/9/2004

Query Example

List the salespeople (sorted alphabetically) along with the names of customers who placed orders with that salesperson.

QBE

SELECT DISTINCT Salespeople.Name, Customers.Name
FROM Salespeople INNER JOIN (Customers INNER JOIN Orders ON Customers.CID = Orders.CID) ON Salespeople.SID = Orders.SID
ORDER BY Salespeople.Name ;

SQL

Results

SalesName  Cust.Name
Bird       Adamz
Bird       James
Bird       Kolke
Jabbar     Jones
Jabbar     Smitz
Johnson    James
Johnson    Jones
West       Kolke
West       Smitz

Aggregation Query

What is the total amount of orders placed by customers who live in Miami?

QBE

SELECT SUM(Amount)
FROM Orders
INNER JOIN Customers ON Orders.CID = Customers.CID
WHERE City = "Miami" ;

SQL

Results

$2,418.84
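The join and aggregation queries can be tried on the rows listed in the Multiple Tables slide. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, renames C#, S#, and O# to CID, SID, and OID as the SQL on these slides does, and gives West the ID 255 so that the Orders rows have a matching salesperson. Because it is computed only over the eleven Orders rows listed there, its Miami total need not match a figure produced from a larger database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT, City TEXT);
    CREATE TABLE Salespeople (SID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (OID INTEGER PRIMARY KEY, CID INTEGER, SID INTEGER, Amount REAL);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    (12345, "Jones", "Chicago"), (28764, "Adamz", "Phoenix"),
    (29587, "Smitz", "Seattle"), (44453, "Kolke", "Denver"),
    (87535, "James", "Miami"),
])
conn.executemany("INSERT INTO Salespeople VALUES (?, ?)", [
    (255, "West"),  # assumed ID, chosen to match the Orders rows
    (452, "Zeke"), (554, "Jabbar"), (663, "Bird"), (887, "Johnson"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", [
    (117, 12345, 887, 57.92), (125, 87535, 663, 123.54), (157, 12345, 554, 297.89),
    (169, 29587, 255, 89.93), (178, 44453, 663, 154.89), (188, 29587, 554, 325.46),
    (201, 12345, 887, 193.58), (211, 44453, 255, 201.39), (213, 44453, 255, 154.15),
    (215, 87535, 887, 563.27), (280, 28764, 663, 255.32),
])

# Salespeople with the customers who ordered from them (three tables joined).
pairs = conn.execute("""
    SELECT DISTINCT Salespeople.Name, Customers.Name
    FROM Orders
    INNER JOIN Customers ON Customers.CID = Orders.CID
    INNER JOIN Salespeople ON Salespeople.SID = Orders.SID
    ORDER BY Salespeople.Name, Customers.Name
""").fetchall()

# Total order amount for customers who live in Miami (join plus aggregation).
miami_total = conn.execute("""
    SELECT SUM(Amount)
    FROM Orders
    INNER JOIN Customers ON Orders.CID = Customers.CID
    WHERE City = ?
""", ("Miami",)).fetchone()[0]
```

Zeke has no orders, so an inner join leaves him out of `pairs`; a LEFT JOIN from Salespeople would be the way to list him with no customer.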

DBMS Tools

Data Administration Subsystem

Data administration subsystem - helps you manage the overall database environment by providing facilities for backup and recovery, security management, query optimization, concurrency control, and change management.

On Your Own

DBMS Support for OLTP, OLAP, and Information Management (p. 142)

DBMS Tools

Data Administration Subsystem

Backup and recovery facilities:

Periodically back up information contained in a database.

Restart or recover a database and its information in case of a failure.

Security management facilities - control who has access to what information and what type of access those people have.
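A backup and recovery cycle can be sketched in a few lines. This is an illustration only, assuming Python's built-in sqlite3 module and its `Connection.backup` method (Python 3.7 or later); the one-row Customers table and the simulated failure are hypothetical, and in-memory databases stand in for real database and backup files.

```python
import sqlite3

# The "live" database in daily use.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT)")
live.execute("INSERT INTO Customers VALUES (44453, 'Kolke')")
live.commit()

# Periodic backup: copy every page of the live database into a second one.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulated failure: the live data is lost.
live.execute("DELETE FROM Customers")
live.commit()
rows_after_failure = live.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

# Recovery: rebuild a working database from the backup copy.
recovered = sqlite3.connect(":memory:")
backup.backup(recovered)
recovered_names = [row[0] for row in recovered.execute("SELECT Name FROM Customers")]
```

Anything entered after the last backup is not in the copy, which is why the backup schedule matters as much as the backup facility itself.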

Database Advantages

Focus on data: stable data; programs change.

Data independence: change programs without altering data.

Data integrity: accuracy, time, concurrency, security.

Ad hoc queries

Speed of development: report writers, input forms, data manipulation.

Flexibility & Queries

All Data Files

Database Management

System

Invoice

Program

Billing

Program

Database Design

Primary keys

One value per cell

Column depends on whole key and nothing but the key.

Customers

C# name city home business fax service

11 Jones Chicago 111-1111 222-2222 222-3534 876-3456

22 Smith Chicago 111-4567 444-5353

33 James Chicago 111-2567 222-8976

44 Ricci Chicago 333-8765

c# name city

11 Jones Chicago

22 Smith Chicago

33 James Chicago

44 Ricci Chicago

Customers(c#, name, city)

c# phone_type number

11 home 111-1111

11 business 222-2222

11 fax 222-3534

11 service 876-3456

22 home 111-4587

22 service 444-5353

33 home 111-2567

44 fax 333-8765

Phones(c#, phone_type, number)

Database Design:

Normalization

File: C05Vid.mdb

Notation

Table

name

Primary key is

underlined

Table

columns

Customer (CustomerID, Phone, Name, Address, City, State, ZipCode)

CustomerID Phone LastName FirstName Address City State ZipCode

1 502-666-7777 Johnson Martha 125 Main Street Alvaton KY 42122

2 502-888-6464 Smith Jack 873 Elm Street Bowling Green KY 42101

3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171

4 502-333-9494 Adams Samuel 746 Brown Drive Alvaton KY 42122

5 502-474-4746 Rabitz Victor 645 White Avenue Bowling Green KY 42102

6 615-373-4746 Steinmetz Susan 15 Speedway Drive Portland TN 37148

7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 37148

8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 37031

9 502-222-4351 Chavez Juan 673 Industry Blvd. Caneyville KY 42721

10 502-444-2512 Rojo Maria 88 Main Street Cave City KY 42127

1st: Repeating

RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent ) )

Repeating Section

Causes duplication

TransID RentDate CustomerID LastName Phone Address VideoID Copy# Title Rent

1 4/18/04 3 Washington 502-777-7575 95 Easy Street 1 2 2001: A Space Odyssey $1.50

1 4/18/04 3 Washington 502-777-7575 95 Easy Street 6 3 Clockwork Orange $1.50

2 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 8 1 Hopscotch $1.50

2 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 2 1 Apocalypse Now $2.00

2 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 6 1 Clockwork Orange $1.50

3 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 9 1 Luggage Of The Gods $2.50

3 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 15 1 Fabulous Baker Boys $2.00

3 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 4 1 Boy And His Dog $2.50

4 4/18/04 3 Washington 502-777-7575 95 Easy Street 3 1 Blues Brothers $2.00

4 4/18/04 3 Washington 502-777-7575 95 Easy Street 8 1 Hopscotch $1.50

4 4/18/04 3 Washington 502-777-7575 95 Easy Street 13 1 Surf Nazis Must Die $2.50

4 4/18/04 3 Washington 502-777-7575 95 Easy Street 17 1 Witches of Eastwick $2.00

First Normal

Customer Rentals

Name

Phone

Address

City

State

ZipCode

VideoID Copy# Title Rent

1. 6 1 Clockwork Orange 1.50

2. 8 2 Hopscotch 1.50

3.

4.

5.

{Unused Space}

Not in First Normal Form

1st: Split

RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent ) )

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)

RentalLine(TransID, VideoID, Copy#, Title, Rent )

TransID RentDate CustomerID Phone LastName FirstName Address City State ZipCode

1 4/18/04 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171

2 4/30/04 7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 37148

3 4/18/04 8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 37031

4 4/18/04 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171

RentalForm2

TransID VideoID Copy# Title Rent

1 1 2 2001: A Space Odyssey $1.50

1 6 3 Clockwork Orange $1.50

2 8 1 Hopscotch $1.50

2 2 1 Apocalypse Now $2.00

2 6 1 Clockwork Orange $1.50

3 9 1 Luggage Of The Gods $2.50

3 15 1 Fabulous Baker Boys $2.00

3 4 1 Boy And His Dog $2.50

4 3 1 Blues Brothers $2.00

4 8 1 Hopscotch $1.50

4 13 1 Surf Nazis Must Die $2.50

4 17 1 Witches of Eastwick $2.00

RentalLine

Note: replication

2nd Split

RentalLine(TransID, VideoID, Copy#, Title, Rent )

VideosRented(TransID, VideoID, Copy# ) Videos(VideoID, Title, Rent )

TransID VideoID Copy#

1 1 2

1 6 3

2 2 1

2 6 1

2 8 1

3 4 1

3 9 1

3 15 1

4 3 1

4 8 1

4 13 1

4 17 1

VideoID Title Rent

1 2001: A Space Odyssey $1.50

2 Apocalypse Now $2.00

3 Blues Brothers $2.00

4 Boy And His Dog $2.50

5 Brother From Another Planet $2.00

6 Clockwork Orange $1.50

7 Gods Must Be Crazy $2.00

8 Hopscotch $1.50

Column depends on entire (whole) key.

3rd Split

RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode )

Rentals(TransID, RentDate, CustomerID )

Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )

TransID RentDate CustomerID

1 4/18/04 3

2 4/30/04 7

3 4/18/04 8

4 4/18/04 3

CustomerID Phone LastName FirstName Address City State ZipCode

1 502-666-7777 Johnson Martha 125 Main Street Alvaton KY 42122

2 502-888-6464 Smith Jack 873 Elm Street Bowling Green KY 42101

3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171

4 502-333-9494 Adams Samuel 746 Brown Drive Alvaton KY 42122

5 502-474-4746 Rabitz Victor 645 White Avenue Bowling Green KY 42102

6 615-373-4746 Steinmetz Susan 15 Speedway Drive Portland TN 37148

7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 37148

8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 37031

9 502-222-4351 Chavez Juan 673 Industry Blvd. Caneyville KY 42721

10 502-444-2512 Rojo Maria 88 Main Street Cave City KY 42127

Rentals

Customers

3NF Tables
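The three splits can be traced in plain Python. This is a minimal sketch, assuming a cut-down rental form with six of the repeating rows shown earlier and only the customer's last name; the variable names are hypothetical. It breaks the flat rows into the four normalized tables and then joins them back to confirm that nothing was lost.

```python
# Flat rental rows: (TransID, RentDate, CustomerID, LastName, VideoID, Copy#, Title, Rent)
flat = [
    (1, "4/18/04", 3, "Washington", 1, 2, "2001: A Space Odyssey", 1.50),
    (1, "4/18/04", 3, "Washington", 6, 3, "Clockwork Orange", 1.50),
    (2, "4/30/04", 7, "Lasater", 8, 1, "Hopscotch", 1.50),
    (2, "4/30/04", 7, "Lasater", 2, 1, "Apocalypse Now", 2.00),
    (2, "4/30/04", 7, "Lasater", 6, 1, "Clockwork Orange", 1.50),
    (4, "4/18/04", 3, "Washington", 8, 1, "Hopscotch", 1.50),
]

rentals = {}        # Rentals(TransID, RentDate, CustomerID)
customers = {}      # Customers(CustomerID, LastName)
videos = {}         # Videos(VideoID, Title, Rent)
videos_rented = []  # VideosRented(TransID, VideoID, Copy#)

for trans, date, cust, name, video, copy_no, title, rent in flat:
    rentals[trans] = (date, cust)       # depends only on TransID
    customers[cust] = name              # depends only on CustomerID
    videos[video] = (title, rent)       # depends only on VideoID
    videos_rented.append((trans, video, copy_no))

# Join the four tables back together through their keys.
rebuilt = []
for trans, video, copy_no in videos_rented:
    date, cust = rentals[trans]
    title, rent = videos[video]
    rebuilt.append((trans, date, cust, customers[cust], video, copy_no, title, rent))

print(len(flat), len(rentals), len(customers), len(videos))
```

Each customer name and each video title is now stored once, however many times the customer rents or the video is rented.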

Database Administration

Database Administrator

Testing

Backup

Recovery

Standards

Access Controls

Database versus Spreadsheet

Tables

Customers(C#, Name, Address)

Products(P#, Description, Price)

Sales(O#, P#, Sdate, Quantity, C#)

Part 1 Sales

C# P# Q Price Price*Q SubTotal

11 22 1 15.95 15.95 15.95

11 35 2 5.75 11.50 27.45

31 18 1 25.95 25.95 53.40

Part 2 Products

P# Description Price

18 shorts 25.95

22 shirt 15.95

35 laces 5.75

Part 3 Customers

C# Name

11 Smith

31 Torrez

Retrieve the three tables (if they fit).

1) Select by date

2) Sort By O#, P#

3) Look up prices

4) Put into Part 1

5) Calculate total

6) Sort for highest total

7) Look up names

Storage v calculation

Multiple tables

DBMS versus Spreadsheet

DBMS

SELECT Customers.[C#], Name, Sum(Price*Quantity)
FROM Customers INNER JOIN
(Sales INNER JOIN Products
ON Sales.[P#] = Products.[P#])
ON Customers.[C#] = Sales.[C#]
WHERE Sdate > Now() - 30
GROUP BY Customers.[C#], Name
ORDER BY Sum(Price*Quantity) DESC;
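The seven spreadsheet steps collapse into one query. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, renames C#, P#, and O# to CID, PID, and OID, takes the laces price of 5.75 from the Part 1 sales sheet, and invents the order numbers and dates (including one old sale) so that the 30-day filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Products (PID INTEGER PRIMARY KEY, Description TEXT, Price REAL);
    CREATE TABLE Sales (OID INTEGER, PID INTEGER, Sdate TEXT, Quantity INTEGER, CID INTEGER);

    INSERT INTO Customers VALUES (11, 'Smith'), (31, 'Torrez');
    INSERT INTO Products VALUES (18, 'shorts', 25.95), (22, 'shirt', 15.95), (35, 'laces', 5.75);

    -- Order numbers and dates are hypothetical; the last sale is too old to count.
    INSERT INTO Sales VALUES
        (1, 22, date('now'), 1, 11),
        (1, 35, date('now'), 2, 11),
        (2, 18, date('now'), 1, 31),
        (3, 18, date('now', '-90 days'), 5, 31);
""")

# Select by date, look up prices, compute totals, sort, and look up names in one step.
totals = conn.execute("""
    SELECT Customers.CID, Name, SUM(Price * Quantity) AS Total
    FROM Customers
    INNER JOIN Sales ON Customers.CID = Sales.CID
    INNER JOIN Products ON Sales.PID = Products.PID
    WHERE Sdate > date('now', '-30 days')
    GROUP BY Customers.CID, Name
    ORDER BY Total DESC
""").fetchall()

for cid, name, total in totals:
    print(cid, name, round(total, 2))
```

Prices live only in Products, so a price change is made once and every later total picks it up; the spreadsheet version stores a copied price on each sales row.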

Objects

Hypertext & Massive text

Pictures & Graphs

Objects

Video

Sound

User defined

Sample OO Database

Patient X-Rays/Images: ID, Date, Technician, Comments

Patient Data: ID, Name, Address, DoB, Medical History, photo

Patient Visits: ID, Date, Physician, Problems, Comments

Patient Treatments: ID, Date, Procedure, Doctor

Massive Text

Commercial Databases

CD-ROM

Search Strategies

Keywords

Narrow it down

Boolean searches

Weights (Verity Topic)

Smart searchers

Sample Boolean searches:

Colombia

Terrorism

Medellin and (terrorist or terrorism or bombing or kidnap)
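The last sample search can be written out as a small test on each document. This is a minimal sketch in plain Python, not how a commercial text database works internally; the `matches` function and the four headlines are hypothetical.

```python
import re

# Any one of these words satisfies the part of the search inside parentheses.
OR_TERMS = {"terrorist", "terrorism", "bombing", "kidnap"}

def matches(text):
    """Medellin AND (terrorist OR terrorism OR bombing OR kidnap)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "medellin" in words and bool(words & OR_TERMS)

headlines = [
    "Bombing reported in Medellin",
    "Medellin opens new metro line",
    "Terrorism conference held in Bogota",
    "Police arrest terrorist cell near Medellin",
]

hits = [headline for headline in headlines if matches(headline)]
print(hits)
```

AND narrows the search (both parts must be present) while OR widens it (any listed word will do), which is why the two are combined to narrow a topic without missing synonyms.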

E-Business Databases

E-business is transaction-based

Databases support multiple users and protect transactions

Modern websites are driven by databases

E-Business Databases

Internet

Customer

Web Server

Web program script

<HTML>

Text

<%

Database connection

%>

Order Form

Descriptions

Prices

User Advantages of DBMS

Sharing Common Data Resources

Enforcement of Standards

Ease of Application Development

End-User Computing Capabilities

Data Accessibility

Organizational Advantages of DBMS

Minimize Data Redundancy

Program-Data Independence

Data Consistency & Integrity

Reduced Data Maintenance

Data Integration

Disadvantages of DBMS

Need for Specialized Skill

Need for Explicit Backup and Recovery

Interference with Shared Data

Organizational Resistance

Lack of Availability of Resources

Time-Consuming Development

DBMS Tools

Data Administration Subsystem

Query optimization facilities - take queries from users and restructure them to minimize response times.

Reorganization facilities - continually maintain statistics concerning how the DBMS engine physically accesses information.

Concurrency control facilities - ensure the validity of database updates when multiple users attempt to access and change the same information.

Data Warehouses and Data Mining

What Is a Data Warehouse?

Data warehouse - a logical collection of information, gathered from many different operational databases, used to create business intelligence that supports business analysis activities and decision-making tasks.


Data Warehouses and Data Mining

What Is a Data Warehouse?

Data warehouses are not transaction-oriented.

Data warehouses support online analytical processing (OLAP).

Data Warehouses and Data Mining

What Are Data Mining Tools?

Data mining tools - software tools you use to query information in a data warehouse. These tools include:

Query-and-reporting tools - similar to QBE tools, SQL, and report generators in the typical database environment.

Intelligent agents – use various artificial intelligence tools to form the basis of information discovery and building business intelligence in OLAP.

Data Warehouses and Data Mining

What Are Data Mining Tools?

Data mining tools continued

Multidimensional analysis (MDA) tools - slice-and-dice techniques that allow you to view multidimensional information from different perspectives.

Statistical tools – help you apply various mathematical models to the information stored in a data warehouse to discover new information.
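Slice-and-dice can be illustrated on a tiny set of sales facts. This is a sketch in plain Python, not an actual MDA tool; the regions, products, quarters, and amounts are all hypothetical, as are the `roll_up` and `slice_by` helper names.

```python
from collections import defaultdict

# Each fact: (region, product, quarter, sales amount). All values are made up.
facts = [
    ("East", "Boots", "Q1", 100.0),
    ("East", "Jeans", "Q1", 50.0),
    ("West", "Boots", "Q1", 70.0),
    ("West", "Boots", "Q2", 30.0),
    ("East", "Jeans", "Q2", 20.0),
]
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}

def roll_up(rows, dimension):
    """Total the sales amount along one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[DIMENSIONS[dimension]]] += row[3]
    return dict(totals)

def slice_by(rows, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    return [row for row in rows if row[DIMENSIONS[dimension]] == value]

by_region = roll_up(facts, "region")    # same facts viewed by region
by_product = roll_up(facts, "product")  # same facts viewed by product
q1_by_region = roll_up(slice_by(facts, "quarter", "Q1"), "region")  # Q1 slice only
```

The same five facts answer three different questions without being stored three times, which is the point of viewing multidimensional information from different perspectives.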


Data Warehouses and Data Mining

Data Marts – Smaller Data Warehouses

Data mart - a subset of a data warehouse in which only a focused portion of the data warehouse information is kept.

Data Warehouses and Data Mining

Important Considerations

Do you need a data warehouse?

Do all your employees need an entire data warehouse?

How up-to-date must the information be?

What data mining tools do you need?

Team Work

How Up-to-Date Should Data Warehouse Information Be? (p. 149)

MANAGING THE INFORMATION RESOURCE

Who Should Oversee the Organization’s Information?

Chief information officer (CIO) - responsible for overseeing an organization’s information resource.

Data administration - plans for, oversees the development of, and monitors the information resource.

Database administration - responsible for the more technical and operational aspects of managing the information contained in organizational databases.

MANAGING THE INFORMATION RESOURCE

How Will Changes in Technology Affect Organizing and Managing Information?

As new technologies become available, you should ask yourself whether those technologies will help you organize and manage your information better.

One of the greatest technological changes that will occur over the coming years is a convergence of different tools that will help you better organize and manage information.

MANAGING THE INFORMATION RESOURCE

Is Information Ownership a Consideration?

Information ownership is a key consideration in today’s information-based business environment.

Ownership refers to who is responsible for information quality.

On Your Own

CRUD – Defining Information Ownership (p. 151)

MANAGING THE INFORMATION RESOURCE

What Are the Ethics Involved in Managing and Organizing Information?

Databases, data warehouses, DBMSs, and data mining tools make it possible for people to easily access all kinds of organizational information.

How does an organization safeguard against the unethical use of information within the organization?

Closing Case Study One

We’ve Got OLTP Covered; Let’s Go on to OLAP

What is the single most important factor that hinders all organizations in general from providing good online analytical processing (OLAP) support?

Why is it so much easier for organizations to provide good online transaction processing (OLTP) support?

Closing Case Study Two

Mining Dining Data

Consider the issue of timely information with respect to the businesses discussed in the case.

Which of the businesses must have the most up-to-date information in its data warehouse?