Information Technology for the Internet Age
Databases and Data Warehouses
Building Business Intelligence
BUAD 358 - Information Systems II
Ali A. Nazemi, Ph.D.
Roanoke College
Database Management Systems (Topics)
Definition & Hierarchy of Data
Components of a DBMS
Advantages and Disadvantages of DBMS
Database Types in Organizations
Queries and SQL
Database Design
Data Administration and Planning
Database Structures
Database Management Systems
[Figure: programs, the DBMS, the database, and its data files]
Without a DBMS:
Hard to find and share data.
With a DBMS:
Integrated data with shared access.
[Figure: strategy, tactics, and operations levels at Company A and Company B]
DBMS & People
[Figure: Data; Database Management System; Programs; Business Operations; Data Collection and Transaction Processing; Ad Hoc Queries and Reports; Managers/End Users; Programmer/Analyst; Business Needs; Programs & Revisions; Database Administrator (Standards, Design & Control)]
Business Intelligence
Business intelligence is knowledge about your:
Customers
Competitors
Partners
Competitive environment
Internal operations
Database Management Systems
Database management system (DBMS) – helps you specify the logical organization for a database and access and use the information within a database.
A DBMS contains the following six important software components:
DBMS engine
Data definition subsystem
Data manipulation subsystem
Report & Form Generation Subsystem
Application generation subsystem
Data administration subsystem
DBMS engine - accepts logical requests from the various other DBMS subsystems, converts them into their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.
Physical view - deals with how information is physically arranged, stored, and accessed on some type of storage device such as a hard disk.
Logical view - focuses on how you arrange and access information to meet your particular business needs.
Database Engine
Creating Logical Structures
Data dictionary - contains the logical structure for the information.
Primary key - a field (or group of fields in some cases) that uniquely identifies each record.
Foreign key - a primary key of one file that appears in another file.
Integrity constraints – rules that help ensure the quality of the information.
Created Tables
Part Number is the primary key, as the key icon beside it indicates.
For Percentage Markup, we defined its Format as “Percent” and its number of decimal places as 2.
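To see primary keys and integrity constraints at work, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for Access. The `Part` table and its `PartNumber`, `Cost`, and `PercentageMarkup` columns are hypothetical, loosely modeled on the table described above, and the rule that a markup cannot be negative is an assumed example of an integrity constraint. The point is that the DBMS itself, not the application, rejects the bad rows.

```python
import sqlite3

# Hypothetical Part table; names are assumptions, not the textbook's exact design.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Part (
        PartNumber       TEXT PRIMARY KEY,                     -- uniquely identifies each record
        Cost             REAL NOT NULL,
        PercentageMarkup REAL CHECK (PercentageMarkup >= 0)    -- an integrity constraint
    )
""")
conn.execute("INSERT INTO Part VALUES ('P100', 12.50, 0.25)")


def try_insert(row):
    """Try to add a row; report whether the DBMS accepted or rejected it."""
    try:
        conn.execute("INSERT INTO Part VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


duplicate_key = try_insert(("P100", 9.99, 0.10))     # reuses an existing primary key
negative_markup = try_insert(("P200", 9.99, -0.10))  # breaks the CHECK constraint
valid_part = try_insert(("P200", 9.99, 0.10))        # satisfies every rule
row_count = conn.execute("SELECT COUNT(*) FROM Part").fetchone()[0]

print(duplicate_key)
print(negative_markup)
print(valid_part, row_count)
```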
Relational Databases
Tables: Rows, Columns, Primary keys
Data types: Text, Dates & times, Numbers, Yes/No, Objects
Phone Name Address City
312-555-1234 Jones 123 Main Chicago
502-555-8876 Smith 456 Oak Glasgow
602-555-9987 Juarez 887 Ribera Phoenix
612-555-4325 Olsen 465 Thor Minneapolis
Customer Table
Customer Date Salesperson Total_sale
502-555-8876 3/3/95 2223 157.92
602-555-9987 4/4/94 8876 295.53
612-555-4325 4/9/96 8876 132.94
502-555-8876 5/7/95 3345 183.67
Orders Table
Data Manipulation Subsystem
Click here to enter a new record.
Find information using the binoculars.
DBMS Tools: Data Manipulation Subsystem
Report generator - helps you quickly define formats of reports and what information you want to see in a report.
DBMS Tools: Data Manipulation Subsystem
By following a series of simple screens, you can easily create the report below.
DBMS Input Screen
[Figure callouts: Text/Labels; Data Variables; Scrolling Region/Subform; Command Buttons; Record Selectors (Subform and Main)]
Designing Menus for Users
1. Setup Choices
2. Data Input
3. Print Reports
4. DOS Utilities
5. Backups
Main Menu
Daily Sales Reports
Friday Sales Meeting
Monthly Customer Letters
Quit
Customer Information
As a secretary, which menu is easier to understand?
Database Queries
Single Table
Computations
Joining Tables
Summary
Four questions to create a query:
1) What output do you want to see?
2) What tables are involved?
3) What do you already know? (constraints)
4) How are the tables joined?
Single Table Query Introduction
CID Name Phone City AccountBalance
28764 Adamz 602-999-2539 Phoenix 197.54
87535 James 305-777-2235 Miami 255.93
44453 Kolke 303-888-8876 Denver 863.39
29587 Smitz 206-676-7763 Seattle 353.76
Sample Data
Access Query Screen (QBE)
Query: Which customers have balances greater than $200?
File: C05E15a.mdb
“AND” Conditions & Sorting
Sample Data
Access Query Screen (QBE)
Query: Which Denver customers have balances greater than $200?
C# Name Phone City AccountBalance
28764 Adamz 602-999-2539 Phoenix 197.54
87535 James 305-777-2235 Miami 255.93
44453 Kolke 303-888-8876 Denver 863.39
29587 Smitz 206-676-7763 Seattle 353.76
Structured Query Language
Structured query language (SQL) - a standardized fourth-generation query language found in most DBMSs.
The SQL below creates the same report as the one in Figure 3.7 on page 139:
SELECT Part.[Part Number], Part.Cost, Employee.[Employee Name], Employee.[Employee Number]
FROM Part, Employee
WHERE (((Part.Cost)>10));
SQL Examples
Query: Which customers have balances greater than $200?
SQL: SELECT CID, Name, Phone, City, AccountBalance
FROM Customers
WHERE AccountBalance > 200 ;
Query: Which Denver customers have balances greater than $200?
SQL: SELECT CID, City, AccountBalance
FROM Customers
WHERE AccountBalance > 200 AND City = "Denver"
ORDER BY Name ASC ;
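The two example queries can be run almost unchanged in SQLite through Python's `sqlite3` module. This sketch loads the four sample customers; the only dialect change is that SQLite (like standard SQL) writes text literals in single quotes. An `ORDER BY` was added to the first query only so its rows come back in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT, "
    "City TEXT, AccountBalance REAL)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?, ?)",
    [
        (28764, "Adamz", "602-999-2539", "Phoenix", 197.54),
        (87535, "James", "305-777-2235", "Miami", 255.93),
        (44453, "Kolke", "303-888-8876", "Denver", 863.39),
        (29587, "Smitz", "206-676-7763", "Seattle", 353.76),
    ],
)

# Which customers have balances greater than $200? (ORDER BY added for stable output)
over_200 = conn.execute(
    "SELECT CID, Name, Phone, City, AccountBalance FROM Customers "
    "WHERE AccountBalance > 200 ORDER BY Name"
).fetchall()

# Which Denver customers have balances greater than $200?
denver_over_200 = conn.execute(
    "SELECT CID, City, AccountBalance FROM Customers "
    "WHERE AccountBalance > 200 AND City = 'Denver' ORDER BY Name ASC"
).fetchall()

for row in over_200:
    print(row)
print(denver_over_200)
```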
Useful WHERE Conditions
Comparisons: <, =, >, <>, BETWEEN, LIKE
Numbers: AccountBalance > 200
Text (simple comparison): Name > "Jones"
LIKE (Access wildcards; standard SQL uses % in place of * and _ in place of ?):
Match any run of characters: Name LIKE "J*"
Match exactly one character: Name LIKE "?m*"
Dates: Odate BETWEEN #8/15/95# AND #8/31/95#
Yes/No: Discontinued = Yes
Missing data: City IS NULL
NOT: Name IS NOT NULL
Use with QBE or SQL
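A sketch of the same conditions in SQLite, assuming the four sample customers plus one hypothetical customer, Baker, whose City is missing. It uses the standard `%` and `_` wildcards in place of Access's `*` and `?`; note that SQLite's `LIKE` ignores case for ASCII letters by default. Dates are left out: SQLite has no `#date#` literal, but dates stored as `'YYYY-MM-DD'` text compare correctly with `BETWEEN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CID INTEGER, Name TEXT, City TEXT, AccountBalance REAL)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        (28764, "Adamz", "Phoenix", 197.54),
        (87535, "James", "Miami", 255.93),
        (44453, "Kolke", "Denver", 863.39),
        (29587, "Smitz", "Seattle", 353.76),
        (11111, "Baker", None, 0.0),  # hypothetical row with a missing City
    ],
)


def names_where(condition):
    """Return matching names in alphabetical order.

    The condition is pasted into the SQL text, which is fine for these fixed
    examples but must never be done with user input (use ? parameters instead).
    """
    sql = f"SELECT Name FROM Customers WHERE {condition} ORDER BY Name"
    return [name for (name,) in conn.execute(sql)]


numbers = names_where("AccountBalance > 200")
text_cmp = names_where("Name > 'Jones'")                  # alphabetical comparison
starts_with_j = names_where("Name LIKE 'J%'")             # Access: LIKE "J*"
second_letter_m = names_where("Name LIKE '_m%'")          # Access: LIKE "?m*"
in_range = names_where("AccountBalance BETWEEN 200 AND 400")
missing_city = names_where("City IS NULL")                # "= NULL" would match nothing
has_city = names_where("City IS NOT NULL")

print(numbers, text_cmp, starts_with_j, second_letter_m, in_range, missing_city, has_city)
```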
SQL General Form
SELECT columns
FROM tables
JOIN link columns
WHERE conditions
GROUP BY column
ORDER BY column (ASC | DESC)
Computations
Sum
Avg
Min
Max
Count
StDev
Var
QBE
SELECT Count(CID), AVG(AccountBalance)
FROM Customers ;
SQL
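A minimal sketch of these computations over the four sample customers, again using SQLite via Python's `sqlite3`. Core SQLite has no `StDev` or `Var` aggregate, so this sketch computes the sample standard deviation and variance in Python with the `statistics` module; those match the sample (not population) estimates that Access's `StDev` and `Var` return.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CID INTEGER PRIMARY KEY, AccountBalance REAL)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(28764, 197.54), (87535, 255.93), (44453, 863.39), (29587, 353.76)],
)

# One row of results computed over the whole table.
count, avg, low, high, total = conn.execute(
    "SELECT COUNT(CID), AVG(AccountBalance), MIN(AccountBalance), "
    "MAX(AccountBalance), SUM(AccountBalance) FROM Customers"
).fetchone()

# No built-in StDev/Var in core SQLite: fetch the values and compute them in Python.
balances = [b for (b,) in conn.execute("SELECT AccountBalance FROM Customers")]
stdev = statistics.stdev(balances)        # sample standard deviation
variance = statistics.variance(balances)  # sample variance

print(count, round(avg, 3), low, high, round(total, 2), round(stdev, 2))
```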
Groups or Subtotals
QBE
SELECT City, AVG(AccountBalance)
FROM Customers
GROUP BY City ;
SQL
City AVG(AccountBalance)
Chicago 197.54
Denver 863.39
Miami 255.93
Phoenix 526.76
Seattle 353.76
Sample Output
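Here is a sketch of grouping in SQLite, assuming the five rows of the Customers table from the Multiple Tables example. Each city has only one customer there, so each average equals that customer's balance; and because that table lists Miami at $255.98, the Miami line differs slightly from the sample output. Adding a second customer in any city would change only that city's row, which is the point of `GROUP BY`: one subtotal per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT, City TEXT, AccountBalance REAL)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        (12345, "Jones", "Chicago", 197.54),
        (28764, "Adams", "Phoenix", 526.76),
        (29587, "Smitz", "Seattle", 353.76),
        (44453, "Kolke", "Denver", 863.39),
        (87535, "James", "Miami", 255.98),
    ],
)

# One output row per city; COUNT and AVG are computed within each group.
by_city = conn.execute(
    "SELECT City, COUNT(*), AVG(AccountBalance) FROM Customers "
    "GROUP BY City ORDER BY City"
).fetchall()

for city, n, avg in by_city:
    print(f"{city:<8} {n} {avg:.2f}")
```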
Multiple Tables
C# Name Phone City AccountBalance
12345 Jones 312-555-1234 Chicago $197.54
28764 Adams 602-999-2539 Phoenix $526.76
29587 Smitz 206-656-7763 Seattle $353.76
44453 Kolke 303-888-8876 Denver $863.39
87535 James 305-777-2235 Miami $255.98
Customers
S# Name DateHired Phone Commission
225 West 5/23/75 213-333-2345 5
452 Zeke 8/15/94 213-343-5553 3
554 Jabbar 7/15/91 213-534-8876 4
663 Bird 9/12/93 213-225-3335 4
887 Johnson 2/2/92 213-887-6635 4
Salespeople
Item# Description Price
1154 Corn Broom $1.00
2254 Blue Jeans $12.00
3342 Paper Towels--3 rolls $1.00
7653 Laundry Detergent $2.00
8763 Men's Boots $15.00
9987 Candy Popcorn $0.50
Items
O# C# S# Odate Amount
117 12345 887 3/3/96 $57.92
125 87535 663 4/4/96 $123.54
157 12345 554 4/9/96 $297.89
169 29587 225 5/5/96 $89.93
178 44453 663 5/1/96 $154.89
188 29587 554 5/8/96 $325.46
201 12345 887 5/28/96 $193.58
211 44453 225 6/9/96 $201.39
213 44453 225 6/9/96 $154.15
215 87535 887 6/9/96 $563.27
280 28764 663 6/27/96 $255.32
Orders
O# Item# Quantity
117 1154 2
117 3342 1
117 7653 4
125 1154 4
125 8763 3
157 7653 2
169 3342 1
169 9987 5
178 2254 1
ItemsSold
Linking Tables
The Orders to ItemsSold relationship enforces referential integrity.
One Order can list many ItemsSold.
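A minimal sketch of referential integrity between Orders and ItemsSold, using SQLite through Python's `sqlite3`. Column names drop the `#` (so `O#` becomes `OID` and `Item#` becomes `ItemID`) to keep identifiers simple, and only order 117 is loaded. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` on each connection; with it on, an ItemsSold row for a nonexistent order is refused, and so is deleting an order that still has items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off unless enabled per connection
conn.execute(
    "CREATE TABLE Orders (OID INTEGER PRIMARY KEY, CID INTEGER, SID INTEGER, "
    "Odate TEXT, Amount REAL)"
)
conn.execute("""
    CREATE TABLE ItemsSold (
        OID      INTEGER NOT NULL REFERENCES Orders(OID),  -- foreign key to Orders
        ItemID   INTEGER NOT NULL,
        Quantity INTEGER,
        PRIMARY KEY (OID, ItemID)
    )
""")
conn.execute("INSERT INTO Orders VALUES (117, 12345, 887, '1996-03-03', 57.92)")
# One order can list many items sold.
conn.executemany(
    "INSERT INTO ItemsSold VALUES (?, ?, ?)",
    [(117, 1154, 2), (117, 3342, 1), (117, 7653, 4)],
)


def violates_integrity(sql):
    """Run a statement; return True if the DBMS refuses it on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


orphan_refused = violates_integrity("INSERT INTO ItemsSold VALUES (999, 1154, 1)")  # no order 999
delete_refused = violates_integrity("DELETE FROM Orders WHERE OID = 117")  # items still point at it
lines_for_117 = conn.execute("SELECT COUNT(*) FROM ItemsSold WHERE OID = 117").fetchone()[0]
print(orphan_refused, delete_refused, lines_for_117)
```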
Query Example
Which customers (CID) have placed orders since June 1, 1996?
QBE
SELECT CID, Odate
FROM Orders
WHERE Odate >= #6/1/96# ;
SQL
Results
CID Odate
44453 6/9/96
44453 6/9/96
87535 6/9/96
28764 6/27/96
Query Example
What are the names of the customers who placed orders since June 1, 1996?
QBE
SELECT DISTINCT Name, Odate
FROM Orders
INNER JOIN Customers ON
Orders.CID = Customers.CID
WHERE Odate >= #6/1/96# ;
SQL
Results
Name Odate
Adamz 6/27/96
James 6/9/96
Kolke 6/9/96
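Both date queries can be sketched in SQLite with the Orders and Customers data from the Multiple Tables example. SQLite has no Access-style `#6/1/96#` literal, so this sketch stores dates as ISO text (`'1996-06-01'`), which sorts and compares chronologically. It assumes order 280 was placed on 6/27/96, as the query results imply, and uses the Customers table's spelling `Adams`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Orders (OID INTEGER PRIMARY KEY, CID INTEGER, Odate TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    (12345, "Jones"), (28764, "Adams"), (29587, "Smitz"), (44453, "Kolke"), (87535, "James"),
])
# Dates as ISO-8601 text ('YYYY-MM-DD') so that >= compares them chronologically.
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (117, 12345, "1996-03-03"), (125, 87535, "1996-04-04"), (157, 12345, "1996-04-09"),
    (169, 29587, "1996-05-05"), (178, 44453, "1996-05-01"), (188, 29587, "1996-05-08"),
    (201, 12345, "1996-05-28"), (211, 44453, "1996-06-09"), (213, 44453, "1996-06-09"),
    (215, 87535, "1996-06-09"), (280, 28764, "1996-06-27"),  # June date assumed from the results
])

# Single table: customer numbers only.
since_june = conn.execute(
    "SELECT CID, Odate FROM Orders WHERE Odate >= '1996-06-01' ORDER BY OID"
).fetchall()

# Join to Customers to turn numbers into names; DISTINCT drops Kolke's repeated row.
names_since_june = conn.execute(
    "SELECT DISTINCT Name, Odate FROM Orders "
    "INNER JOIN Customers ON Orders.CID = Customers.CID "
    "WHERE Odate >= '1996-06-01' ORDER BY Name"
).fetchall()

print(since_june)
print(names_since_june)
```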
Query Example
List the salespeople (sorted alphabetically) along with the names of customers who placed orders with that salesperson.
QBE
SELECT DISTINCT Salespeople.Name,
Customers.Name
FROM Salespeople INNER JOIN (Customers INNER JOIN Orders ON
Customers.CID=Orders.CID) ON Salespeople.SID = Orders.SID
ORDER BY Salespeople.Name ;
SQL
Results
SalesName Cust.Name
Bird Adamz
Bird James
Bird Kolke
Jabbar Jones
Jabbar Smitz
Johnson James
Johnson Jones
West Kolke
West Smitz
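The three-table join can be checked in SQLite with the Salespeople, Customers, and Orders rows from the Multiple Tables example. This sketch links West's orders (169, 211, and 213) to salesperson number 225, as the results require, and writes the nested `INNER JOIN (...) ON` as a flat chain of inner joins, which returns the same rows. A second sort key on the customer name makes the order within each salesperson predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salespeople (SID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Customers   (CID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders      (OID INTEGER PRIMARY KEY, CID INTEGER, SID INTEGER);
""")
conn.executemany("INSERT INTO Salespeople VALUES (?, ?)", [
    (225, "West"), (452, "Zeke"), (554, "Jabbar"), (663, "Bird"), (887, "Johnson"),
])
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    (12345, "Jones"), (28764, "Adams"), (29587, "Smitz"), (44453, "Kolke"), (87535, "James"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (117, 12345, 887), (125, 87535, 663), (157, 12345, 554), (169, 29587, 225),
    (178, 44453, 663), (188, 29587, 554), (201, 12345, 887), (211, 44453, 225),
    (213, 44453, 225), (215, 87535, 887), (280, 28764, 663),
])

# Orders is the link table: it connects each salesperson to the customers they sold to.
pairs = conn.execute("""
    SELECT DISTINCT Salespeople.Name AS SalesName, Customers.Name AS CustName
    FROM Salespeople
    INNER JOIN Orders    ON Salespeople.SID = Orders.SID
    INNER JOIN Customers ON Customers.CID = Orders.CID
    ORDER BY SalesName, CustName
""").fetchall()

for sales_name, cust_name in pairs:
    print(sales_name, cust_name)
```

Zeke has no orders, so an inner join leaves him out; a `LEFT JOIN` from Salespeople would be needed to list him with an empty customer.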
Aggregation Query
What is the total amount of orders placed by customers who live in Miami?
QBE
SELECT SUM(Amount)
FROM Orders
INNER JOIN Customers ON Orders.CID = Customers.CID
WHERE City = "Miami" ;
SQL
Results
$2,418.84
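A sketch of the aggregation in SQLite, using only the Customers and Orders rows reproduced in the Multiple Tables example. The only Miami customer there is James (87535), with orders 125 and 215, so this excerpt totals $686.81; the $2,418.84 on the slide cannot be reproduced from those rows alone. Amounts are stored as `REAL` for brevity; a production design would often store currency as integer cents to avoid rounding surprises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
conn.execute("CREATE TABLE Orders (OID INTEGER PRIMARY KEY, CID INTEGER, Amount REAL)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    (12345, "Jones", "Chicago"), (28764, "Adams", "Phoenix"), (29587, "Smitz", "Seattle"),
    (44453, "Kolke", "Denver"), (87535, "James", "Miami"),
])
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
    (117, 12345, 57.92), (125, 87535, 123.54), (157, 12345, 297.89), (169, 29587, 89.93),
    (178, 44453, 154.89), (188, 29587, 325.46), (201, 12345, 193.58), (211, 44453, 201.39),
    (213, 44453, 154.15), (215, 87535, 563.27), (280, 28764, 255.32),
])

# The join brings City onto each order; WHERE keeps Miami; SUM collapses to one value.
miami_total = conn.execute(
    "SELECT SUM(Amount) FROM Orders "
    "INNER JOIN Customers ON Orders.CID = Customers.CID "
    "WHERE City = 'Miami'"
).fetchone()[0]

print(f"${miami_total:,.2f}")
```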
DBMS Tools: Data Administration Subsystem
Data administration subsystem - a DBMS helps you manage the overall database environment by providing facilities for backup and recovery, security management, query optimization, concurrency control, and change management.
On Your Own
DBMS Support: OLTP, OLAP, and Information Management (p. 142)
DBMS Tools: Data Administration Subsystem
Backup and recovery facilities:
Periodically back up information contained in a database.
Restart or recover a database and its information in case of a failure.
Security management facilities - control who has access to what information and what type of access those people have.
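One way to picture backup and recovery is Python's `sqlite3.Connection.backup` method (Python 3.7 and later), which copies a whole database into another connection. In this sketch a second in-memory database stands in for the backup medium and the "failure" is a mistaken `DELETE`; a real environment would back up to separate storage on a schedule.

```python
import sqlite3

# The "live" database: an in-memory stand-in for the production database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT)")
live.executemany("INSERT INTO Customers VALUES (?, ?)", [(12345, "Jones"), (28764, "Adams")])
live.commit()

# Periodic backup: copy every page of the live database into a separate database.
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulated failure: a mistaken statement wipes the table in the live database.
live.execute("DELETE FROM Customers")
live.commit()
rows_after_failure = live.execute("SELECT Name FROM Customers").fetchall()

# Recovery: restore the live database from the backup copy.
backup_copy.backup(live)
restored = live.execute("SELECT Name FROM Customers ORDER BY CID").fetchall()
print(rows_after_failure, restored)
```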
Database Advantages
Focus on data: Stable data; programs change.
Data independence: Change programs without altering data.
Data integrity: Accuracy, time, concurrency, security.
Ad hoc queries
Speed of development: Report writers, input forms, data manipulation.
Flexibility & Queries
[Figure: All Data Files; Database Management System; Invoice Program; Billing Program]
Database Design
Primary keys
One value per cell
Column depends on whole key and nothing but the key.
Customers
C# name city home business fax service
11 Jones Chicago 111-1111 222-2222 222-3534 876-3456
22 Smith Chicago 111-4567 444-5353
33 James Chicago 111-2567 222-8976
44 Ricci Chicago 333-8765
c# name city
11 Jones Chicago
22 Smith Chicago
33 James Chicago
44 Ricci Chicago
Customers(c#, name, city)
c# phone_type number
11 home 111-1111
11 business 222-2222
11 fax 222-3534
11 service 876-3456
22 home 111-4587
22 service 444-5353
33 home 111-2567
44 fax 333-8765
Phones(c#, phone_type, number)
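The split into Customers and Phones can be sketched in SQLite. Column `c#` is renamed `cnum` to keep identifiers simple, and the primary key `(cnum, phone_type)` assumes each customer has at most one number of each type. With one value per cell, questions such as "how many numbers does each customer have?" or "who has a fax?" become ordinary queries, and a new phone type needs a new row rather than a new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cnum INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE Phones (
        cnum       INTEGER REFERENCES Customers(cnum),
        phone_type TEXT,
        number     TEXT,
        PRIMARY KEY (cnum, phone_type)  -- assumes one number per type per customer
    );
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    (11, "Jones", "Chicago"), (22, "Smith", "Chicago"),
    (33, "James", "Chicago"), (44, "Ricci", "Chicago"),
])
conn.executemany("INSERT INTO Phones VALUES (?, ?, ?)", [
    (11, "home", "111-1111"), (11, "business", "222-2222"),
    (11, "fax", "222-3534"), (11, "service", "876-3456"),
    (22, "home", "111-4587"), (22, "service", "444-5353"),
    (33, "home", "111-2567"), (44, "fax", "333-8765"),
])

# How many numbers does each customer have? (LEFT JOIN keeps customers with none.)
phone_counts = conn.execute("""
    SELECT c.name, COUNT(p.number)
    FROM Customers AS c LEFT JOIN Phones AS p ON p.cnum = c.cnum
    GROUP BY c.cnum, c.name
    ORDER BY c.cnum
""").fetchall()

# Who has a fax number? One WHERE clause, no matter how many phone types exist.
fax_owners = [name for (name,) in conn.execute("""
    SELECT c.name FROM Customers AS c JOIN Phones AS p ON p.cnum = c.cnum
    WHERE p.phone_type = 'fax' ORDER BY c.name
""")]
print(phone_counts, fax_owners)
```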
Notation
Table name (table columns); the primary key is underlined, here CustomerID:
Customer (CustomerID, Phone, Name, Address, City, State, ZipCode)
CustomerID Phone LastName FirstName Address City State ZipCode
1 502-666-7777 Johnson Martha 125 Main Street Alvaton KY 42122
2 502-888-6464 Smith Jack 873 Elm Street Bowling Green KY 42101
3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171
4 502-333-9494 Adams Samuel 746 Brown Drive Alvaton KY 42122
5 502-474-4746 Rabitz Victor 645 White Avenue Bowling Green KY 42102
6 615-373-4746 Steinmetz Susan 15 Speedway Drive Portland TN 37148
7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 37148
8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 37031
9 502-222-4351 Chavez Juan 673 Industry Blvd. Caneyville KY 42721
10 502-444-2512 Rojo Maria 88 Main Street Cave City KY 42127
1st: Repeating
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent ) )
Repeating Section
Causes duplication
TransID RentDate CustomerID LastName Phone Address VideoID Copy# Title Rent
1 4/18/04 3 Washington 502-777-7575 95 Easy Street 1 2 2001: A Space Odyssey $1.50
1 4/18/04 3 Washington 502-777-7575 95 Easy Street 6 3 Clockwork Orange $1.50
2 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 8 1 Hopscotch $1.50
2 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 2 1 Apocalypse Now $2.00
2 4/30/04 7 Lasater 615-888-4474 67 S. Ray Drive 6 1 Clockwork Orange $1.50
3 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 9 1 Luggage Of The Gods $2.50
3 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 15 1 Fabulous Baker Boys $2.00
3 4/18/04 8 Jones 615-452-1162 867 Lakeside Drive 4 1 Boy And His Dog $2.50
4 4/18/04 3 Washington 502-777-7575 95 Easy Street 3 1 Blues Brothers $2.00
4 4/18/04 3 Washington 502-777-7575 95 Easy Street 8 1 Hopscotch $1.50
4 4/18/04 3 Washington 502-777-7575 95 Easy Street 13 1 Surf Nazis Must Die $2.50
4 4/18/04 3 Washington 502-777-7575 95 Easy Street 17 1 Witches of Eastwick $2.00
First Normal Form
Customer Rentals
Name
Phone
Address
City
State
ZipCode
VideoID Copy# Title Rent
1. 6 1 Clockwork Orange 1.50
2. 8 2 Hopscotch 1.50
3.
4.
5.
{Unused Space}
Not in First Normal Form
1st: Split
RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent ) )
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
RentalLine(TransID, VideoID, Copy#, Title, Rent )
TransID RentDate CustomerID Phone LastName FirstName Address City State ZipCode
1 4/18/04 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171
2 4/30/04 7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 37148
3 4/18/04 8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 37031
4 4/18/04 3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171
RentalForm2
TransID VideoID Copy# Title Rent
1 1 2 2001: A Space Odyssey $1.50
1 6 3 Clockwork Orange $1.50
2 8 1 Hopscotch $1.50
2 2 1 Apocalypse Now $2.00
2 6 1 Clockwork Orange $1.50
3 9 1 Luggage Of The Gods $2.50
3 15 1 Fabulous Baker Boys $2.00
3 4 1 Boy And His Dog $2.50
4 3 1 Blues Brothers $2.00
4 8 1 Hopscotch $1.50
4 13 1 Surf Nazis Must Die $2.50
4 17 1 Witches of Eastwick $2.00
RentalLine
Note: replication
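The first split can be expressed as a small Python function that moves the repeating section into its own list of rows, each carrying TransID as the link back to its form. This is a sketch over only the first two rental forms; the field names follow the RentalForm schema above, and the data structure (a dict per form with a `lines` list) is an assumption made for illustration.

```python
HEADER_FIELDS = ("TransID", "RentDate", "CustomerID", "Phone", "Name",
                 "Address", "City", "State", "ZipCode")

# Two rental forms, each with a repeating section of (VideoID, Copy#, Title, Rent)
# lines: that nested list is what keeps the form out of first normal form.
forms = [
    {"TransID": 1, "RentDate": "4/18/04", "CustomerID": 3, "Phone": "502-777-7575",
     "Name": "Washington", "Address": "95 Easy Street", "City": "Smith's Grove",
     "State": "KY", "ZipCode": "42171",
     "lines": [(1, 2, "2001: A Space Odyssey", 1.50), (6, 3, "Clockwork Orange", 1.50)]},
    {"TransID": 2, "RentDate": "4/30/04", "CustomerID": 7, "Phone": "615-888-4474",
     "Name": "Lasater", "Address": "67 S. Ray Drive", "City": "Portland",
     "State": "TN", "ZipCode": "37148",
     "lines": [(8, 1, "Hopscotch", 1.50), (2, 1, "Apocalypse Now", 2.00),
               (6, 1, "Clockwork Orange", 1.50)]},
]


def split_repeating_group(forms):
    """Return (RentalForm2 rows, RentalLine rows) with one value per cell."""
    rental_form2 = []  # one flat row per form
    rental_line = []   # one flat row per rented video, keyed by (TransID, VideoID)
    for form in forms:
        rental_form2.append(tuple(form[field] for field in HEADER_FIELDS))
        for video_id, copy_no, title, rent in form["lines"]:
            rental_line.append((form["TransID"], video_id, copy_no, title, rent))
    return rental_form2, rental_line


rental_form2, rental_line = split_repeating_group(forms)
for row in rental_form2:
    print(row)
for row in rental_line:
    print(row)
```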
2nd Split
RentalLine(TransID, VideoID, Copy#, Title, Rent )
VideosRented(TransID, VideoID, Copy# )
Videos(VideoID, Title, Rent )
TransID VideoID Copy#
1 1 2
1 6 3
2 2 1
2 6 1
2 8 1
3 4 1
3 9 1
3 15 1
4 3 1
4 8 1
4 13 1
4 17 1
VideoID Title Rent
1 2001: A Space Odyssey $1.50
2 Apocalypse Now $2.00
3 Blues Brothers $2.00
4 Boy And His Dog $2.50
5 Brother From Another Planet $2.00
6 Clockwork Orange $1.50
7 Gods Must Be Crazy $2.00
8 Hopscotch $1.50
Column depends on entire (whole) key.
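The second split can be sketched the same way. Title and Rent depend only on VideoID, which is just part of the `(TransID, VideoID)` key, so they move to a Videos table with one row per video. The function also checks that each VideoID really has a single Title and Rent; if it did not, the split would lose information. This is an illustrative sketch over the RentalLine rows above, not a general normalization tool.

```python
# RentalLine rows: (TransID, VideoID, Copy#, Title, Rent)
rental_line = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50), (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50), (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50), (3, 9, 1, "Luggage Of The Gods", 2.50),
    (3, 15, 1, "Fabulous Baker Boys", 2.00), (3, 4, 1, "Boy And His Dog", 2.50),
    (4, 3, 1, "Blues Brothers", 2.00), (4, 8, 1, "Hopscotch", 1.50),
    (4, 13, 1, "Surf Nazis Must Die", 2.50), (4, 17, 1, "Witches of Eastwick", 2.00),
]


def split_partial_dependency(rows):
    """Split RentalLine into VideosRented(TransID, VideoID, Copy#) and Videos(VideoID, Title, Rent)."""
    videos_rented = sorted({(trans_id, video_id, copy_no)
                            for trans_id, video_id, copy_no, _, _ in rows})
    videos = {}
    for _, video_id, _, title, rent in rows:
        # A VideoID with two different titles or rents would break the dependency.
        if videos.setdefault(video_id, (title, rent)) != (title, rent):
            raise ValueError(f"VideoID {video_id} has conflicting Title/Rent values")
    return videos_rented, sorted((vid, title, rent) for vid, (title, rent) in videos.items())


videos_rented, videos = split_partial_dependency(rental_line)
print(len(rental_line), "RentalLine rows ->", len(videos_rented), "VideosRented rows and",
      len(videos), "Videos rows")
```

Each title and rent is now stored once, so changing the rent for Hopscotch means updating one Videos row instead of every rental line that mentions it.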
3rd Split
RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode )
Rentals(TransID, RentDate, CustomerID )
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )
TransID RentDate CustomerID
1 4/18/04 3
2 4/30/04 7
3 4/18/04 8
4 4/18/04 3
Rentals
CustomerID Phone LastName FirstName Address City State ZipCode
1 502-666-7777 Johnson Martha 125 Main Street Alvaton KY 42122
2 502-888-6464 Smith Jack 873 Elm Street Bowling Green KY 42101
3 502-777-7575 Washington Elroy 95 Easy Street Smith's Grove KY 42171
4 502-333-9494 Adams Samuel 746 Brown Drive Alvaton KY 42122
5 502-474-4746 Rabitz Victor 645 White Avenue Bowling Green KY 42102
6 615-373-4746 Steinmetz Susan 15 Speedway Drive Portland TN 37148
7 615-888-4474 Lasater Les 67 S. Ray Drive Portland TN 37148
8 615-452-1162 Jones Charlie 867 Lakeside Drive Castalian Springs TN 37031
9 502-222-4351 Chavez Juan 673 Industry Blvd. Caneyville KY 42721
10 502-444-2512 Rojo Maria 88 Main Street Cave City KY 42127
Customers
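The third split follows the same pattern: the customer details depend on CustomerID, which is not the key of RentalForm2, so they move to a Customers table keyed by CustomerID. A minimal Python sketch over the four RentalForm2 rows above; the function name is hypothetical.

```python
# RentalForm2 rows:
# (TransID, RentDate, CustomerID, Phone, LastName, FirstName, Address, City, State, ZipCode)
rental_form2 = [
    (1, "4/18/04", 3, "502-777-7575", "Washington", "Elroy", "95 Easy Street",
     "Smith's Grove", "KY", "42171"),
    (2, "4/30/04", 7, "615-888-4474", "Lasater", "Les", "67 S. Ray Drive",
     "Portland", "TN", "37148"),
    (3, "4/18/04", 8, "615-452-1162", "Jones", "Charlie", "867 Lakeside Drive",
     "Castalian Springs", "TN", "37031"),
    (4, "4/18/04", 3, "502-777-7575", "Washington", "Elroy", "95 Easy Street",
     "Smith's Grove", "KY", "42171"),
]


def split_transitive_dependency(rows):
    """Split into Rentals(TransID, RentDate, CustomerID) and Customers keyed by CustomerID."""
    rentals = []
    customers = {}
    for trans_id, rent_date, customer_id, *details in rows:
        rentals.append((trans_id, rent_date, customer_id))
        # Store each customer's details once; a mismatch would mean the data disagree.
        known = customers.setdefault(customer_id, tuple(details))
        if known != tuple(details):
            raise ValueError(f"CustomerID {customer_id} has conflicting details")
    return rentals, customers


rentals, customers = split_transitive_dependency(rental_form2)
for row in rentals:
    print(row)
for customer_id, details in sorted(customers.items()):
    print(customer_id, details)
```

Washington's address now appears once, even though he rented twice; a change of address becomes a single update.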
Database versus Spreadsheet
Tables
Customers(C#, Name,Address)
Products(P#, Description, Price)
Sales(O#, P#, Sdate, Quantity, C#)
Part 1 Sales
C# P# Q Price Price*Q SubTotal
11 22 1 15.95 15.95 15.95
11 35 2 5.75 11.50 27.45
31 18 1 25.95 25.95 53.40
Part 2 Products
P# Description Price
18 shorts 25.95
22 shirt 15.95
35 laces 4.75
Part 3 Customers
C# Name
11 Smith
31 Torrez
Retrieve the three tables (if they fit).
1) Select by date
2) Sort By O#, P#
3) Look up prices
4) Put into Part 1
5) Calculate total
6) Sort for highest total
7) Look up names
Storage vs. calculation
Multiple tables
DBMS versus Spreadsheet
DBMS
SELECT Sum(Price*Quantity), Customers.[C#], Name
FROM Customers INNER JOIN
(Sales INNER JOIN Products
ON Sales.[P#] = Products.[P#])
ON Customers.[C#] = Sales.[C#]
WHERE Sdate > Now() - 30
GROUP BY Customers.[C#], Name
ORDER BY Sum(Price*Quantity) DESC;
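The single DBMS query can be tried in SQLite with the Customers and Products rows above. The slides do not list order numbers or sale dates, so the `OID` and `Sdate` values here are hypothetical, and an explicit `as_of` date replaces `Now()` so the 30-day window is reproducible. Totals use the Products table prices (laces at $4.75).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Products  (PID INTEGER PRIMARY KEY, Description TEXT, Price REAL);
    CREATE TABLE Sales     (OID INTEGER PRIMARY KEY, PID INTEGER, Sdate TEXT,
                            Quantity INTEGER, CID INTEGER);
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [(11, "Smith"), (31, "Torrez")])
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [(18, "shorts", 25.95), (22, "shirt", 15.95), (35, "laces", 4.75)])
# Hypothetical order numbers and dates; the slides do not list them.
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?, ?)", [
    (1, 22, "2024-03-10", 1, 11),
    (2, 35, "2024-03-10", 2, 11),
    (3, 18, "2024-03-12", 1, 31),
    (4, 18, "2023-12-01", 5, 11),  # too old: outside the 30-day window
])

# One query replaces the select, sort, look-up, calculate, and re-sort spreadsheet steps.
top_customers = conn.execute("""
    SELECT Customers.CID, Name, SUM(Price * Quantity) AS Total
    FROM Customers
    JOIN Sales    ON Customers.CID = Sales.CID
    JOIN Products ON Sales.PID = Products.PID
    WHERE Sdate > date(:as_of, '-30 days')
    GROUP BY Customers.CID, Name
    ORDER BY Total DESC
""", {"as_of": "2024-03-31"}).fetchall()

for cid, name, total in top_customers:
    print(cid, name, f"{total:.2f}")
```

The prices live in one place and are looked up at query time; nothing is copied into the sales rows, which is the "storage vs. calculation" trade-off noted above.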
Sample OO Database
Patient X-Rays/Images: ID, Date, Technician, Comments
Patient Data: ID, Name, Address, DoB, Medical History, Photo
Patient Visits: ID, Date, Physician, Problems, Comments
Patient Treatments: ID, Date, Procedure, Doctor
Massive Text
Commercial Databases
CD-ROM
Search Strategies
Keywords
Narrow it down
Boolean searches
Weights (Verity Topic)
Smart searchers
Sample Boolean searches:
Colombia
Terrorism
Medellin and (terrorist or terrorism or bombing or kidnap)
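Boolean searching is usually built on an inverted index that maps each word to the documents containing it; AND becomes set intersection and OR becomes set union. This sketch uses a hypothetical four-document collection and exact, case-insensitive word matching. Because "terrorists" does not match "terrorist" under exact matching, it also shows why the sample search lists several word variants.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection standing in for a commercial text database.
documents = {
    1: "Colombia coffee exports rise as Medellin growers expand",
    2: "Bombing in Medellin blamed on cartel terrorists",
    3: "Kidnap ring broken up near Bogota",
    4: "Medellin police report drop in kidnap cases",
}


def build_index(docs):
    """Inverted index: lowercase word -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return index


index = build_index(documents)


def docs_with(word):
    """Exact (case-insensitive) word lookup; no stemming, so 'terrorists' != 'terrorist'."""
    return index.get(word.lower(), set())


# Colombia
colombia_hits = sorted(docs_with("Colombia"))

# Medellin and (terrorist or terrorism or bombing or kidnap)
violence = set().union(*(docs_with(w) for w in ("terrorist", "terrorism", "bombing", "kidnap")))
medellin_hits = sorted(docs_with("Medellin") & violence)

print(colombia_hits, medellin_hits)
```

Document 2 is found only because it also mentions "bombing"; a search for "terrorist" alone would miss it.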
E-Business Databases
E-business is transaction-based
Databases support multiple users and
protect transactions
Modern websites are driven by databases
E-Business Databases
[Figure: Customer, Internet, Web Server, and a web program script that mixes <HTML> text with a <% database connection %> block to build an Order Form showing Descriptions and Prices]
User Advantages of DBMS
Sharing Common Data Resources
Enforcement of Standards
Ease of Application Development
End-User Computing Capabilities
Data Accessibility
Organizational Advantages of DBMS
Minimize Data Redundancy
Program-Data Independence
Data Consistency & Integrity
Reduced Data Maintenance
Data Integration
Disadvantages of DBMS
Need for Specialized Skill
Need for Explicit Backup and Recovery
Interference with Shared Data
Organizational Resistance
Lack of Availability of Resources
Time-Consuming Development
DBMS Tools: Data Administration Subsystem
Query optimization facilities - take queries from users and restructure them to minimize response times.
Reorganization facilities - continually maintain statistics concerning how the DBMS engine physically accesses information.
Concurrency control facilities - ensure the validity of database updates when multiple users attempt to access and change the same information.
Data Warehouses and Data Mining
What Is a Data Warehouse?
Data warehouse - a logical collection of information – gathered from many different operational databases – used to create business intelligence that supports business analysis activities and decision-making tasks.
Data Warehouses and Data Mining
What Is a Data Warehouse?
Data warehouses are not transaction-oriented.
Data warehouses support online analytical processing (OLAP).
Data Warehouses and Data Mining
What Are Data Mining Tools?
Data mining tools - software tools you use to query information in a data warehouse. These tools include:
Query-and-reporting tools - similar to QBE tools, SQL, and report generators in the typical database environment.
Intelligent agents – use various artificial intelligence tools to form the basis of information discovery and building business intelligence in OLAP.
Data Warehouses and Data Mining
What Are Data Mining Tools?
Data mining tools (continued):
Multidimensional analysis (MDA) tools - slice-and-dice techniques that allow you to view multidimensional information from different perspectives.
Statistical tools – help you apply various mathematical models to the information stored in a data warehouse to discover new information.
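Slice-and-dice can be sketched without any special tools: a slice fixes one dimension to a single value, and a pivot totals the facts by two chosen dimensions to view them from a different perspective. The region/product/quarter facts below are hypothetical, and the two helper functions are illustrative, not the API of any particular MDA product.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales)
facts = [
    ("East", "Boots", "Q1", 120), ("East", "Jeans", "Q1", 80),
    ("East", "Boots", "Q2", 150), ("West", "Boots", "Q1", 90),
    ("West", "Jeans", "Q2", 110), ("West", "Jeans", "Q1", 70),
]
DIMS = {"region": 0, "product": 1, "quarter": 2}


def slice_cube(rows, dim, value):
    """Slice: keep only the rows where one dimension equals a given value."""
    i = DIMS[dim]
    return [row for row in rows if row[i] == value]


def pivot(rows, row_dim, col_dim):
    """Dice/pivot: total sales by two chosen dimensions."""
    r, c = DIMS[row_dim], DIMS[col_dim]
    table = defaultdict(int)
    for row in rows:
        table[(row[r], row[c])] += row[3]
    return dict(table)


q1 = slice_cube(facts, "quarter", "Q1")
by_region_product_q1 = pivot(q1, "region", "product")  # Q1 only, region x product
by_product_quarter = pivot(facts, "product", "quarter")  # all regions, product x quarter
print(by_region_product_q1)
print(by_product_quarter)
```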
Data Warehouses and Data Mining
Data Marts – Smaller Data Warehouses
Data mart - a subset of a data warehouse in which only a focused portion of the data warehouse information is kept.
Data Warehouses and Data Mining
Important Considerations
Do you need a data warehouse?
Do all your employees need an entire data warehouse?
How up-to-date must the information be?
What data mining tools do you need?
Team Work
How Up-to-Date Should Data Warehouse Information Be? (p. 149)
MANAGING THE INFORMATION RESOURCE
Who Should Oversee the Organization’s Information?
Chief information officer (CIO) - responsible for overseeing an organization’s information resource.
Data administration - plans for, oversees the development of, and monitors the information resource.
Database administration - responsible for the more technical and operational aspects of managing the information contained in organizational databases.
MANAGING THE INFORMATION RESOURCE
How Will Changes in Technology Affect Organizing and Managing Information?
As new technologies become available, you should ask yourself whether those technologies will help you organize and manage your information better.
One of the greatest technological changes that will occur over the coming years is a convergence of different tools that will help you better organize and manage information.
MANAGING THE INFORMATION RESOURCE
Is Information Ownership a Consideration?
Information ownership is a key consideration in today’s information-based business environment.
Ownership refers to who is responsible for information quality.
On Your Own
CRUD – Defining Information Ownership (p. 151)
MANAGING THE INFORMATION RESOURCE
What Are the Ethics Involved in Managing and Organizing Information?
Databases, data warehouses, DBMSs, and data mining tools make it possible for people to easily access all kinds of organizational information.
How does an organization safeguard against the unethical use of information within the organization?
Closing Case Study One
We’ve Got OLTP Covered; Let’s Go on to OLAP
What is the single most important factor that hinders organizations in general from providing good online analytical processing (OLAP) support?
Why is it so much easier for organizations to provide good online transaction processing (OLTP) support?