CHAPTER 4
THE CRM DATA WAREHOUSE
WHAT IS A DATA WAREHOUSE?
data warehouse: a large reservoir of detailed and summary data that describes the organization and its activities, organized by the various business dimensions in a way that facilitates easy retrieval of information describing activities
data mart: a subset of the data warehouse, tailored to meet the specialized needs of a particular group of users
top-down approach to data warehouse development: the data warehouse is created first and the data marts are then derived from it
bottom-up approach to data warehouse development: the data marts are created first and then integrated
Data Warehousing Objectives
(1) keep the warehouse data current;
(2) ensure that the warehouse data is accurate;
(3) keep the warehouse data secure;
(4) make the warehouse data easily available to authorized users;
(5) maintain descriptions of the warehouse data so that users and system developers can understand the meaning of each element
Data Warehouse vs. DBMS
OLTP (on-line transaction processing)
  Major task of a traditional relational DBMS
  Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
  Major task of a data warehouse system
  Data analysis and decision making
Distinct features (OLTP vs. OLAP):
  User and system orientation: customer vs. market
  Data contents: current, detailed vs. historical, consolidated
  Database design: ER + application vs. star + subject
  View: current, local vs. evolutionary, integrated
  Access patterns: update vs. read-only but complex queries
                     OLTP                                  OLAP
users                clerk, IT professional                knowledge worker
function             day-to-day operations                 decision support
DB design            application-oriented                  subject-oriented
data                 current, up-to-date; detailed,        historical; summarized, multidimensional;
                     flat relational; isolated             integrated, consolidated
usage                repetitive                            ad hoc
access               read/write; index/hash on prim. key   lots of scans
unit of work         short, simple transaction             complex query
# records accessed   tens                                  millions
# users              thousands                             hundreds
DB size              100MB-GB                              100GB-TB
metric               transaction throughput                query throughput, response
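To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and its values are illustrative assumptions, not part of the chapter. The first statement is an OLTP-style unit of work (a short write that touches one record through its primary key); the second is an OLAP-style read-only query that scans every record and consolidates the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "West", 2022, 100.0),
        (2, "West", 2023, 150.0),
        (3, "East", 2023, 200.0),
    ],
)

# OLTP style: a short, simple transaction that updates one record
# located through its primary key.
with conn:
    conn.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 1")

# OLAP style: a read-only query that scans all records and
# consolidates them by region and year.
olap_rows = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
```

With only three rows the difference in cost is invisible, but the shape of the two statements is the point: one keyed write versus one full-table aggregation.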
DATA WAREHOUSE ARCHITECTURE
staging area: where data is prepared to be moved into the warehouse data repository and the metadata repository
metadata: data about data, or descriptions of the data in the data warehouse
Exhibit 4.1: A Data Warehouse System Model
EXHIBIT 4.1 A DATA WAREHOUSE SYSTEM MODEL
[Figure: the data warehouse system consists of a data gathering system, a staging area, a warehouse data repository, a metadata repository, an information delivery system, and a management and control component. Legend: data flow, control flow.]
A Data Warehouse System Model
Management and Control
management and control component: like a traffic officer standing in the middle of a street intersection, controlling the flow of traffic through the intersection
Staging Area
ETL: extraction, transformation, and loading, the activities of the staging area
extraction: obtaining data from the internal databases and files of systems, accomplished according to a schedule
transformation: a process that includes cleaning, standardizing, reformatting, and summarizing
loading: writing the data into the data warehouse
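The three ETL activities can be sketched as three small functions. This is a minimal illustration under stated assumptions: the source records, field names, and the `customer_sales` table are hypothetical, the source is an in-memory list rather than a real operational system, and scheduling is left out.

```python
import sqlite3

# Hypothetical source records, standing in for rows pulled from an
# internal database or file.
SOURCE_ROWS = [
    {"customer_no": " 101 ", "state": "ca", "amount": "25.50"},
    {"customer_no": "102", "state": "NY", "amount": "10.00"},
    {"customer_no": " 101 ", "state": "CA", "amount": "4.50"},
]


def extract():
    """Extraction: obtain data from the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transformation: clean, standardize, reformat, and summarize."""
    totals = {}
    for row in rows:
        customer_no = int(row["customer_no"].strip())  # cleaning
        state = row["state"].upper()                   # standardizing
        amount = float(row["amount"])                  # reformatting
        key = (customer_no, state)
        totals[key] = totals.get(key, 0.0) + amount    # summarizing
    return [(c, s, total) for (c, s), total in sorted(totals.items())]


def load(conn, rows):
    """Loading: write the prepared rows into the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(customer_no INTEGER, state TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO customer_sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT customer_no, state, total FROM customer_sales ORDER BY customer_no"
).fetchall()
```

Keeping the three steps as separate functions mirrors the staging-area description: each can be scheduled, tested, or replaced independently.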
WAREHOUSE DATA REPOSITORY
where the warehouse data is stored within the computer system or systems
Data Content
customer picture: a compilation of geographic, demographic, activity, psychographic, and behavioral data
Data Characteristics
the types of data to be processed, including considerations of data granularity, data hierarchies, and data dimensions
Data Types
fixed-length format
variable-length format
Data Granularity
the degree of detail that is represented by the data, where the greater the detail, the finer the granularity
Data Hierarchies
multiple attributes can describe a single entity, and together they form a hierarchy; an attribute is a data element that identifies or describes an occurrence of a data entity (e.g., a particular customer is identified by a customer number attribute)
Exhibit 4.2: An Example of a Data Hierarchy
Data Dimensions
for example, a manager can query the data warehouse for a display of data according to salesperson, customer, product, and time
Exhibit 4.3: Every Data Record Contains the Time Element
EXHIBIT 4.2 AN EXAMPLE OF A DATA HIERARCHY
Customer
Customer number
Customer age
Customer gender
Customer marital status
Customer number of dependents
Customer education
Customer dwelling type
Customer state
Customer city
Customer zip code
EXHIBIT 4.3 EVERY DATA RECORD CONTAINS THE TIME ELEMENT
Data record                 Time element
Customer sales order        Sales order date
Warehouse shipping order    Date shipped
Customer invoice            Invoice date
Customer statement          Statement date
Customer payment            Customer payment date
METADATA REPOSITORY
describes the flow of data from the time that the data is captured until it is archived; for example, metadata in the metadata repository for the customer number attribute would describe its format, editing rules, and so on
TYPES OF METADATA
Metadata for Users
(analysis) identification of the source systems, the time of the last update, the different report formats that are available, and how to find data in the data warehouse
Metadata for Systems Developers
data to allow developers to maintain, revise, and reengineer the data warehouse system, including the various rules that were employed in creating the warehouse data repository, and the rules for extraction, cleansing, transforming, purging, and archiving
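As an illustration of what a metadata entry for the customer number attribute might hold, here is a small sketch. The class, its field names, the six-digit editing rule, and all sample values are assumptions made for the example; the chapter only says that such metadata describes the format, editing rules, source systems, and time of last update.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMetadata:
    """Hypothetical metadata record for one warehouse attribute."""

    name: str
    data_format: str    # how the value is stored
    editing_rule: str   # regular expression a valid value must match (assumed)
    source_system: str  # where the data came from
    last_update: str    # time of the last update

    def is_valid(self, value: str) -> bool:
        """Apply the editing rule to a candidate value."""
        return re.fullmatch(self.editing_rule, value) is not None


# Illustrative values only.
customer_number_meta = AttributeMetadata(
    name="customer_number",
    data_format="6-digit numeric string",
    editing_rule=r"\d{6}",
    source_system="order entry system",
    last_update="2024-01-31",
)
```

A user would read `source_system` and `last_update`; a developer would rely on `editing_rule` when maintaining the cleansing step.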
Data and Process Models
object diagrams and entity-relationship diagrams
use cases, use case diagrams, and data flow diagrams
CASE Tools
CASE stands for computer-aided system engineering and is a way to use the computer to develop systems
DBMS Systems
include a data dictionary component, which contains excellent descriptions of the data in the database or data warehouse
HOW DATA IS STORED IN THE DATA WAREHOUSE
dimension table— a list of all of the attributes that identify and describe a particular entity
Exhibit 4.4: A Sample Dimension Table
fact table: a list of all the facts that relate to some type of the organization's activity
Exhibit 4.5: A Sample Fact Table
EXHIBIT 4.4 A SAMPLE DIMENSION TABLE
Customer
  Customer number
  Customer name
  Customer phone number
  Customer e-mail address
  Customer territory
  Customer credit code
  Customer standard industry code
  Customer city
  Customer state
  Customer zip code
EXHIBIT 4.5 A SAMPLE FACT TABLE
Commercial Sales Facts
  Actual sales units
  Budgeted sales units
  Actual sales amount
  Budgeted sales amount
  Sales discount amount
  Net sales amount
  Sales commission amount
  Sales bonus amount
  Sales tax amount
INFORMATION PACKAGES
a table that is maintained in the data warehouse repository that identifies both the dimensions and the facts that relate to a business activity
Exhibit 4.6: Information Package Format
key: a number, such as a customer number, that identifies a particular occurrence of the dimension
Exhibit 4.7: A Sample Information Package
EXHIBIT 4.6 INFORMATION PACKAGE FORMAT
Subject: Name of business activity being measured
Dimension Name    Dimension Name    Dimension Name    Dimension Name
Dimension Key     Dimension Key     Dimension Key     Dimension Key
Dimension 1       Dimension 1       Dimension 1       Dimension 1
Dimension 2       Dimension 2       Dimension 2       Dimension 2
Dimension 3       Dimension 3       Dimension n       Dimension 3
Dimension 4       Dimension n                         Dimension 4
Dimension n                                           Dimension n
Facts: Numeric measures of the business activity
EXHIBIT 4.7 A SAMPLE INFORMATION PACKAGE
Subject: Commercial sales
Time        Salesperson         Customer                Product
Time key    Salesperson key     Customer key            Product key
Hour        Salesperson name    Customer name           Product name
Day         Sales branch        Customer territory      Product model
Month       Sales region        Customer credit code    Product brand
Quarter     Subsidiary                                  Product line
Year
Facts: Actual sales units, budgeted sales units, actual sales amount, budgeted sales amount, sales discount amount, net sales amount, sales commission amount, sales bonus amount, sales tax amount
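The sample information package can also be held as plain data, which makes its two parts (dimensions with their keys, and facts) explicit. The sketch below is one possible representation; the dictionary layout and the helper name `fact_table_columns` are assumptions for illustration. The helper collects each dimension's key followed by the numeric facts, which is the set of columns a record of the business activity would need.

```python
# The sample information package from Exhibit 4.7, held as plain Python data.
# The first entry of each dimension is its key; the rest are hierarchy levels.
information_package = {
    "subject": "Commercial sales",
    "dimensions": {
        "Time": ["Time key", "Hour", "Day", "Month", "Quarter", "Year"],
        "Salesperson": [
            "Salesperson key", "Salesperson name", "Sales branch",
            "Sales region", "Subsidiary",
        ],
        "Customer": [
            "Customer key", "Customer name", "Customer territory",
            "Customer credit code",
        ],
        "Product": [
            "Product key", "Product name", "Product model",
            "Product brand", "Product line",
        ],
    },
    "facts": [
        "Actual sales units", "Budgeted sales units",
        "Actual sales amount", "Budgeted sales amount",
        "Sales discount amount", "Net sales amount",
        "Sales commission amount", "Sales bonus amount",
        "Sales tax amount",
    ],
}


def fact_table_columns(package):
    """Return the dimension keys followed by the numeric facts."""
    keys = [levels[0] for levels in package["dimensions"].values()]
    return keys + package["facts"]


columns = fact_table_columns(information_package)
```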
STAR SCHEMAS
the arrangement of an information package that usually identifies multiple dimension tables for a single fact table and has the appearance of a star, with the fact table in the center and the dimension tables forming the points
Exhibit 4.8: Star Schema Format
foreign keys: a means of linking the fact table to the dimension tables by means of the keys identified at the top of the fact table, where the keys identify other, "foreign" tables as opposed to the fact table
Exhibit 4.9: A Sample Star Schema
EXHIBIT 4.8 STAR SCHEMA FORMAT
Fact table (center): Business activity name
  Keys: Dimension 1 key, Dimension 2 key, Dimension n key
  Facts: Measurable fact 2, Measurable fact 4, Measurable fact 5, Measurable fact n
Dimension tables (points):
  Dimension 1 name: Dimension 1 key, Dimension 1 hierarchy
  Dimension 2 name: Dimension 2 key, Dimension 2 hierarchy
  Dimension n name: Dimension n key, Dimension n hierarchy
EXHIBIT 4.9 A SAMPLE STAR SCHEMA
Fact table (center): Product sales facts
  Keys: Product key, Customer key, Salesperson key, Time key
  Facts: Sales units, Gross sales amount, Sales discount amount, Net sales amount, Sales commission amount
Dimension tables (points):
  Customer: Customer key, Customer name, Customer type, Customer credit code, Salesperson number, Sales territory, Standard industry code
  Salesperson: Salesperson key, Salesperson name, Sales region, Sales branch
  Table keyed by Product key (labeled "Customer payment" in the exhibit): Product key, Product name, Product unit price, Product quantity
  Time: Time key, Day, Month, Quarter, Year
Example of Star Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
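A star schema like this one maps directly onto relational tables, with the fact table's keys acting as foreign keys into the dimension tables. The sketch below builds a trimmed-down version with Python's built-in sqlite3 module; only a few columns per dimension are kept, and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time(time_key),
    item_key     INTEGER REFERENCES item(item_key),
    branch_key   INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);

INSERT INTO time VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO item VALUES (1, 'Laptop', 'Acme'), (2, 'Phone', 'Acme');
INSERT INTO branch VALUES (1, 'Downtown');
INSERT INTO location VALUES (1, 'Vancouver', 'Canada'), (2, 'Seattle', 'USA');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 2, 2000.0),
    (1, 2, 1, 2, 5, 2500.0),
    (2, 1, 1, 1, 1, 1000.0);
""")

# Join the fact table to two of its dimensions and summarize the measures.
star_rows = conn.execute("""
    SELECT t.quarter, l.city, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time AS t ON t.time_key = f.time_key
    JOIN location AS l ON l.location_key = f.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()
```

Every query follows the same pattern: start at the fact table, join outward one hop to whichever dimensions the question needs, then aggregate.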
Example of Snowflake Schema
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
    supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
    city: city_key, city, province_or_state, country
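The practical effect of snowflaking is that a normalized dimension needs one extra join. The sketch below, again using sqlite3 with invented sample rows and a reduced set of columns, splits `city` out of `location` as in the example and shows the two-hop join from the fact table to the city name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The city attributes are normalized into their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city(city_key)
);
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location(location_key),
    units_sold   INTEGER
);

INSERT INTO city VALUES (1, 'Vancouver', 'Canada'), (2, 'Seattle', 'USA');
INSERT INTO location VALUES
    (10, '1 Main St', 1), (11, '9 Oak Ave', 1), (12, '5 Pine St', 2);
INSERT INTO sales_fact VALUES (10, 3), (11, 4), (12, 6);
""")

# Fact table -> location -> city: two joins instead of the star schema's one.
units_by_city = conn.execute("""
    SELECT c.city, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location AS l ON l.location_key = f.location_key
    JOIN city AS c ON c.city_key = l.city_key
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
```

The trade-off is the usual one for normalization: city attributes are stored once per city rather than once per location, at the cost of a longer join path.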
Example of Fact Constellation
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table
  Keys: time_key, item_key, shipper_key, from_location, to_location
  Measures: dollars_cost, units_shipped
Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
  shipper: shipper_key, shipper_name, location_key, shipper_type
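In a fact constellation, two or more fact tables share dimension tables, so measures from different business activities can be compared along a common dimension. The following sketch is a reduced illustration with sqlite3: only the shared `item` dimension and one measure per fact table are kept, and the rows are invented. Each fact table is summarized separately per item, which avoids double counting that a direct join between the two fact tables would cause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (
    item_key   INTEGER REFERENCES item(item_key),
    units_sold INTEGER
);
CREATE TABLE shipping_fact (
    item_key      INTEGER REFERENCES item(item_key),
    units_shipped INTEGER
);

INSERT INTO item VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO sales_fact VALUES (1, 2), (1, 3), (2, 5);
INSERT INTO shipping_fact VALUES (1, 4), (2, 5), (2, 1);
""")

# Summarize each fact table on its own, then line the totals up by item.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold) FROM sales_fact AS s
             WHERE s.item_key = i.item_key),
           (SELECT SUM(h.units_shipped) FROM shipping_fact AS h
             WHERE h.item_key = i.item_key)
    FROM item AS i
    ORDER BY i.item_name
""").fetchall()
```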
DATA WAREHOUSE NAVIGATION
summary information: preprocessed data that provides the user with exactly the content that is needed
drill down (top-down) navigation: the user seeks more detail in an effort to understand the summary information
roll up navigation: the user summarizes data to "see the forest rather than the trees" or to prepare summary graphs
drill across navigation: the user moves from one data hierarchy to another, e.g., information on customer sales, then salesperson sales, and then product sales
Exhibit 4.10: Navigation Paths
EXHIBIT 4.10 NAVIGATION PATHS
Levels of detail:
  Summary information (Net sales for the Western sales region)
  Detailed information (Net sales for salesperson 3742)
  Detailed data (Sales units for salesperson 3742)
Hierarchies: Hierarchy 1 (customer), Hierarchy 2 (salesperson), Hierarchy n (product)
Navigation paths: Roll up, Drill across, Drill down, Drill through
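The navigation paths can be imitated on a handful of records in plain Python. In this sketch the sales rows, the field layout, and the helper names `summarize` and `drill_down` are assumptions for illustration; salesperson 3742 and the Western region are borrowed from the exhibit, but the amounts are invented.

```python
from collections import defaultdict

# Hypothetical detailed data: (region, salesperson, product, net_sales).
SALES = [
    ("Western", 3742, "Widget", 500.0),
    ("Western", 3742, "Gadget", 250.0),
    ("Western", 4100, "Widget", 300.0),
    ("Eastern", 5200, "Widget", 400.0),
]
FIELDS = {"region": 0, "salesperson": 1, "product": 2}


def summarize(rows, by):
    """Roll up: total net sales grouped by one field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[FIELDS[by]]] += row[3]
    return dict(totals)


def drill_down(rows, field, value):
    """Drill down: return the detail rows behind one summary figure."""
    return [row for row in rows if row[FIELDS[field]] == value]


# Roll up to summary information: net sales per region.
by_region = summarize(SALES, "region")

# Drill down into the Western region, one salesperson at a time.
western = drill_down(SALES, "region", "Western")
by_salesperson = summarize(western, "salesperson")

# Drill across: view the same Western sales along the product hierarchy.
by_product = summarize(western, "product")
```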
DATA WAREHOUSE SECURITY
information systems security: protection of the system and its data against damage, destruction, theft, and misuse
Exhibit 4.11: The Security Action Cycle
The Corporate Security Environment
deterrence: security policies and procedures that are intended to deter security violations, such as guidelines for proper system use and the requirement that users change their passwords periodically
prevention: measures aimed at those persons who ignore deterrence, including such things as locks on computer rooms, user passwords, and file permissions
detection: proactive actions include system audits, reports of suspicious activity, and virus scanning software; reactive actions take the form of investigations
remedies: responses in the form of warnings, reprimands, termination of employment, or legal action
EXHIBIT 4.11 THE SECURITY ACTION CYCLE
1. Deterrence: maximize deterred abuse
2. Prevention: maximize prevented abuse
3. Detection: minimize undetected abuse
4. Remedies: minimize unpunished abuse
Deterrence feedback
Data Warehouse Security Measures
network security: using procedures such as firewalls to restrict access to the network that houses the servers and data files, databases, data warehouses, and data marts
data security: controlling access to data once access to the network has been achieved; because data files may be located on multiple servers on the network, the user must provide a second password
database or data warehouse security: the security checks of the database management system (DBMS), which may include a third password, verification of user name, and also verification of access to particular data tables, records, and even record fields
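The three layers act as successive gates: a request reaches the data only if it passes all of them in order. The sketch below models that ordering only. The `User` class, the `can_read` function, and the sample credentials are hypothetical, and passwords are compared as plain strings purely to keep the example short; a real system would store salted hashes and rely on the firewall, operating system, and DBMS for the actual checks.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user record with one password per security layer."""

    name: str
    network_password: str
    data_password: str
    dbms_password: str
    tables: set = field(default_factory=set)  # tables the user may read


def can_read(user, supplied, table):
    """Apply the three checks in order; `supplied` holds the typed passwords."""
    if supplied.get("network") != user.network_password:  # network security
        return False
    if supplied.get("data") != user.data_password:        # data security
        return False
    if supplied.get("dbms") != user.dbms_password:        # DBMS security
        return False
    return table in user.tables                           # table-level access


analyst = User("analyst", "n1", "d2", "b3", tables={"sales_fact"})
good = {"network": "n1", "data": "d2", "dbms": "b3"}
wrong_second = {"network": "n1", "data": "oops", "dbms": "b3"}
```

For example, `can_read(analyst, good, "sales_fact")` passes every gate, while `can_read(analyst, wrong_second, "sales_fact")` stops at the second one, and a request for a table outside `analyst.tables` fails at the last step even with all three passwords correct.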