Chapter 2: DATA WAREHOUSING
FUNDAMENTALS of DATABASE SYSTEMS, Fifth Edition
Introduction
Who are my customers and what products are they buying?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?
What product promotions have the biggest impact on revenue?
Introduction (cont.)
There is a great need for tools that provide decision makers with information to make decisions quickly and reliably based on historical data.
This functionality is achieved by data warehousing.
A data warehouse is characterized as a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.
Introduction (cont.)
Online analytical processing (OLAP): a term used to describe the analysis of complex data from the data warehouse.
Data mining: the process of knowledge discovery.
Characteristics of Data Warehouses: Subject Oriented
Organized around major subjects, such as product and sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision process.
Characteristics of Data Warehouses: Integrated
Constructed by integrating multiple, heterogeneous data sources.
Data cleaning and data integration techniques are applied.
Characteristics of Data Warehouses: Time Variant
Data warehouse data provide information from a historical perspective (e.g., the past 5-10 years).
Every data item in the data warehouse contains an element of time.
Characteristics of Data Warehouses: Nonvolatile
Operational updates of data do not occur in the data warehouse environment.
It does not require transaction processing, recovery, or concurrency control mechanisms.
It requires only two data access operations: initial loading of data and querying.
Data Warehouse vs. operational databases
DW: Holds a large amount of data from multiple sources, which may include different DB models or files acquired from independent systems and platforms.
Traditional DB: A transactional database (relational, object-oriented, network, or hierarchical).
DW: Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Optimized for retrieval.
Traditional DB: Focuses on daily operations or transaction processing. Optimized for routine transaction processing.
DW: Provides information from a historical perspective (e.g., the past 5-10 years).
Traditional DB: Holds current-value data.
DW: Is nonvolatile.
Traditional DB: Transactions are the agent of change to the database.
DW: Supports DSS, data mining, and OLAP.
Traditional DB: Supports OLTP.
OLTP vs. OLAP
Feature             OLTP                              OLAP
User                Clerk, IT professional            Decision makers, analysts
Function            Day-to-day operations             Decision support
DB design           Application-oriented (E-R based)  Subject-oriented (star, snowflake)
Data                Current                           Historical
View                Detailed                          Summarized
Access              Read/write                        Read mostly
# Records accessed  Tens                              Millions
# Users             Thousands                         Hundreds
DB size             100 MB-GB                         100 GB-TB
What is a Data Warehouse? A Practitioner's Viewpoint
“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
Barry Devlin, IBM Consultant
What is a Data Warehouse?
[Figure: data sources in Chicago, New York, and Taranto are cleaned, integrated, transformed, loaded, and refreshed into the data warehouse; clients reach the warehouse through query and analysis tools.]
3-D data cube
[Figure: a 3-D data cube with a Products dimension (Apples, Cherries, Grapes, Melons), a Time dimension (Q1, Q2, Q3, Q4), and a Measures dimension.]
Example of Querying a Cube
[Figure: querying a cube; labels shown: AveUnits, Sales Dollars, SalesUnits, Net Price, 1000.]
From table and spreadsheet to data cubes
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
Dimension tables contain descriptions of the subject of the business, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
From table and spreadsheet to data cubes (cont.)
A fact table contains factual or quantitative data.
A fact table also contains measures (such as dollars_sold) and keys to each of the related dimension tables.
4-D Data cube
Cube: a lattice of cuboids
0-D (apex) cuboid
1-D cuboids
2-D cuboids
3-D cuboids
4-D (base) cuboid
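The lattice can be enumerated directly: each cuboid corresponds to one subset of the cube's dimensions. The following is a minimal sketch; the function name `cuboid_lattice` and the four dimension names are assumptions chosen for illustration, not something the slides define.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Map each level k to the k-D cuboids, each given as a tuple of dimension names."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

# Assumed dimension names for a 4-D sales cube.
dims = ["time", "item", "branch", "location"]
lattice = cuboid_lattice(dims)

# Number of cuboids per level, and in the whole lattice (2 ** 4 for four dimensions).
counts = {level: len(cuboids) for level, cuboids in lattice.items()}
total = sum(counts.values())
```

Level 0 holds the single apex cuboid (all dimensions aggregated away) and level 4 holds the single base cuboid (no aggregation).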
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions and measures.
Star schema: a fact table in the middle connected to a set of dimension tables.
Snowflake schema: a refinement of the star schema in which some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
Conceptual Modeling of Data Warehouses (cont.)
Fact constellation: multiple fact tables share dimension tables. Viewed as a collection of stars, it is therefore called a galaxy schema or fact constellation.
Example of Star Schema
time (dimension table): time_key, day, day_of_the_week, month, quarter, year
branch (dimension table): branch_key, branch_name, branch_type
item (dimension table): item_key, item_name, brand, type, supplier_type
location (dimension table): location_key, street, city, state_or_province, country
Sales Fact Table (dimension keys): time_key, item_key, branch_key, location_key
Sales Fact Table (measures): units_sold, dollars_sold, avg_sales
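A star schema like this one can be tried out with Python's built-in sqlite3 module. The sketch below is an illustration only: it keeps two of the four dimension tables and a subset of their columns, the table names `time_dim`, `item_dim`, and `sales_fact` are assumptions, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                       quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
-- The fact table holds one key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 5, 1, 'Q1', 2004), (2, 9, 4, 'Q2', 2004);
INSERT INTO item_dim VALUES (1, 'laptop', 'Acme'), (2, 'phone', 'Acme');
INSERT INTO sales_fact VALUES (1, 1, 2, 2400.0), (1, 2, 1, 300.0), (2, 1, 1, 1200.0);
""")

# A typical star query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
SELECT t.quarter, i.item_name, SUM(f.units_sold), SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN time_dim AS t ON t.time_key = f.time_key
JOIN item_dim AS i ON i.item_key = f.item_key
GROUP BY t.quarter, i.item_name
ORDER BY t.quarter, i.item_name
""").fetchall()
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.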
Example of Snowflake Schema
time (dimension table): time_key, day, day_of_the_week, month, quarter, year
branch (dimension table): branch_key, branch_name, branch_type
item (dimension table): item_key, item_name, brand, type, supplier_key
supplier (dimension table): supplier_key, supplier_type
location (dimension table): location_key, street, city_key
city (dimension table): city_key, city, state_or_province, country
Sales Fact Table (dimension keys): time_key, item_key, branch_key, location_key
Sales Fact Table (measures): units_sold, dollars_sold, avg_sales
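The step from a star to a snowflake schema is a normalization of a dimension table. As a rough sketch, the function below splits a denormalized location table into a location table and a city table linked by city_key. The function name `snowflake_location` and the sample rows are assumptions made for illustration.

```python
def snowflake_location(location_rows):
    """Split denormalized location rows into a location table and a city table."""
    city_keys = {}       # (city, state_or_province, country) -> city_key
    city_table = []
    location_table = []
    for row in location_rows:
        city_id = (row["city"], row["state_or_province"], row["country"])
        if city_id not in city_keys:
            # First time this city is seen: assign the next surrogate key.
            city_keys[city_id] = len(city_keys) + 1
            city_table.append({
                "city_key": city_keys[city_id],
                "city": row["city"],
                "state_or_province": row["state_or_province"],
                "country": row["country"],
            })
        location_table.append({
            "location_key": row["location_key"],
            "street": row["street"],
            "city_key": city_keys[city_id],
        })
    return location_table, city_table

# Made-up sample rows in the star-schema (denormalized) layout.
star_location = [
    {"location_key": 1, "street": "1 Main St", "city": "Chicago",
     "state_or_province": "IL", "country": "USA"},
    {"location_key": 2, "street": "9 Lake Ave", "city": "Chicago",
     "state_or_province": "IL", "country": "USA"},
    {"location_key": 3, "street": "5 Broadway", "city": "New York",
     "state_or_province": "NY", "country": "USA"},
]
location_table, city_table = snowflake_location(star_location)
```

The city attributes are now stored once per city instead of once per location, at the cost of one extra join when a query needs them.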
Example of Fact Constellation
time (dimension table): time_key, day, day_of_the_week, month, quarter, year
branch (dimension table): branch_key, branch_name, branch_type
item (dimension table): item_key, item_name, brand, type, supplier_type
location (dimension table): location_key, street, city_key
shipper (dimension table): shipper_key, shipper_name, location_key, shipper_type
Sales Fact Table (dimension keys): time_key, item_key, branch_key, location_key
Sales Fact Table (measures): units_sold, dollars_sold, avg_sales
Shipping Fact Table (dimension keys): time_key, item_key, shipper_key, from_location, to_location
Shipping Fact Table (measures): dollars_cost, units_shipped
Cube definition syntax in DMQL
Defining star schema in DMQL
Defining snowflake in DMQL
Defining fact constellation in DMQL
Measures of a Data Cube: three categories
Distributive: the result derived by applying the function to n aggregate values is the same as that derived by applying the function to all the data without partitioning. E.g., count(), min().
Measures of a Data Cube: three categories (cont.)
Algebraic: the measure can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function. E.g., avg().
Holistic: there is no constant bound on the storage size needed to describe a subaggregate. E.g., mode(), rank().
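The three categories can be checked on a small partitioned data set. The sketch below uses made-up values, and it uses the median as the holistic example; that choice is an assumption of this sketch (the slide names mode() and rank()), made because the median is easy to compute with the standard library.

```python
from statistics import median

# Made-up data, split into three partitions.
partitions = [[4, 8, 6], [10, 2], [5]]
all_values = [v for part in partitions for v in part]

# Distributive: count and min of the whole follow from the per-partition results.
count_from_parts = sum(len(part) for part in partitions)
min_from_parts = min(min(part) for part in partitions)

# Algebraic: avg needs a bounded number (M = 2) of distributive values
# per partition, namely sum and count.
summaries = [(sum(part), len(part)) for part in partitions]
avg_from_parts = sum(s for s, _ in summaries) / sum(c for _, c in summaries)

# Holistic: combining per-partition medians does not give the true median.
median_of_medians = median(median(part) for part in partitions)
true_median = median(all_values)
```

Here `median_of_medians` is 6 while `true_median` is 5.5, so a fixed-size summary per partition is not enough for this measure.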
Typical OLAP operations
Roll up (drill up): summarize data by climbing up a hierarchy.
Drill down (roll down): the reverse of roll up; go from a higher-level summary to a lower-level summary or detailed data.
Slice and dice: project and select.
Typical OLAP operations (cont.)
Pivot (rotate): reorient the cube for visualization, e.g., from 3-D to a series of 2-D planes.
Other operations:
  Drill across: involves (goes across) more than one fact table.
  Drill through: goes through the bottom level of the cube to its back-end relational tables (using SQL).
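Roll up, slice, dice, and pivot can be sketched on a tiny cube held in a dictionary. Everything in this sketch is an assumption for illustration: the cell values, the quarter-to-half-year hierarchy, and the function names. A real OLAP server would run these operations against precomputed cuboids rather than a Python dict.

```python
from collections import defaultdict

DIMS = ("quarter", "item", "city")

# cell key (quarter, item, city) -> units_sold; made-up sample data
cube = {
    ("Q1", "Apples", "Chicago"): 10,
    ("Q2", "Apples", "Chicago"): 7,
    ("Q3", "Apples", "Chicago"): 2,
    ("Q1", "Grapes", "New York"): 4,
    ("Q2", "Grapes", "New York"): 5,
}
# Assumed one-level hierarchy on the time dimension.
QUARTER_TO_HALF = {"Q1": "H1", "Q2": "H1", "Q3": "H2", "Q4": "H2"}

def roll_up(cells, dim, parent_of):
    """Climb one hierarchy level on `dim` and sum the cells that merge."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for key, units in cells.items():
        out[key[:i] + (parent_of[key[i]],) + key[i + 1:]] += units
    return dict(out)

def slice_(cells, dim, value):
    """Select a single value on one dimension."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}

def dice(cells, allowed):
    """Select on several dimensions; `allowed` maps dimension -> set of values."""
    return {
        k: v for k, v in cells.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def pivot(cells, new_order):
    """Reorder the axes of the cube without changing any cell value."""
    idx = [DIMS.index(d) for d in new_order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}

by_half = roll_up(cube, "quarter", QUARTER_TO_HALF)
q1_only = slice_(cube, "quarter", "Q1")
apples_h1 = dice(cube, {"quarter": {"Q1", "Q2"}, "item": {"Apples"}})
rotated = pivot(cube, ("item", "quarter", "city"))
```

Drill down is not shown because it is not computable from the summary alone: going from `by_half` back to quarters requires the more detailed cells in `cube`.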
Design of Data Warehouse: A Business Analysis Framework
Four views regarding the design of a data warehouse:
Top-down view: allows selection of the relevant information necessary for the data warehouse.
Data source view: exposes the information being captured, stored, and managed by operational systems.
Data warehouse view: consists of the fact tables and dimension tables.
Design of Data Warehouse: A Business Analysis Framework (cont.)
Business query view: sees the perspectives of data in the warehouse from the viewpoint of the end user.
Data Warehouse Design Process
Top-down, bottom-up, or a combination of both:
  Top-down: starts with overall design and planning.
  Bottom-up: starts with experiments and prototypes.
From a software engineering point of view:
  Waterfall: structured and systematic analysis at each step before proceeding to the next.
Data Warehouse Design Process (cont.)
Spiral: rapid generation of increasingly functional systems, with quick turnaround.
Data Warehouse Design Process (cont.)
Typical data warehouse design process:
Choose a business process to model, e.g., orders, invoices.
Choose the grain (atomic level of data) of the business process.
Choose the dimensions that will apply to each fact table record.
Choose the measures that will populate each fact table record.
Three Data Warehouse Models
Enterprise warehouse: collects all of the information about subjects spanning the entire organization.
Data mart: a subset of corporate-wide data that is of value to a specific group of users.
Virtual warehouse: a set of views over operational databases.
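A virtual warehouse can be illustrated with an ordinary SQL view. The sketch below uses Python's built-in sqlite3 module; the operational table `orders`, the view `sales_by_branch`, and all rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A stand-in for an operational (OLTP) table.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, branch TEXT,
                     order_date TEXT, amount REAL);
INSERT INTO orders VALUES
    (1, 'Chicago', '2004-01-15', 120.0),
    (2, 'Chicago', '2004-02-03', 80.0),
    (3, 'New York', '2004-01-20', 200.0);
-- The "warehouse" is only a view: no data is copied.
CREATE VIEW sales_by_branch AS
    SELECT branch, COUNT(*) AS order_count, SUM(amount) AS dollars_sold
    FROM orders
    GROUP BY branch;
""")

before = conn.execute("SELECT * FROM sales_by_branch ORDER BY branch").fetchall()
conn.execute("INSERT INTO orders VALUES (4, 'New York', '2004-03-01', 50.0)")
after = conn.execute("SELECT * FROM sales_by_branch ORDER BY branch").fetchall()
```

Because the view is evaluated against the operational table at query time, the new order shows up in `after` without any load or refresh step. The trade-off is that every analytical query adds work on the operational database.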
Data Warehouse Back-End Tools and Utilities
Data extraction
Data cleaning
Data transformation
Load
Refresh
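The five back-end steps can be chained as plain functions. This is a minimal sketch under stated assumptions: the sources are in-memory lists rather than real systems, the cleaning and transformation rules are invented for the example, and the function names are hypothetical.

```python
def extract(sources):
    """Data extraction: gather raw records from several sources."""
    return [record for source in sources for record in source]

def clean(records):
    """Data cleaning: drop records with a missing amount, normalize city names."""
    cleaned = []
    for r in records:
        if r.get("amount") is None:
            continue
        cleaned.append({**r, "city": r["city"].strip().title()})
    return cleaned

def transform(records):
    """Data transformation: convert source fields to the warehouse format."""
    return [{"city": r["city"], "dollars_sold": float(r["amount"])} for r in records]

def load(warehouse, records):
    """Load: append the prepared records to the warehouse table."""
    warehouse.extend(records)
    return warehouse

# Made-up source data with typical defects (stray spaces, casing, missing value).
chicago = [{"city": " chicago ", "amount": "120.50"}, {"city": "Chicago", "amount": None}]
new_york = [{"city": "NEW YORK", "amount": 200}]

warehouse = load([], transform(clean(extract([chicago, new_york]))))

# Refresh: propagate later source updates by running the same steps on new records.
update = [{"city": "chicago", "amount": "30"}]
load(warehouse, transform(clean(extract([update]))))
```

After the refresh, `warehouse` holds three rows with consistent city names and numeric dollars_sold values.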