Data Warehouse Development Approaches

Data Warehouse Development Approach


Page 1: Data Warehouse Development Approach

Data Warehouse Development Approaches

Page 2: Data Warehouse Development Approach

Fundamental Questions

Before deciding to build a data warehouse for your organization, you need to ask the following fundamental questions and address the relevant issues:

◦ Top-down or bottom-up approach?
◦ Enterprise-wide or departmental?
◦ Which first: data warehouse or data mart?
◦ Build a pilot or go with a full-fledged implementation?
◦ Dependent or independent data marts?

Page 3: Data Warehouse Development Approach

Data Warehouse Development Approaches

Data warehouse development approaches
◦ Inmon Model: EDW approach
◦ Kimball Model: Data mart approach

Which model is better?
◦ There is no one-size-fits-all strategy for data warehousing
◦ One alternative is the hosted warehouse

Page 4: Data Warehouse Development Approach

General Data Warehouse Development Approaches

“Big bang” approach

Incremental approach:
◦ Top-down incremental approach
◦ Bottom-up incremental approach

ISQS 6339, Data Mgmt & BI, Zhangxi Lin 4

Page 5: Data Warehouse Development Approach

“Big Bang” Approach

Analyze enterprise requirements

Build enterprise data warehouse

Report in subsets or store in data marts

Page 6: Data Warehouse Development Approach

Incremental Approach to Warehouse Development

◦ Multiple iterations
◦ Shorter implementations
◦ Validation of each phase

[Figure: phases Strategy, Definition, Analysis, Design, Build, and Production, labeled "Increment 1" and "Iterative"]

Page 7: Data Warehouse Development Approach

Top-Down Approach

◦ Analyze requirements at the enterprise level
◦ Develop conceptual information model
◦ Identify and prioritize subject areas
◦ Complete a model of selected subject area
◦ Map to available data
◦ Perform a source system analysis
◦ Implement base technical architecture
◦ Establish metadata, extraction, and load processes for the initial subject area
◦ Create and populate the initial subject area data mart within the overall warehouse framework

Page 8: Data Warehouse Development Approach

Top-Down

The advantages of this approach are:
◦ A truly corporate effort, an enterprise view of data
◦ Inherently architected, not a union of disparate data marts
◦ Single, central storage of data about the content
◦ Centralized rules and control
◦ May see quick results if implemented with iterations

The disadvantages are:
◦ Takes longer to build even with an iterative method
◦ High exposure/risk to failure
◦ Needs high level of cross-functional skills
◦ High outlay without proof of concept

Page 9: Data Warehouse Development Approach

Bottom-Up Approach

◦ Define the scope and coverage of the data warehouse and analyze the source systems within this scope
◦ Define the initial increment based on political pressure, assumed business benefit, and data volume
◦ Implement base technical architecture and establish metadata, extraction, and load processes as required by the increment
◦ Create and populate the initial subject areas within the overall warehouse framework

Page 10: Data Warehouse Development Approach

Bottom-Up

The advantages of this approach are:
◦ Faster and easier implementation of manageable pieces
◦ Favorable return on investment and proof of concept
◦ Less risk of failure
◦ Inherently incremental; can schedule important data marts first
◦ Allows project team to learn and grow

The disadvantages are:
◦ Each data mart has its own narrow view of data
◦ Permeates redundant data in every data mart
◦ Perpetuates inconsistent and irreconcilable data
◦ Proliferates unmanageable interfaces

Page 11: Data Warehouse Development Approach

Dimensional Modeling Process

High-level dimensional model design
◦ Choosing the business process
◦ Declaring the grain
◦ Choosing dimensions
◦ Identifying the facts

Detailed dimensional model development

Dimensional model review and validation
◦ IS
◦ Core users
◦ Business community

Final design iteration
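The four high-level design steps can be illustrated with a small star schema. The following is a minimal sketch only, using Python's built-in sqlite3 module; the retail-sales subject, the table names (dim_date, dim_product, dim_store, fact_sales), and the sample rows are all assumptions made for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical retail-sales example; every table, column, and value is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 3: dimensions chosen for the "retail sales" subject
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);

-- Step 2: the grain is one row per product, per store, per day
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,  -- Step 4: fact
    amount      REAL,     -- Step 4: fact
    PRIMARY KEY (date_key, product_key, store_key)
);
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(1, "2024-01-01", "2024-01"), (2, "2024-01-02", "2024-01")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Cola", "Drinks"), (2, "Chips", "Snacks")])
conn.execute("INSERT INTO dim_store VALUES (1, 'Lubbock')")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 2, 10.0), (2, 1, 1, 4, 20.0), (1, 2, 1, 1, 5.0)])

# Facts are aggregated along a dimension attribute (here: product category).
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

Declaring the grain before adding facts matters because every fact column must be true at exactly that level of detail; here, quantity and amount both describe one product in one store on one day.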


Page 12: Data Warehouse Development Approach

Supplemental Slides: Data Warehouse Design Phases

Page 13: Data Warehouse Development Approach

Defining the Business Requirements

The concept of business dimensions is fundamental to the requirements definition for a data warehouse.

Page 14: Data Warehouse Development Approach

Information Package

Your primary goal in the requirements definition phase is to compile information packages. Once you have firmed up the information packages, you'll be able to proceed to the other phases. Essentially, information packages enable you to:

◦ Define the common subject areas
◦ Design key business metrics
◦ Decide how data must be presented
◦ Determine how users will aggregate or roll up
◦ Decide the data quantity for user analysis or query
◦ Decide how data will be accessed
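One way to picture an information package is as a small record that names a subject, its business dimensions with their hierarchy levels, and its key metrics. The sketch below is an illustration under that assumption; the InformationPackage class, its fields, and the sample sales package are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationPackage:
    """Hypothetical record of one information package."""
    subject: str
    # Dimension name -> hierarchy levels, listed from most aggregated to most detailed.
    dimensions: Dict[str, List[str]]
    metrics: List[str] = field(default_factory=list)

    def rollup_path(self, dimension: str) -> List[str]:
        """Levels a user passes through when rolling up, from detail to summary."""
        return list(reversed(self.dimensions[dimension]))


sales = InformationPackage(
    subject="Sales analysis",
    dimensions={
        "Time": ["Year", "Quarter", "Month", "Day"],
        "Product": ["Category", "Product"],
    },
    metrics=["Sales units", "Sales amount"],
)

print(sales.rollup_path("Time"))
```

Writing the hierarchy levels down explicitly is what lets the later design phases decide how users will aggregate or roll up each metric.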


Page 15: Data Warehouse Development Approach


Page 16: Data Warehouse Development Approach


Page 17: Data Warehouse Development Approach

Supplemental Slides: The Others

Page 18: Data Warehouse Development Approach

Snowflake Schema Model

◦ Direct use by some tools
◦ More flexible to change
◦ Provides for speedier data loading
◦ Can become large and unmanageable
◦ Degrades query performance
◦ More complex metadata

[Figure: snowflaked geography hierarchy: Country, State, County, City]
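The Country, State, County, City hierarchy can be sketched as a snowflaked dimension, with one normalized table per level. This is an illustrative sketch using Python's built-in sqlite3 module; the table names, column names, and the single sample row are assumptions.

```python
import sqlite3

# Hypothetical snowflaked geography dimension: one table per hierarchy level.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (country_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE state   (state_id   INTEGER PRIMARY KEY, name TEXT,
                      country_id INTEGER REFERENCES country(country_id));
CREATE TABLE county  (county_id  INTEGER PRIMARY KEY, name TEXT,
                      state_id   INTEGER REFERENCES state(state_id));
CREATE TABLE city    (city_id    INTEGER PRIMARY KEY, name TEXT,
                      county_id  INTEGER REFERENCES county(county_id));

INSERT INTO country VALUES (1, 'USA');
INSERT INTO state   VALUES (1, 'Texas', 1);
INSERT INTO county  VALUES (1, 'Lubbock County', 1);
INSERT INTO city    VALUES (1, 'Lubbock', 1);
""")

# Resolving a city's full geography takes three joins, one per level above it.
row = conn.execute("""
    SELECT ci.name, co.name, st.name, cn.name
    FROM city ci
    JOIN county  co ON co.county_id  = ci.county_id
    JOIN state   st ON st.state_id   = co.state_id
    JOIN country cn ON cn.country_id = st.country_id
    WHERE ci.name = 'Lubbock'
""").fetchone()
print(row)
```

Each level is stored once, which keeps loads and changes localized, but every query that needs the country of a city pays for the extra joins. That trade-off is the one the bullet list describes.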

Page 19: Data Warehouse Development Approach

Degenerate Dimensions

order_number and order_line in the fact table

For example, you may be looking for the average number of products per order. You then have to relate the products to the order number to calculate the average.

Attributes such as order_number and order_line in this example are called degenerate dimensions, and they are kept as attributes of the fact table.
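The average-products-per-order calculation can be sketched directly against a fact table that carries order_number and order_line. The example below uses Python's built-in sqlite3 module; the fact_order_line table, its other columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- order_number and order_line have no dimension table of their own;
-- they live in the fact table as degenerate dimensions.
CREATE TABLE fact_order_line (
    order_number INTEGER,
    order_line   INTEGER,
    product_key  INTEGER,
    quantity     INTEGER,
    PRIMARY KEY (order_number, order_line)
);
""")
conn.executemany(
    "INSERT INTO fact_order_line VALUES (?, ?, ?, ?)",
    [
        (1001, 1, 10, 2),  # order 1001 has three products
        (1001, 2, 11, 1),
        (1001, 3, 12, 5),
        (1002, 1, 10, 1),  # order 1002 has one product
    ],
)

# Group by the degenerate dimension, then average the per-order product counts.
avg_products_per_order = conn.execute("""
    SELECT AVG(product_count)
    FROM (
        SELECT COUNT(DISTINCT product_key) AS product_count
        FROM fact_order_line
        GROUP BY order_number
    )
""").fetchone()[0]
print(avg_products_per_order)
```

Without order_number in the fact table there would be nothing to group the lines by, so the per-order average could not be computed.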


Page 20: Data Warehouse Development Approach

Storage and Performance Considerations

◦ Database sizing
◦ Data partitioning
◦ Indexing
◦ Star query optimization

Page 21: Data Warehouse Development Approach

Database Sizing - Test Load Sampling

Analyze a representative sample of the data chosen using proven statistical methods.

Ensure that the sample reflects:
◦ Test loads for different periods
◦ Day-to-day operations
◦ Seasonal data and worst-case scenarios
◦ Indexes and summaries
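A rough sizing estimate from test-load samples might look like the sketch below. It assumes a simple model that is not stated in the slides: take the average row size per sampled period, size for the worst period, multiply by the projected row count, and add a fractional overhead for indexes and summaries. The function name, the sample figures, and the 50% overhead are all hypothetical.

```python
def estimate_size_bytes(samples_by_period, projected_rows, overhead_factor):
    """Estimate storage from sampled row sizes (bytes), using the worst-case period.

    samples_by_period: period label -> list of sampled row sizes in bytes
    projected_rows:    expected number of rows in the full table
    overhead_factor:   assumed extra space for indexes and summaries (0.5 = 50%)
    """
    period_averages = {
        period: sum(sizes) / len(sizes)
        for period, sizes in samples_by_period.items()
    }
    worst_case_row_bytes = max(period_averages.values())
    return worst_case_row_bytes * projected_rows * (1 + overhead_factor)


# Hypothetical sampled row sizes from a normal day and a seasonal peak.
samples = {
    "normal_day": [100, 120],
    "peak_season": [140, 160],
}
estimate = estimate_size_bytes(samples, projected_rows=1_000_000, overhead_factor=0.5)
print(estimate)
```

Sampling several periods matters because a sample drawn only from day-to-day operations (110 bytes per row on average here) would understate the seasonal peak (150 bytes per row).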


Page 22: Data Warehouse Development Approach

Data Partitioning

Breaking up data into separate physical units that can be handled independently

Types of data partitioning
◦ Horizontal partitioning
◦ Vertical partitioning
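The difference between the two types can be shown on a handful of in-memory rows. This is a conceptual sketch in plain Python, not how any particular database implements partitioning; the row layout, the choice of year as the partitioning key, and the split between frequently and rarely used columns are assumptions.

```python
from collections import defaultdict

# Hypothetical sales rows.
rows = [
    {"sale_id": 1, "year": 2023, "amount": 10, "notes": "gift wrap"},
    {"sale_id": 2, "year": 2023, "amount": 20, "notes": ""},
    {"sale_id": 3, "year": 2024, "amount": 30, "notes": "rush order"},
]

# Horizontal partitioning: whole rows are split into units by a key (here, year).
by_year = defaultdict(list)
for row in rows:
    by_year[row["year"]].append(row)

# Vertical partitioning: columns are split into units; the key (sale_id) is
# repeated in each unit so the pieces can be joined back together.
frequent = [{"sale_id": r["sale_id"], "year": r["year"], "amount": r["amount"]} for r in rows]
rare = [{"sale_id": r["sale_id"], "notes": r["notes"]} for r in rows]

print(sorted(by_year))
print(frequent[0], rare[0])
```

With the horizontal split, a query for 2024 only needs to touch the 2024 unit; with the vertical split, a query on amounts never reads the notes column.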


Page 23: Data Warehouse Development Approach

Indexing

Indexing is used for the following reasons:
◦ It is a huge cost saving, greatly improving performance and scalability.
◦ It can replace a full table scan by a quick read of the index followed by a read of only those disk blocks that contain the rows needed.
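The switch from a full table scan to an index lookup can be observed in SQLite's query plan. The sketch below uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the sales table, the idx_sales_customer index, and the generated rows are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT amount FROM sales WHERE customer_id = 42"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)  # no index yet: the plan scans the whole table
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_after = plan(query)   # with the index: the plan searches via the index

print(plan_before)
print(plan_after)
```

The query text is identical in both runs; only the access path changes once the index exists.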


Page 24: Data Warehouse Development Approach

Parallelism

[Figure: parallel execution servers processing partitions P1, P2, and P3 of the Sales table and the Customers table]
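The slide shows only a diagram, so the following reading is an assumption: both tables are partitioned the same way into P1, P2, and P3, and each parallel execution server joins one pair of matching partitions. The sketch below imitates that structure with Python's concurrent.futures; the partitioning rule, the sample data, and the join_partition helper are hypothetical, and Python threads here illustrate the division of work rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 3  # stands in for P1, P2, P3


def partition_of(customer_id):
    # Assumed rule: both tables are partitioned on the same key.
    return customer_id % NUM_PARTITIONS


customers = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Dee")]  # (customer_id, name)
sales = [(1, 10), (2, 20), (3, 30), (4, 40), (1, 5)]         # (customer_id, amount)

customer_parts = [[] for _ in range(NUM_PARTITIONS)]
for customer_id, name in customers:
    customer_parts[partition_of(customer_id)].append((customer_id, name))

sales_parts = [[] for _ in range(NUM_PARTITIONS)]
for customer_id, amount in sales:
    sales_parts[partition_of(customer_id)].append((customer_id, amount))


def join_partition(pair):
    """Join one Sales partition with the matching Customers partition."""
    sales_part, customer_part = pair
    names = dict(customer_part)
    return [(names[customer_id], amount) for customer_id, amount in sales_part]


# One worker per partition pair, mirroring one execution server per partition.
with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
    partial_results = list(pool.map(join_partition, zip(sales_parts, customer_parts)))

joined = sorted(row for part in partial_results for row in part)
print(joined)
```

Because matching rows always land in the same partition number, no worker needs data from another worker's partitions.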

Page 25: Data Warehouse Development Approach

Using Summary Data

Designing summary tables offers the following benefits:
◦ Provides fast access to precomputed data
◦ Reduces use of I/O, CPU, and memory
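A summary table can be sketched as an aggregate that is computed once and then queried in place of the detail rows. The example below uses Python's built-in sqlite3 module; the sales table, the sales_monthly_summary table, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_month TEXT, amount REAL);
INSERT INTO sales VALUES ('2024-01', 10.0), ('2024-01', 15.0), ('2024-02', 20.0);

-- Precompute the monthly totals once, at load time.
CREATE TABLE sales_monthly_summary AS
    SELECT sale_month, SUM(amount) AS total_amount, COUNT(*) AS sale_count
    FROM sales
    GROUP BY sale_month;
""")

# Reports read the small summary table instead of re-aggregating the detail rows.
summary_rows = conn.execute(
    "SELECT sale_month, total_amount FROM sales_monthly_summary ORDER BY sale_month"
).fetchall()
print(summary_rows)
```

The saving comes from doing the aggregation once rather than on every query; the cost is that the summary has to be refreshed whenever the detail data changes.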
