35
© 1999 FORWISS Data Warehouse: Data Warehouse: Methodology and Tools Methodology and Tools FORWISS - Bavarian Research Centre for Knowledge Based Systems Concepts, Architectures and Products

© 1999 FORWISS Data Warehouse: Methodology and Tools FORWISS - Bavarian Research Centre for Knowledge Based Systems Concepts, Architectures and Products

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

© 1999 FORWISS

Data Warehouse:Data Warehouse: Methodology and Tools Methodology and Tools

FORWISS - Bavarian Research Centre for Knowledge Based Systems

Concepts, Architectures and Products

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

2

OverviewOverview

The Process of Planning and Building a Warehouse

Data Warehouse Architecture (revisited)

Classification of Tools

Focus: OLAP Tools– Multidimensional Data Modeling

– OLAP Architectures

– OLAP query languages

Tool Demonstration

Summary

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

3

OLAP Design CycleOLAP Design Cycle

RequirementAnalysis

Conceptual Design(Implementation

Independent)

Using the DataWarehouse

Logical + Physical Design(e.g. Product specific)

Implementation

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

4

OperationalData Sources

Data-Migration Middleware (Populations-Tools)

DataStorage

RepositoryRepository

DataAnalysis

Reporting, OLAP,Data Mining

Data Warehouse ArchitectureData Warehouse Architecture

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

5

Classification of ToolsClassification of Tools

Frontend Tools

Data Storage Tools (Databases)

ETL Tools– Extraction

– Transformation

– Loading

Repository Systems– Metadata Storage

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

6

Repository SystemsRepository Systems

Manage Different Kinds of Metadata

– Business Metadata

» E.g. How is revenue computed

– Technical Metadata

» When was data last loaded from which system

» Data model for OLTP and OLAP databases

Functionality

– Communication ‚hub‘ for different tools

– Guides user exploration

– Guides development process

– Impact analysis

E.g. Viasoft Rochade, Softlab Enabler,...

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

7

ETL ToolsETL Tools

Extraction: Range of Supported Data Sources

– Mainframe legacy databases

– COBOL Files

– Relational Databases

– Filebased data storage (Excel, Word,XML,...)

Transformation

– (Graphical) Specification of Transformation Rules (Expressive Power)

Loading

– Ability to use database features (e.g. bulk loading)

Process Management

– Scheduling, Monitoring, Error Handling

Informatica PowerMart, Hummingbird Genio, Acta...

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

8

Databases for DWDatabases for DW

Special Indexing Techniques– Multidimensional Indexes

– Bitmap Indexes

– Foreign Column Indexes

Support for Materialized Views (Preaggregation)

Special Analytical Capabilities (e.g. SQL Extensions)– Top N

– Ranking

Bulk Loading Capabilities– Offline, No concurrency control

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

9

ReportingReporting

Interactive OLAPInteractive OLAP

Ad hoc-Queries

Data MiningData Mining

Additional BenefitAdditional Benefit Number of UsersNumber of Users

What happened?

What happened?

Why did it happen?Why did

it happen?

What will happen?What will happen?

What happened why and how?

What happened why and how?

Frontend ToolsFrontend Tools

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

10

The User‘s view (OLAP Tool)The User‘s view (OLAP Tool)

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

11

Multidimensional OLAP (MOLAP)Multidimensional OLAP (MOLAP)

Multidim.Database

Frontend Tool

specialized database technology

multidimensional storage structures

E.g. Hyperion Essbase, Oracle Express, Cognos

PowerPlay (Server)

+ Query Performance

+ Powerful MD Model

+ write access

Database Features multiuser access/ backup and recovery

Sparsity Handling -> DB Explosion

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

12

Relational OLAP (ROLAP)Relational OLAP (ROLAP)

ROLAP- Engine

Relational DB

Frontend Tool

SQL

MD-Interface

Meta Data

idea: use relational data storage

star (snowflake) schema

E.g. Microstrategy, SAP BW

+ advantages of RDBMS+ scalability, reliability, security etc.

+ Sparsity handling

Query Performance

Data Model Complexity

no write access

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

13

Client (Desktop) OLAPClient (Desktop) OLAP

Client-OLAP

proprietary data structure on the client

data stored as file

mostly RAM based architectures

E.g. Business Objects, Cognos PowerPlay

+ mobile user

+ ease of installation and use

data volume

no multiuser capabilites

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

14

ROLAP- Engine

Multidim.Database

DW-DB (mostly relational)

MOLAP ROLAP Client-OLAP

DW IntegrationDW Integration

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

15

Combining Architectures ICombining Architectures I

Multidim.Database

Drill through

highly aggregated data

dense data

95% of the analysis requirements

highly aggregated data

dense data

95% of the analysis requirements

detailed data (sparse)

5% of the requirements

detailed data (sparse)

5% of the requirements

RelationalDatabase

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

16

Combining Architectures IICombining Architectures II

Multidim. Storage

Hybrid OLAP (HOLAP)

HOLAP System

Relational Storage

Meta Data

equal treatment of MD and Rel Data

Storage type at the discretion of the administrator

Cube Partitioning

equal treatment of MD and Rel Data

Storage type at the discretion of the administrator

Cube Partitioning

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

17

OLAP StandardsOLAP Standards

Idea: define interface between client and server

Benefit: Component oriented architectures

Proposal 1: OLAP Council– union of OLAP Tool producers

– not implemented so far (even by the council members)

Proposal 2: Microsoft - OLEDB for OLAP (shot ODBO)– standardizes a data model and an MD query language (MDX)

– specification contains lots of optional functionality

– all major vendors committed themselves to the standard

– will be the de facto standard

© 1999 FORWISS

Practical Case StudyPractical Case Study

Building a Warehouse

artwork copyright Intersystems GmbHartwork copyright Intersystems GmbH

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

19

Conceptual DesignConceptual Design

RequirementAnalysis

Conceptual Design(Implementation

Independent)

Using the DataWarehouse

Logical + Physical Design(tool specific)

Implementation

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

20

The Modeling ProcessThe Modeling Process

Which business process is being modeled?

What is the subject of analysis (fact) and what is being

measured?

On what granularity level is active analysis being done?

Which properties (dimensions) determine the measures?

Which different levels of aggregation are meaningful?

What additional information is needed for the different levels?

What is the variability and the cardinality of the dimensions?

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

21

FactsFacts

Fact = Subject of Analysis Sales

Measures = Attributes describing facts Quantity, Price

Derived Measures Profit

Additivity of Measures globally additiv Quantity additiv for some dimensions Items in stock

additiv resp. to plants/

not additiv w.r.t. time

not additiv at all profit margin

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

22

DimensionsDimensions

Dimensions = static structure of business information

Used for navigating the data space

Choosing the necessary granularity

Dimension Members = Instances of a dimension– e.g. 8.12.1997 and Juli 1997 are members of dimension “time”

Structuring Dimension– using different dimension levels (hierarchies)

– using descriptive attributes

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

23

Simple HierarchiesSimple Hierarchies

Januar 99Januar 99

Februar 99Februar 99

März 99März 99

Sept. 99Sept. 99

August 99August 99

Juli 99Juli 99

April 99April 99

Juni 99Juni 99

Mai 99Mai 99

1. Quartal 991. Quartal 99

2. Quartal 992. Quartal 99

3. Quartal 993. Quartal 99

19991999

2. Halbjahr 992. Halbjahr 99

1. Halbjahr 991. Halbjahr 99

........................

MonthMonth QuarterQuarter1/2 YearPeriod

1/2 YearPeriod YearYear

Dimension Level

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

24

Unbalanced HierarchiesUnbalanced Hierarchies

......

Plant 1Plant 1

Bu 1Bu 1

Bu 2Bu 2

GreatOutdoors

GreatOutdoors

Div BDiv B

Div ADiv A

......

Plant 0815Plant 0815

Plant/SitePlant/Site BusinessUnit

BusinessUnit

BusinessDivision

BusinessDivision EnterpriseEnterprise

......

Plant1Plant1

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

25

Alternative HierarchiesAlternative Hierarchies

Customer 04Customer 04

Customer 01Customer 01

PartnerPartner

RetailerRetailer

GermanyGermany

Customer 06Customer 06

ConsumerConsumer

Customer 02Customer 02

Customer 03Customer 03

Customer 05Customer 05

BavariaBavaria

HessenHessen

HamburgHamburg

CustomerCustomer Geogr. RegionGeogr. Region CountryCountry

Customer GroupCustomer Group

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

26

Alternative PathesAlternative Pathes

Würzburg 01Würzburg 01

Munich 01Munich 01

GermanyGermany

Frankfurt 01Frankfurt 01Germany (South)Germany (South)

Germany (West)Germany (West)

Germany (North)Germany (North)

Munich 02Munich 02

Munich 03Munich 03

Würzburg 02Würzburg 02

BayernBayern

HessenHessen

HamburgHamburg

OrtOrt Geogr. RegionGeogr. Region

Sales RegionSales Region

CountryCountry

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

27

Criteria for a ‘good’ MD DesignCriteria for a ‘good’ MD Design

dimensions should be independent

dimensionality of a cube should be max. 7-8 dimensions– interpretation of results is difficult for a large number of dimensions

hierarchies should have a fan-out of max. 30– long drill-down times

– large drill-down results

– insert additional levels for structuring purposes (e.g. insert state

between city and country)

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

28

Graphical Notation (ME/R)Graphical Notation (ME/R)

A Fact and its measuresA Fact and its measures

Level-NameAttribute 1

...Attribute n

Fact-Name

Measure 1

Measure n...

.. is characterized by dimensions.. is characterized by dimensions

..can be classified according to.....can be classified according to...

A Dimension Level with attributesA Dimension Level with attributes

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

29

Sale

Revenue

Order QtyCost

Prod. Type

Line

Year

Month

Example Data ModelExample Data Model

CustomerType

Branch

Region

Country

Product

Sales RepNameCode

Day

Customer MarginRange

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

30

Cognos PowerPlay- ArchitectureCognos PowerPlay- ArchitectureS

erve

rC

li ent

PowerPlay Client

PowerCube(Proprietary,

Compressed)

Transformer

Impromptu

PowerPlay Server

OLEDB for OLAP

OLEDB Providere.g. MS OLAP Services,SAP BW,…

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

31

Logical+Physical DesignLogical+Physical Design

RequirementAnalysis

Conceptual Design(Implementation

Independent)

Using the DataWarehouse

Logical + Physical Design(tool specific)

Implementation

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

32

Practical DemonstrationPractical Demonstration

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

33

Summary and ConclusionsSummary and Conclusions

Multidimensional modeling is performed on different levels– conceptual model (tool independent level) following requirement analysis

– logical and physical design before implementation

Distinction between two types of data– quantifying data: measures, cells of the cube, fact table

– qualifying data: properties, dimensions, dimension tables

Hierarchical structures of dimensions can be complex

ME/R notation can be used to document conceptual models

Several ways to map an MD model to a relational DB

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

34

Result Granularity

A

B

Canonical Query (I)Canonical Query (I)Restriction Element

A

B

Query Result

Result Measures

m2m1

© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects

35

Canonical Query (II)Canonical Query (II)

Result Measures

Result Granularity

Restriction Elements

g1 g2 gn…

r1 r2 rn…

m1 … mk

Canonical Query Definition

SELECT g1,...,gn, aggr(m1),..., aggr(mk)FROM FactName, Dim1,..., Dimn

WHERE Dim1.level(r1) = r1 AND ... AND Dimn.level(rn) = rn

AND Dim1.d1=FactName.d1 AND ... AND Dimn.dn=FactName.dn

GROUP BY g1,...,gn