DATA WAREHOUSE DESIGN
ICDE 2001 Tutorial
Stefano Rizzi, Matteo Golfarelli
DEIS - University of Bologna, Italy

Motivation
Building a data warehouse for an enterprise is a huge and complex task, which requires accurate planning aimed at devising satisfactory answers to organizational and architectural questions. Despite the pressing demand for working solutions coming from enterprises and the wide offer of advanced technologies from vendors, few attempts have been made to devise a specific methodology for data warehouse design. On the other hand, statistical reports on DW project failures state that a major cause lies in the absence of a global view of the design process: in other words, in the absence of a design methodology.

Summary
• Introduction to Data Warehousing
• Conceptual design of Data Warehouses
• Workload-based logical design for ROLAP
• Indexes for physical design


Introduction to Data Warehousing

Stefano Rizzi

Information Systems: profile and role

• Information systems are rooted in the relationship between information, decision and control.
• An IS should collect and classify information by means of integrated and suitable procedures, in order to produce, in time and at the right levels, the synthesis used to support the decision-making process, as well as to administer and globally control the enterprise activity.

Information as a resource

[Figure: a manufacturing system produces a finished product; an information system produces information.]

• Information is a resource of increasing value, required by managers to schedule and monitor enterprise activities effectively.
• Information is the raw material transformed by information systems, just as unfinished products are transformed by manufacturing systems.

Value of information

[Figure: amount versus value of information across four levels: primary information sources, selected information, reports, strategic directions.]

• Information is an enterprise resource like capital, raw materials, plants and people; thus, it has a cost.
• Hence, understanding the value of information is important.

Different kinds of information systems

Functional areas: Sales and marketing, Manufacturing, Finance, Accounting, Human resources

Level         Users                        Systems
Operational   Operational managers         TPS
Knowledge     Knowledge and data workers   OAS, KWS
Management    Middle managers              MIS, DSS
Strategic     Senior managers              ESS

The “Data Warehouse” phenomenon

• Usual complaints:
  • We have tons of data but we cannot access them!
  • How can people playing the same role produce substantially different results?
  • We want to slice and dice data in any possible way!
  • Show me only what is important!
  • Everyone knows some data are incorrect...

(R. Kimball, The Data Warehouse Toolkit)

Data Warehousing

• A collection of technologies and tools that support the knowledge worker (executive, manager, analyst) in analysing data for decision making and for improving the knowledge assets of the enterprise.

Data Warehouse

At the core of the architecture of modern information systems, it is a data repository that is:
• Subject-oriented
• Integrated and consistent
• Representing temporal evolution
• Non-volatile

The data warehouse is regularly refreshed, permanently growing, logically centralised, easily accessed by users, and essentially read-only.

[Figure: data warehouse architecture. Labels: external data; operational data (relational, legacy); ETL tools; Data Warehouse; warehouse data; summary data; access; reporting tools; analysis tools (OLAP); data mining; what-if analysis.]

Data Marts

[Figure: the data warehouse feeds data marts through replication and broadcasting. Data marts shown: client management, geographical regions, supplier management, marketing, finance.]

Subject vs Process

[Figure: emphasis on applications (reservations, charge, medical reports, admissions) versus emphasis on subjects (patient, region, consumption).]

Integration and consistency

[Figure: data flow from DBs, external data and text files to the DW through wrappers, mediators and loaders. Activities shown: schema integration, extraction, transformation, cleaning, validation, filtering, loading.]

Temporal evolution

OLTP                                  DW
Current values                        Snapshots
Restricted historical content         Rich historical content
Often time is not included in keys    Time is included in keys
Data are updated                      Snapshots cannot be updated
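The contrast between current values and snapshots can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `oltp_stock`, `dw_snapshots` and `load_snapshot`, and all the data, are hypothetical and do not come from the tutorial.

```python
from datetime import date

# OLTP style: the key has no time component, so an update overwrites the old value.
oltp_stock = {"P1": 40}
oltp_stock["P1"] = 25          # the previous value (40) is lost

# DW style: time is part of the key and snapshots are append-only.
dw_snapshots: dict[tuple[str, date], int] = {}

def load_snapshot(product: str, day: date, level: int) -> None:
    """Append a snapshot; refuse to overwrite an existing one."""
    key = (product, day)
    if key in dw_snapshots:
        raise ValueError(f"snapshot {key} already loaded and cannot be updated")
    dw_snapshots[key] = level

load_snapshot("P1", date(1999, 10, 9), 40)
load_snapshot("P1", date(1999, 10, 10), 25)

# The full history of P1 is still available, ordered by date.
history = [level for (p, d), level in sorted(dw_snapshots.items()) if p == "P1"]
```

Because the date is part of the key, both values of P1 survive in `dw_snapshots`, whereas the OLTP dictionary only retains the latest one.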

Non-volatility

[Figure: OLTP operations (insert, delete, update) contrasted with DW operations (load, access).]

Huge data volumes: from 20 GB to several TB in a few years.

• In a DW, no advanced techniques for transaction management are required (unlike in OLTP systems).
• Key issues are query throughput and resilience.

DW vs. OLTP

DW                                     OLTP
• 90% ad hoc queries                   • 90% predefined transactions
• Mostly read access                   • Read/write access
• Hundreds of users                    • Thousands of users
• Denormalised                         • Normalised
• Supports historical versions         • Does not support historical versions
• Optimised for accesses involving     • Optimised for accesses involving
  most of the database                   a small fraction of the database
• Based on summary data                • Based on elemental data

ROLAP (Relational OLAP)

• Intermediate level server between a relational back-end server and the front-end client
• Specialised middleware
• Generation of SQL multi-statements for the back-end server
• Query scheduling

MOLAP (Multidimensional OLAP)

• Direct support of multi-dimensional views
• Special data structures (e.g., multi-dimensional arrays)
• Compression techniques
• Intelligent disk/memory caching
• Pre-computation
• Complex analysis

The technological progress

[Figure: timeline from 1970 to 2000 showing the refinement from data to knowledge. Stages shown: statistics & reporting, data warehousing, OLAP, data mining, pattern warehousing. Source: Information Discovery.]

The Data Warehouse Market

[Figure: two charts of market size for the years 1998 to 2002. The first (scale 0 to 4500) covers RDBMS and OLAP; the second (scale 0 to 400) covers data marts, ETL, data quality and metadata. Source: Shilakes, Tylman, Enterprise Information Portals.]

The DW life-cycle

• Objective definition and planning: clearly determine the scope, define the boundaries, estimate the size, choose the design approach, evaluate the benefits.
• Infrastructure design: choose the technologies and the tools, analyse the architectural solutions, solve the management problems.
• Design and implementation of applications: iteratively add new data marts and applications to the warehouse.

Bibliography

• R. Barquin, S. Edelstein. Planning and Designing the Data Warehouse. Prentice Hall (1996).
• S. Chaudhuri, U. Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record 26, 1 (1997).
• G. Colliat. OLAP, relational and multidimensional database systems. SIGMOD Record 25, 3 (1996).
• M. Demarest. The politics of data warehousing. Http://www.hevanet.com/demarest/marc/dwpol.html
• U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth. Data mining and knowledge discovery in databases: an overview. Comm. of the ACM 39, 11 (1996).
• W.H. Inmon. Building the data warehouse. John Wiley & Sons (1996).
• S. Kelly. Data Warehousing in Action. John Wiley & Sons (1997).
• R. Kimball. The data warehouse toolkit. John Wiley & Sons (1996).
• R. Kimball, L. Reeves, M. Ross, W. Thornthwaite. The Data Warehouse Lifecycle Toolkit. John Wiley & Sons (1998).
• C. Shilakes, J. Tylman. Enterprise Information Portals. Http://www.sagemaker.com/company/downloads/eip/indepth.pdf
• P. Vassiliadis. Gulliver in the land of data warehousing: practical experiences and observations of a researcher. Proc. DMDW’2000 (2000).
• J. Widom. Research Problems in Data Warehousing. Proc. CIKM (1995).

Conceptual modelling for Data Warehousing

Stefano Rizzi

Why a new conceptual model?

• While it is universally recognised that a DW leans on a multidimensional model, there is no agreement on the approach to conceptual modelling.
• On the other hand, an accurate conceptual design is the necessary foundation for building a “good” information system.
• The Entity/Relationship model is widespread in enterprises, but…

“Entity relation data models [...] cannot be understood by users and they cannot be navigated usefully by DBMS software. Entity relation models cannot be used as the basis for enterprise data warehouses.” (Kimball, 96)

The multidimensional data model

[Figure: the Sales cube, with dimensions Store, Product and Time. Highlighted cells and aggregates:]
• Number of Coke cans sold at BIGSTORES in London on 10/10/99
• Number of Pepsi cans sold at all BIGSTORES on 10/10/99
• Number of Fanta cans globally sold
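The three quantities named on this slide correspond to a single cell, an aggregate over stores, and an aggregate over stores and time. A minimal Python sketch, assuming a cube stored as a dictionary of cells, could look as follows. The function `total`, the split of the store into chain and city, and all figures are invented for illustration.

```python
# Each cell of the Sales cube is identified by (product, chain, city, date)
# and holds the number of cans sold. All numbers are made up.
cells = {
    ("Coke",  "BIGSTORES", "London", "10/10/99"): 120,
    ("Coke",  "BIGSTORES", "Paris",  "10/10/99"): 80,
    ("Pepsi", "BIGSTORES", "London", "10/10/99"): 60,
    ("Pepsi", "BIGSTORES", "Paris",  "10/10/99"): 40,
    ("Fanta", "BIGSTORES", "London", "10/10/99"): 30,
    ("Fanta", "SMALLSHOP", "Rome",   "11/10/99"): 20,
}

def total(product=None, chain=None, city=None, day=None):
    """Sum the cells matching the given coordinates; None means 'all values'."""
    wanted = (product, chain, city, day)
    return sum(
        qty for coords, qty in cells.items()
        if all(w is None or w == c for w, c in zip(wanted, coords))
    )

coke_london = total("Coke", "BIGSTORES", "London", "10/10/99")  # one cell
pepsi_all_bigstores = total("Pepsi", "BIGSTORES", day="10/10/99")  # aggregate over cities
fanta_global = total("Fanta")  # aggregate over stores and time
```

Leaving a coordinate unspecified aggregates along that dimension, which is the basic operation behind slicing and rolling up a cube.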

Basic terminology

• Fact (cube, target). It is a focus of interest for the decision-making process; typically, it models an event occurring in the enterprise world (sales, shipments, purchases). It is essential for a fact to have some dynamic aspects, i.e., to evolve somehow across time.
• Measures (attributes, variables, metrics, properties). They are continuously valued (typically numerical) attributes which describe a fact from different points of view. For instance, each sale is measured by its revenue.
• Dimensions. They are discrete attributes which determine the minimum granularity adopted to represent facts. Typical dimensions for the sale fact are product, store and date.
• Hierarchies (dimensions). They contain dimension attributes (levels, parameters) connected in a tree-like structure by many-to-one relationships (functional dependencies).
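To make the role of many-to-one relationships concrete, the following sketch rolls a measure up along a product hierarchy. It is an illustration only: the mappings, the `roll_up` helper and the sample revenues are assumptions, not material from the tutorial.

```python
# Many-to-one relationships (functional dependencies) between hierarchy levels.
product_to_type = {"Coke": "soft drink", "Pepsi": "soft drink", "Spaghetti": "pasta"}
type_to_category = {"soft drink": "food", "pasta": "food"}

# Fact instances at the finest granularity: (product, store, date) -> revenue.
sales = {
    ("Coke", "S1", "1999-10-10"): 10.0,
    ("Pepsi", "S1", "1999-10-10"): 5.0,
    ("Spaghetti", "S2", "1999-10-10"): 7.5,
}

def roll_up(facts, mapping):
    """Aggregate the measure after replacing the first coordinate by its parent level."""
    result = {}
    for (first, *rest), value in facts.items():
        key = (mapping[first], *rest)
        result[key] = result.get(key, 0.0) + value
    return result

by_type = roll_up(sales, product_to_type)
by_category = roll_up(by_type, type_to_category)
```

Since each product maps to exactly one type and each type to exactly one category, every fact instance contributes to exactly one aggregated cell at each level.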

DW modelling in the literature

• Gyssens, Lakshmanan 97
• Agrawal et al. 95
• Li, Wang 96
• Cabibbo, Torlone 98
• Datta, Thomas 97
• Vassiliadis 98
• Tryfona et al. 99
• Hüsemann et al. 00
• Sapia et al. 98
• Franconi, Sattler 99
• Golfarelli et al. 98

[Figure: the same works, grouped into LOGICAL and CONCEPTUAL approaches.]

[Figure: the same works, grouped into GRAPHICAL and FORMAL approaches.]

[Figure: the same works, annotated with the label ALGEBRA.]

[Figure: the same works, annotated with the label DESIGN.]

Conceptual models

• Sapia, Blaschka, Höfling, Dinter (1998)

[Figure: example schema. Constructs labelled: dimension level, roll-up relationship, fact relationship, attribute.]

Conceptual models (2)

• Franconi, Sattler (1999)

[Figure: example schema. Constructs labelled: dimension, target, property, level, aggregated entity.]

Conceptual models (3)

• Hüsemann, Lechtenbörger, Vossen (2000)

[Figure: example schema. Constructs labelled: dimension, dimension level, measure, property attribute, optional property attribute, optional aggregation path, fact.]

The Dimensional Fact Model

The Dimensional Fact Model (DFM) is a graphical conceptual model for DWs, aimed at:
• Effectively supporting conceptual design;
• Providing an environment where user queries can be formulated intuitively;
• Enabling communication between the designer and the final user in order to refine requirement specifications;
• Supplying a stable platform for logical design;
• Providing expressive and unambiguous documentation.

The DFM is independent of the target logical model (multidimensional or relational).

The Dimensional Fact Model (2)

• Three levels of conceptual documentation are provided:
  • Fact scheme: represents a fact of interest and the associated measures, dimensions and hierarchies.
  • Data mart scheme: summarizes the fact schemes which constitute each data mart and emphasizes the feasible connections between them.
  • Data warehouse scheme: shows the different data marts, emphasizing their overlaps, the different profiles of the users accessing them, and the operational sources which feed them.
• Each documentation level is complemented by glossaries which explain the names adopted within the schemes, define a connection between the DW data and the operational sources, and express data volumes.
• Data mart schemes are associated with the workload specification.

Fact schemes

A fact expresses a many-to-many relationship between its dimensions.

[Figure: the SALE fact scheme.
 Fact: SALE
 Measures: qty sold, revenue, unit price, no. of customers
 Attributes shown: product, type, category, department, marketing group, brand, brand city; store, store city, county, state, sales manager, sale district; date, day of week, holiday, week, month, quarter, year
 Callouts: fact, measure, dimension, dimension attribute, hierarchy]

Fact schemes (2)

[Figure: the SALE fact scheme extended with a promotion dimension. Additional labels: address, phone, manager, diet, price reduction, cost, begin date, end date, ad type. Callouts: non-dimension attribute, optionality.]

• A non-dimension attribute contains additional information about a dimension attribute, and is typically connected to it by a one-to-one relationship. It cannot be used for aggregation.
• Some links between attributes can be optional.

Fact schemes (3)

• Convergence
• Cross-dimension attributes
• Additivity, non-additivity, non-aggregability
• Overlap

[Figure: the complete SALE fact scheme, including the promotion dimension, calendar and fiscal time attributes (week, month, quarter, year; fiscal week, fiscal month, fiscal quarter, fiscal year) and the V.A.T. attribute. Callouts: non-aggregability, cross-dimension attribute, convergence.]

The SHIPMENTS fact scheme

[Figure: FACT SCHEME: SHIPMENT TO STORES.
 Measures: qty shipped, shipping cost
 Attributes shown: product, type, category, department, marketing group, brand, brand city; store, store city, store state; warehouse, warehouse city, warehouse state; mode, type, carrier; date, day of week, week, month, quarter, year, fiscal week, fiscal month, fiscal quarter, fiscal year]

The INVENTORY fact scheme

[Figure: FACT SCHEME: INVENTORY.
 Measure: level, annotated with AVG, MIN
 Attributes shown: product, type, category, department, marketing group, brand, brand city, units per pallet, package type, package size, weight; warehouse, warehouse city, warehouse nation; date, day of week, week, month, quarter, year, fiscal week, fiscal month, fiscal quarter, fiscal year]
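The AVG, MIN annotation on the level measure suggests that inventory levels are not simply summed along every dimension. The sketch below illustrates one common reading of this, namely that a stock level should not be summed along time; this interpretation, and all the numbers, are assumptions made for the example.

```python
from statistics import mean

# Hypothetical daily inventory levels of one product in one warehouse.
levels_by_day = {"Mon": 100, "Tue": 80, "Wed": 90}
# Hypothetical levels of the same product on one day in two warehouses.
levels_by_warehouse = {"W1": 100, "W2": 50}

# Along time, summing would count the same stock repeatedly (270 units
# never existed), so the measure is aggregated with AVG or MIN instead.
misleading_sum = sum(levels_by_day.values())
avg_level = mean(levels_by_day.values())
min_level = min(levels_by_day.values())

# Along the warehouse dimension the same measure can be summed.
total_on_day = sum(levels_by_warehouse.values())
```

A measure such as qty sold, by contrast, can be summed along every dimension, which is the additive case.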

The “supply chain”

[Figure: the fact schemes of the supply chain, each shown with its dimensions.]

Fact scheme                 Dimensions
PRODUCTION OF COMPONENTS    date, component, factory
COMPONENT DELIVERY          date, component, from factory, to factory
COMPONENT INVENTORY         date, component, factory
MANUFACTURING               date, product, factory
PACKAGING                   date, product, factory, package type
SHIPMENT TO WAREHOUSE       date, product, factory, warehouse, mode
WAREHOUSE INVENTORY         date, product, warehouse
SHIPMENT TO STORES          date, product, store, warehouse, mode
SALES                       date, product, store, promotion

Glossaries

ATTRIBUTE GLOSSARY: SHIPMENT TO STORES

name             description                     domain          card.
product                                          products        5000
brand                                            brands          800
brand city       Where brands are manufactured   cities          50
type             (pasta, soft drink, …)          pr. types       200
category         (food, clothing, music, …)      pr. categories  10
department       Deps. managing categories       deps.           5
marketing group  Responsible for product types   groups          20
  query:  select prodName, brandName, cityName, …
          from PRODUCTS P, BRANDS B, CITIES C, …
          where P.brandId = B.brandId
            and B.cityId = C.cityId
            and . . .
stores                                           stores          100
store city                                       cities          80
store state                                      states          5
  query:  select storeName, cityName, stateName
          from STORES S, CITIES C
          where S.cityId = C.cityId
. . .

MEASURE GLOSSARY: SHIPMENT TO STORES (sparsity = 0.01)

name           description                              type
qty shipped    Quantity of each product being shipped   INTEGER
  query:  select SUM(PS.qty)
          from PRODUCTS P, SHIP S, PRODSHIP PS, …
          where P.prodId = PS.prodId
            and PS.shipId = S.shipId
            and . . .
          group by P.prodId, S.date, . . .
shipping cost  Cost of the shipment                     MONEY
  query:  . . .

refresh frequency: 1 per week; refresh technique: periodic complete
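The measure query in the glossary can be tried out on a toy operational database. The sketch below uses Python's standard `sqlite3` module; the table layouts are guessed from the column names visible in the query (`prodId`, `shipId`, `qty`, `date`), the rows are invented, and only the two join conditions shown on the slide are reproduced, since the rest of the query is elided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCTS (prodId INTEGER PRIMARY KEY, prodName TEXT);
    CREATE TABLE SHIP     (shipId INTEGER PRIMARY KEY, date TEXT);
    CREATE TABLE PRODSHIP (prodId INTEGER, shipId INTEGER, qty INTEGER);
    INSERT INTO PRODUCTS VALUES (1, 'Coke'), (2, 'Pepsi');
    INSERT INTO SHIP     VALUES (10, '1999-10-10'), (11, '1999-10-10'), (12, '1999-10-11');
    INSERT INTO PRODSHIP VALUES (1, 10, 5), (1, 11, 7), (2, 10, 3), (1, 12, 4);
""")

# Same shape as the glossary query, restricted to the two dimensions
# whose join conditions are visible on the slide (product and date).
rows = conn.execute("""
    SELECT P.prodId, S.date, SUM(PS.qty)
    FROM PRODUCTS P, SHIP S, PRODSHIP PS
    WHERE P.prodId = PS.prodId AND PS.shipId = S.shipId
    GROUP BY P.prodId, S.date
    ORDER BY P.prodId, S.date
""").fetchall()
conn.close()
```

Each returned row is one instance of the fact at the chosen granularity: the grouping attributes play the role of dimensions and the summed quantity is the qty shipped measure.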

Data mart schemes

• The data mart scheme is used to summarize the fact schemes which constitute the data mart and to show drill-across connections between them.
• It is a graph whose nodes are elemental and overlapped fact schemes; the arcs are directed to each overlapped scheme from its component schemes, which in turn may be overlapped.

[Figure: DATA MART SCHEME: SUPPLY CHAIN.
 Elemental fact schemes: MANUFACTURING, COMPONENT INVENTORY, PACKAGING, WAREHOUSE INVENTORY, PRODUCTION OF COMPONENTS, COMPONENT DELIVERY, SHIPMENT TO WAREHOUSE, SALE, SHIPMENT TO STORES
 Overlapped fact schemes: PRODUCTION AND DELIVERY, DELIVERY AND INVENTORY, MANUFACTURING AND PACKAGING, SHIPMENT AND SALE, DISTRIBUTION CYCLE, PRODUCT CYCLE]
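The graph described above can be sketched as a dictionary of arcs from component schemes to overlapped schemes. In this illustration the dimensions are those listed for the supply-chain fact schemes, while two points are assumptions: that SHIPMENT AND SALE is built from SHIPMENT TO STORES and SALES (the arcs of the figure are not recoverable), and that an overlapped scheme keeps exactly the dimensions common to its components, which simplifies the full DFM definition.

```python
# Dimensions of two elemental fact schemes, as listed for the supply chain.
fact_schemes = {
    "SHIPMENT TO STORES": {"date", "product", "store", "warehouse", "mode"},
    "SALES": {"date", "product", "store", "promotion"},
}

# Arcs of the data mart scheme: overlapped scheme -> its component schemes.
# This particular arc set is assumed for the example.
components = {"SHIPMENT AND SALE": ["SHIPMENT TO STORES", "SALES"]}

def overlapped_dimensions(name):
    """Dimensions shared by all components of an overlapped scheme."""
    dims = None
    for comp in components[name]:
        # A component may itself be overlapped, hence the recursion.
        comp_dims = fact_schemes[comp] if comp in fact_schemes else overlapped_dimensions(comp)
        dims = comp_dims if dims is None else dims & comp_dims
    return dims

shared = overlapped_dimensions("SHIPMENT AND SALE")
```

The shared dimensions are the ones along which a drill-across query can compare measures taken from the two component schemes.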

The workload

• In principle, the workload for a data mart is dynamic and unpredictable.
• In some commercial tools, the actual workload is monitored while the DW is operating, and the logical and physical schemes are dynamically tuned.
• We claim that a core workload can, and should, be determined a priori:
  • The user typically knows in advance which kinds of data analysis (s)he will carry out most often for decision-making or statistical purposes;
  • A substantial number of queries are aimed at extracting summary data to fill standard reports.

The workload (2)

[Figure: FACT SCHEME: SHIPMENT TO STORES, repeated from the earlier slide.]

Data warehouse schemes

• At the highest abstraction level, the data warehouse scheme shows the different data marts, emphasizing the fact schemes duplicated in two or more of them, the different profiles of the users accessing them, and the operational sources which feed them.

[Figure: example data warehouse scheme.
 Labels: DEMAND CHAIN, SUPPLY CHAIN, SALES, RENOVATION, PERSONNEL; personnel manager, administrative manager, sale executive, buyer; claims, incentives, personnel database, purchases, restoration works, product database, orders
 Legend: data mart, user, fact scheme, operational db, file transfer, manual input]

Conceptual design of Data Warehouses

Stefano Rizzi

Designing the DW

• Within a successful approach to DW design, top-down and bottom-up strategies should be mixed.
• When planning a DW, a bottom-up approach should be followed.
• One data mart at a time is identified and prototyped.
• Each data mart is designed in a top-down fashion by building a conceptual scheme for each fact of interest.

Data Mart prototyping

Prototype first the data mart which:
• plays the most strategic role for the enterprise;
• can convince the final users of the potential benefits;
• relies on available and consistent data sources.

[Figure: data marts DM1 to DM5 and the data sources that feed them (Source 1, Source 2, Source 3).]

Reference architecture

[Figure: heterogeneous operational dbs, reconciled data, DW.]

Problem of designing the reconciled data (integration of heterogeneous sources).

Methodological framework

[Figure: design phases and the actors involved.
 Phases: analysis of the operational db, requirement specification, conceptual design, workload refinement, logical design, physical design
 Actors: final user, designer, db administrator]

DWs are based on a pre-existing information system.

Methodological framework (2)

[Figure: design flow.
 CONCEPTUAL DESIGN: takes the E/R scheme or relational scheme, the facts and the preliminary workload; produces the conceptual scheme.
 LOGICAL DESIGN: takes the workload and the target logical model; produces the logical scheme.
 PHYSICAL DESIGN: takes the workload and the target DBMS; produces the physical scheme.
 The relational scheme is illustrated by a store table (store key, store, city, region, address, sales manager) and a table with columns time key, store key, product key, qty sold, revenue, no. of customers.]

Conceptual design of the data mart

• Design is based on the documentation of the underlying operational information system:
  • E/R schemes
  • Relational schemes
  (Golfarelli, Maio, Rizzi 98; Cabibbo, Torlone 98; Moody, Kortink 00; Hüsemann, Lechtenbörger, Vossen 00)
• Steps:
  • Find facts
  • For each fact:
    • Navigate functional dependencies
    • Drop useless attributes
    • Define dimensions and measures

Finding facts

• Within an E/R scheme, a fact is represented by either an entity F or an n-ary relationship between entities E1...En.
• Within a relational scheme, a fact is represented by a relation F.

The entities and relationships representing frequently updated archives are good candidates to define facts; those representing nearly-static archives are not.

Navigating functional dependencies

• Build a tree in which each vertex corresponds to an attribute of the scheme;
• The root corresponds to the identifier (key) of F;
• For each vertex v, the corresponding attribute functionally determines all the attributes corresponding to the descendants of v.
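The construction of the attribute tree can be sketched as a recursive walk over functional dependencies. The dependency table below is a simplified, hypothetical extract loosely based on the sales example, and `build_tree` and `descendants` are illustrative helpers rather than the algorithm of the tutorial.

```python
# Hypothetical functional dependencies: attribute -> attributes it directly determines.
fds = {
    "sale": ["qty", "unit price", "ticket number", "product"],
    "ticket number": ["date", "store"],
    "store": ["sales manager", "address", "city"],
    "city": ["state"],
    "product": ["type", "brand"],
    "type": ["category"],
}

def build_tree(root, fds):
    """Return the attribute tree as nested dicts, following dependencies from the root."""
    def expand(attr, path):
        children = {}
        for child in fds.get(attr, []):
            if child in path:          # guard against cyclic dependencies
                continue
            children[child] = expand(child, path | {child})
        return children
    return {root: expand(root, {root})}

def descendants(tree_node):
    """All attributes below a vertex, i.e. everything it functionally determines."""
    found = []
    for child, subtree in tree_node.items():
        found.append(child)
        found.extend(descendants(subtree))
    return found

tree = build_tree("sale", fds)
store_determines = descendants(tree["sale"]["ticket number"]["store"])
```

The descendants of a vertex are exactly the attributes it determines, directly or transitively, which is the property stated in the last bullet.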

Example (from the E/R scheme):

[Figure: E/R scheme of the sales domain.
 Entities: PURCHASE TICKET, PRODUCT, TYPE, CATEGORY, DEPARTM., MARKETING GROUP, BRAND, STORE, CITY, COUNTY, STATE, SALE DISTRICT, WAREHOUSE
 Relationships shown include sale (with attributes qty and unit price), of, in, for, from, produced in
 Other attributes shown: ticket number, date, product, type, category, department, marketing group, brand, weight, diet (0,1), size, store, sales manager, address, phone, manager, city, county, state, district no., warehouse]

Example (from the E/R scheme):

[Figure: the attribute tree rooted in sale. Vertices shown: qty, unit price, ticket number, date, store, sales manager, address, phone, city, county, state, district no+state, district no, product, brand, city, type, category, dept., manager, mark. grp., manager, weight, diet, size.]

Dropping useless attributes

• Some attributes in the tree may be uninteresting for the DW. In order to drop useless levels of detail, it is possible to apply the following operators:
  • Pruning: delete a vertex and its subtree.
  • Grafting: delete a vertex and move its subtree up, attaching its children to its parent. It is useful when an attribute is not interesting but the attributes it determines must be preserved.

[Figure: three versions of the subtree rooted in ticket number: the original (ticket number, date, store, sales manager, address, city, state); after pruning city (ticket number, date, store, sales manager, address); after grafting ticket number (date, store, sales manager, address).]
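The two operators can be sketched on a tree stored as nested dictionaries. The tree below mirrors the ticket number subtree of the figure; the representation and the function names `prune` and `graft` are choices made for this example, not the tutorial's notation.

```python
# Subtree from the figure, as nested dicts (vertex -> children).
tree = {
    "ticket number": {
        "date": {},
        "store": {"sales manager": {}, "address": {}, "city": {"state": {}}},
    }
}

def prune(tree, vertex):
    """Return a copy of the tree without `vertex` and its whole subtree."""
    return {k: prune(v, vertex) for k, v in tree.items() if k != vertex}

def graft(tree, vertex):
    """Return a copy where `vertex` is removed and its children move up to its parent."""
    result = {}
    for k, v in tree.items():
        if k == vertex:
            result.update(graft(v, vertex))   # children take the vertex's place
        else:
            result[k] = graft(v, vertex)
    return result

pruned = prune(tree, "city")                 # drops city and state
grafted = graft(pruned, "ticket number")     # date and store move up one level
```

Pruning loses the attributes determined by the deleted vertex, while grafting keeps them, which is why grafting suits an attribute such as ticket number whose children (date, store) are still needed.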

Defining dimensions

• The choice of dimensions determines the fact granularity.
• Dimensions must be chosen among the root children in the attribute tree.
• Time should always be a dimension.

[Figure: the attribute tree rooted in sale after ticket number has been removed. Vertices shown: qty, unit price, date, store, sales manager, address, phone, city, county, state, district no+state, product, brand, city, type, category, dept., manager, mark. grp., manager, weight, diet.]

Defining measures

• Measures must be chosen among the children of the root.
• Typically, measures are computed either by counting the number of instances of F, or by summing (averaging, …) expressions which involve numerical attributes.
• An attribute cannot be both a measure and a dimension.
• A fact may have no measures.

[Figure: the same attribute tree rooted in sale, with vertices qty, unit price, date, store and product among those shown.]
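The computation of measures by counting instances and by summing expressions can be sketched as a grouping over the chosen dimensions. The rows, the choice of date, store and product as dimensions, and the measure names used here are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical instances of the sale relationship.
sale_rows = [
    {"date": "1999-10-10", "store": "S1", "product": "Coke",  "qty": 2, "unit price": 1.5},
    {"date": "1999-10-10", "store": "S1", "product": "Coke",  "qty": 1, "unit price": 1.5},
    {"date": "1999-10-10", "store": "S1", "product": "Pepsi", "qty": 4, "unit price": 1.0},
]
dimensions = ("date", "store", "product")   # chosen among the children of the root

measures = defaultdict(lambda: {"qty sold": 0, "revenue": 0.0, "no. of sales": 0})
for row in sale_rows:
    cell = measures[tuple(row[d] for d in dimensions)]
    cell["qty sold"] += row["qty"]                      # summing an attribute
    cell["revenue"] += row["qty"] * row["unit price"]   # summing an expression
    cell["no. of sales"] += 1                           # counting instances of F

coke = measures[("1999-10-10", "S1", "Coke")]
pepsi = measures[("1999-10-10", "S1", "Pepsi")]
```

All operational rows that share the same dimension values collapse into one fact instance, so the choice of dimensions fixes both the granularity and the values of the measures.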

Granularity

• Defining the granularity of data is a primary issue in determining performance. Granularity depends on the queries users are interested in, and represents a trade-off between query response time and the level of detail of the information to be stored.
  • It may be worth adopting a finer granularity than that required by users, provided that this does not slow down the system too much.
  • Granularity is constrained by the maximum time frame for loading.
• Choosing granularity includes defining the refresh interval.
  • Issues to be considered:
    • Availability of operational data
    • Workload characteristics
    • The total time period to be analysed

WAND: a CASE tool for data warehouse design

• A design methodology is almost useless if no CASE tool is provided to support it.
  • Acquire the relational db scheme via ODBC
  • Carry out conceptual design
  • Define the workload
  • Calculate data volume
  • Carry out logical design
  • Create the documentation (including loading/feeding queries)

Bibliography (1)

• K. Aberer, K. Hemm. A methodology for building a data warehouse in a scientific environment. Proc. 1st Int. Conf. on Cooperative Inf. Systems, Brussels (1996).
• R. Agrawal, A. Gupta, S. Sarawagi. Modeling multidimensional databases. IBM Research Report, IBM Almaden Research Center (1995).
• M. Blaschka et al. Finding your way through multidimensional data models. Proc. DEXA’98 (1998).
• L. Cabibbo, R. Torlone. A logical approach to multidimensional databases. EDBT 98 (1998).
• A. Datta, H. Thomas. A conceptual model and algebra for on-line analytical processing in data warehouses. Proc. WITS’97 (1997).
• E. Franconi, U. Sattler. A data warehouse conceptual model for multidimensional aggregation. Proc. DMDW’99 (1999).
• M. Golfarelli, D. Maio, S. Rizzi. The Dimensional Fact Model: a conceptual model for data warehouses. Int. Jour. of Cooperative Inf. Systems 7, 2&3 (1998).
• M. Golfarelli, S. Rizzi. Designing the data warehouse: key steps and crucial issues. Jour. of Computer Science and Information Management 2, 3 (1999).

Bibliography (2)

• M. Gyssens, L.V.S. Lakshmanan. A foundation for multi-dimensional databases. Proc. 23rd VLDB, Athens, Greece (1997).
• B. Hüsemann, J. Lechtenbörger, G. Vossen. Conceptual data warehouse design. Proc. DMDW’00 (2000).
• R. Kimball. The data warehouse toolkit. John Wiley & Sons (1996).
• D. Moody, M. Kortink. From enterprise models to dimensional models: a methodology for data warehouse and data mart design. Proc. DMDW’00 (2000).
• T. Bach Pedersen, C. Jensen. Multidimensional data modelling for complex data. Proc. 15th ICDE, Sydney (1999).
• C. Sapia et al. Extending the E/R model for the multidimensional paradigm. Proc. ER’98 (1998).
• N. Tryfona, F. Busborg, J. Christiansen. starER: A Conceptual Model for Data Warehouse Design. Proc. DOLAP’99 (1999).
• P. Vassiliadis. Modeling multidimensional databases, cubes and cube operations. Proc. 10th SSDBM Conf., Capri, Italy (1998).