Business Intelligence & Data Warehousing
ANAND.T, Business Intelligence, Citicards, Tata Consultancy Services Ltd.

Business Intelligence - Data Warehouse Implementation




8/6/2019 Business Intelligence - Data Warehouse Implementation

http://slidepdf.com/reader/full/business-intelligence-data-warehouse-implementation 1/157



Lecture I

Basics and Concepts


Motivation

Aims of information technology:
To help workers in their everyday business activity and improve their productivity: clerical data processing tasks
To help knowledge workers (executives, managers, analysts) make faster and better decisions: decision support systems

Two types of applications:
Operational applications
Analytical applications


The Architecture of Data

(figure, layers of data)
Business rules: what has been learned from the data
Summary data: summaries by who, what, when, where, ...
Operational data: who, what, when, where
Metadata (database schema): logical model, physical layout of the data


Business Intelligence

“Business Intelligence is a technology based on customer- and profit-oriented models that reduces operating costs and provides increased profitability by improving productivity, sales and service, and enables decision making in no time.”


BI Cycle

Business Intelligence cycle (figure): Analysis → Insight → Action → Measurement, and back to Analysis


Uses of Business Intelligence

Operational Efficiency: ERP Reporting, KPI Tracking, Product Profitability, Risk Management, Balanced Scorecard, Activity Based Costing, Global Sourcing, Logistics


Uses of Business Intelligence

Customer Interaction: Sales Analysis, Sales Forecasting, Segmentation, Cross-selling, CRM Analytics, Campaign Planning, Customer Profitability


Market Research: Telephone Surveys, Online Surveys, Focus Groups, Mystery Shopping, Custom Panels, Online Focus Groups, One-on-ones

Environmental Scanning: AC Nielsen Reports, Association Stats, Government Reports, Media Monitoring, Economic Reports, Syndicated Studies

Data Mining: Predictive Modelling, Segmentation, Mining Customer Records, POS Systems, CRM

Library Sciences and Competitive Intelligence: Google, Internal Scanning, News Scanning Services, Ad Scanning/Tracking, Mystery Shopping, Website


BI Tools

These tools illustrate business intelligence in the areas of customer profiling, customer support, market research, market segmentation, product profitability, statistical analysis, and inventory and distribution analysis.


Evolution

1960s: Batch reports
hard to find and analyze information
inflexible and expensive; every new request must be reprogrammed

1970s: Terminal-based DSS and EIS (executive information systems)
still inflexible, not integrated with desktop tools

1980s: Desktop data access and analysis tools
query tools, spreadsheets, GUIs
easier to use, but only access operational databases

1990s: Data warehousing with integrated OLAP engines and tools


Data Warehousing Market

Hardware: servers, storage, clients
Warehouse DBMS
Tools
Market growing from $2B in 1995 to $8B in 1998 (Meta Group)
Systems integration & consulting
Already deployed in many industries: manufacturing, retail, financial, insurance, transportation, telecom, utilities, healthcare.


What is a Data Warehouse?

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)

A collection of data that is used primarily in organizational decision making
A decision support database that is maintained separately from the organization’s operational database


How Many Matches?


How Many Matches Now?


Data Warehouse - Subject Oriented

Subject oriented: organized around the major subject areas of the corporation that have been defined in the data model.
E.g. for an insurance company: customer, product, transaction or activity, policy, claim, account, etc.

Operational DBs and applications may be organized differently.
E.g. by type of insurance: auto, life, medical, fire, ...


Data Warehouse - Integrated

Heterogeneous data sources
Data sources lack consistency in encoding, naming conventions, etc.

When data is moved to the warehouse, it is converted into a single, consistent representation.
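To make the conversion step concrete, here is a minimal Python sketch, assuming three hypothetical sources ("billing", "crm", "web") that encode the same attribute, a customer's gender, in different ways. The source names and codes are invented for illustration; during the load, every source encoding is mapped onto one warehouse convention.

```python
# Hypothetical per-source encodings of the same attribute (gender).
# Each source uses its own convention; the warehouse uses a single one.
SOURCE_ENCODINGS = {
    "billing": {"m": "M", "f": "F"},
    "crm": {"1": "M", "0": "F"},
    "web": {"male": "M", "female": "F"},
}


def to_warehouse_gender(source: str, raw_value: str) -> str:
    """Convert a source-specific gender code into the warehouse code.

    Unknown sources or values raise KeyError, so bad data is caught
    during the load instead of silently entering the warehouse.
    """
    mapping = SOURCE_ENCODINGS[source]
    return mapping[raw_value.strip().lower()]


rows = [("billing", "m"), ("crm", "0"), ("web", "Female")]
print([to_warehouse_gender(src, val) for src, val in rows])  # ['M', 'F', 'F']
```

Keeping the mappings in one table means a new source only adds an entry rather than new conversion logic.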


Data Warehouse - Non-Volatile

Operational data is regularly accessed and manipulated one record at a time, and updates are applied to data in the operational environment.

Warehouse data is loaded and accessed. Data is not updated in the data warehouse environment.


Data Warehouse - Time Variance

The time horizon for the data warehouse is significantly longer than that of operational systems.

Operational data: current-value data.

Data warehouse data: nothing more than a sophisticated series of snapshots, each taken at some moment in time.

The key structure of operational data may or may not contain some element of time. The key structure of the data warehouse always contains some element of time.
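The contrast between current-value operational data and time-keyed, load-only warehouse snapshots can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical account-balance attribute: the operational store overwrites the value in place, while the warehouse only appends snapshots whose key includes the snapshot date.

```python
from datetime import date

# Operational store: keyed by account id only; each update overwrites the value.
operational = {}

# Warehouse: keyed by (account id, snapshot date); rows are only ever added.
warehouse = {}


def update_balance(account_id: str, balance: float) -> None:
    operational[account_id] = balance  # current value only, history is lost


def take_snapshot(as_of: date) -> None:
    for account_id, balance in operational.items():
        key = (account_id, as_of)  # the time element is part of the key
        if key in warehouse:
            raise ValueError(f"snapshot {key} already loaded; warehouse rows are not updated")
        warehouse[key] = balance


update_balance("A-1", 100.0)
take_snapshot(date(2019, 1, 31))
update_balance("A-1", 250.0)
take_snapshot(date(2019, 2, 28))

print(operational)  # {'A-1': 250.0}
print(sorted(warehouse.items()))
```

The operational store can only answer "what is the balance now?", while the warehouse can also answer "what was it at the end of January?".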


Why Separate Data Warehouse?

Performance
Special data organization, access methods, and implementation methods are needed to support multidimensional views and operations typical of OLAP
Complex OLAP queries would degrade performance for operational transactions

Concurrency control and recovery modes of OLTP are not compatible with OLAP analysis


Why Separate Data Warehouse?

Function
Missing data: decision support requires historical data, which operational DBs do not typically maintain
Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources: operational DBs, external sources
Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.


Advantages of Warehousing

High query performance
Queries not visible outside warehouse
Local processing at sources unaffected
Can operate when sources unavailable
Can query data not stored in a DBMS
Extra information at warehouse
Modify, summarize (store aggregates)
Add historical information


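Advantages of Mediator Systems

A mediator (virtual integration) system does not copy data into a warehouse; it forwards each query to the sources at the time the query is asked.

No need to copy data
less storage
no need to purchase data
More up-to-date data
Query needs can be unknown
Only a query interface is needed at the sources
May be less draining on sources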
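The trade-off between copying data in advance (warehouse) and querying the sources on demand (mediator) can be sketched in Python. The two in-memory "sources" and the class names are hypothetical; a real mediator would translate each query into every source's own query interface.

```python
from typing import Callable, Dict, List

# Two hypothetical sources, each exposing only a query interface.
source_a: List[Dict] = [{"product": "P1", "qty": 5}]
source_b: List[Dict] = [{"product": "P1", "qty": 3}]

Source = Callable[[], List[Dict]]


class Warehouse:
    """Copies source data at load time and answers queries locally."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources
        self.rows: List[Dict] = []

    def refresh(self) -> None:
        self.rows = [dict(row) for src in self.sources for row in src()]

    def total_qty(self, product: str) -> int:
        return sum(r["qty"] for r in self.rows if r["product"] == product)


class Mediator:
    """Stores nothing; forwards each query to the sources when it is asked."""

    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def total_qty(self, product: str) -> int:
        return sum(r["qty"] for src in self.sources for r in src() if r["product"] == product)


sources = [lambda: source_a, lambda: source_b]
wh, med = Warehouse(sources), Mediator(sources)
wh.refresh()

source_a.append({"product": "P1", "qty": 10})  # a source changes after the load
print(wh.total_qty("P1"), med.total_qty("P1"))  # 8 18
```

The warehouse answers from its local copy (no load on the sources, but stale until the next refresh), while the mediator always sees current data at the cost of touching every source on every query.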


Requirements for Data Warehousing

Load performance
Load processing
Data quality management
Query performance
Terabyte scalability
Mass user scalability
Networked data warehouse
Warehouse administration
Integrated dimensional analysis
Advanced query functionality


The Architecture of Data Warehousing (figure): operational databases and external data sources → Extract, Transform, Load, Refresh → Data warehouse, described by a metadata repository → Data marts and OLAP server → Output: OLAP, data mining, reports


Typical data warehouse: three-tier architecture (figure). The diagram shows operational data sources 1 to n and an operational data store (ODS) feeding a load manager; a warehouse DBMS, run by the warehouse manager, that holds meta-data, detailed data, lightly summarized data, highly summarized data, and archive/backup data; a query manager; data marts holding summarized data in a relational database and in a multi-dimension database; and end-user access tools. The components are grouped into a first, second, and third tier.


Data Sources

Data sources are often the operational systems, providing the lowest level of data.

Data sources are designed for operational use, not for decision support, and the data reflect this fact.

Multiple data sources are often from different systems that run on a wide range of hardware, and much of the software is built in-house or highly customized.

Multiple data sources introduce a large number of issues, such as semantic conflicts.


Creating and Maintaining a Warehouse

A data warehouse needs several tools that automate or support tasks such as:
Data extraction from different external data sources, operational databases, files of standard applications (e.g. Excel, COBOL applications), and other documents (Word, WWW)
Data cleaning (finding and resolving inconsistency in the source data)
Integration and transformation of data (between different data formats, languages, etc.)


Creating and Maintaining a Warehouse

Data loading (loading the data into the data warehouse)
Data replication (replicating the source database into the data warehouse)
Data refreshment
Data archiving
Checking for data quality
Analyzing metadata


Physical Structure of Data Warehouse

There are three basic architectures for constructing a data warehouse:
Centralized
Distributed/Federated
Tiered

The data warehouse is distributed for load balancing, scalability, and higher availability.


Physical Structure of Data Warehouse

Centralized architecture (figure): sources feed a single central data warehouse, which serves all clients.


Physical Structure of Data Warehouse

Federated architecture (figure): sources feed local data marts (e.g. marketing, financial, distribution) that together form a logical data warehouse serving end users.


Physical Structure of Data Warehouse

Tiered architecture (figure): sources feed a physical data warehouse, which feeds local data marts, which in turn feed workstations holding highly summarized data.


Physical Structure of Data Warehouse

Federated architecture
The logical data warehouse is only virtual

Tiered architecture
The central data warehouse is physical
There exist local data marts on different tiers, which store copies or summarizations of the previous tier.
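A tiered architecture, in which each tier stores a summarization of the previous tier, can be sketched as repeated aggregation. The sample rows and the choice of tiers (detail rows, a region/month data mart, a per-region workstation summary) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Central (physical) data warehouse: detailed rows.
# Each row: (region, month, product, amount). Sample data is hypothetical.
warehouse_rows: List[Tuple[str, str, str, float]] = [
    ("East", "2019-01", "P1", 100.0),
    ("East", "2019-01", "P2", 50.0),
    ("East", "2019-02", "P1", 70.0),
    ("West", "2019-01", "P1", 30.0),
]


def summarize(rows, key_fn) -> Dict:
    """Aggregate the last field of each row (the amount) by key_fn(row)."""
    totals: Dict = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row[-1]
    return dict(totals)


# Next tier: a local data mart keeps totals by (region, month).
mart = summarize(warehouse_rows, lambda r: (r[0], r[1]))

# Last tier: a workstation keeps highly summarized totals by region,
# computed from the mart (the previous tier), not from the detail rows.
mart_rows = [(region, month, total) for (region, month), total in mart.items()]
workstation = summarize(mart_rows, lambda r: r[0])

print(mart)         # {('East', '2019-01'): 150.0, ('East', '2019-02'): 70.0, ('West', '2019-01'): 30.0}
print(workstation)  # {'East': 220.0, 'West': 30.0}
```

Because each tier is derived from the one above it, totals stay consistent across tiers as long as they are refreshed in order.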


Want to know more about data warehousing schemas?


Related Concepts

Decision Support System
Business Modeling
OLTP/OLAP
Data Modeling
ETL
Reporting
Data Mining


Decision Support System (DSS)

One of the powerful tools of BI

Information technology to help knowledge workers (executives, managers, analysts) make faster and better decisions:
What were the sales volumes by region and by product category in the last year?
How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
Will a 10% discount increase sales volume sufficiently?
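The first question above is a typical decision-support query. A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical sales table with region, category, sale_date and amount columns and treating the summed amount as the sales volume:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, category TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Laptops", "2018-03-10", 1200.0),
        ("East", "Laptops", "2018-07-22", 800.0),
        ("East", "Phones", "2018-05-01", 300.0),
        ("West", "Laptops", "2018-11-15", 950.0),
        ("West", "Laptops", "2019-01-04", 500.0),  # outside the year analyzed
    ],
)

# Sales by region and product category for one year (2018 here).
# ISO-8601 date strings compare correctly as text.
query = """
    SELECT region, category, SUM(amount) AS total
    FROM sales
    WHERE sale_date >= '2018-01-01' AND sale_date < '2019-01-01'
    GROUP BY region, category
    ORDER BY region, category
"""
for region, category, total in conn.execute(query):
    print(region, category, total)
```

Run against an operational database, a query like this scans many rows and competes with transactions, which is one reason the document gives for keeping a separate warehouse.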


Business Modeling

Depicts the overall picture of a business

Sub-categories:

Business Process Modeling
Business processes are visually represented as diagrams of simple boxes with arrow graphics and text labels

Process Flow Modeling
Describes the various processes that happen in an organization and the relationships between them

Data Flow Modeling
Focuses on the flow of data between various business processes


Business Modeling Tools


Data Processing Models

There are two basic data processing models:

OLTP (Online Transaction Processing)
Describes processing at operational sites
The main aim of OLTP is reliable and efficient processing of a large number of transactions and ensuring data consistency.

OLAP (Online Analytical Processing)
Describes processing at the warehouse
The main aim of OLAP is efficient multidimensional processing of large data volumes.
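The two workloads can be contrasted with sqlite3: an OLTP-style unit of work reads and writes a few rows by primary key inside a transaction, while an OLAP-style query scans and aggregates many rows. The accounts table, its columns, and the transfer rule are hypothetical; in practice the OLAP query would run against the warehouse rather than the operational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts (id, branch, balance) VALUES (?, ?, ?)",
    [(1, "North", 500.0), (2, "North", 300.0), (3, "South", 900.0)],
)
conn.commit()


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: float) -> None:
    """OLTP: a short transaction that reads/writes a few rows by primary key."""
    with conn:  # commits on success, rolls back if an exception is raised
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balance_by_branch(conn: sqlite3.Connection):
    """OLAP: a read-only query that scans and aggregates many rows."""
    return conn.execute(
        "SELECT branch, SUM(balance) FROM accounts GROUP BY branch ORDER BY branch"
    ).fetchall()


transfer(conn, 1, 3, 200.0)
print(balance_by_branch(conn))  # [('North', 600.0), ('South', 1100.0)]
```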


OLTP vs. OLAP

                    OLTP                                    OLAP
Users               Clerk, IT professional                  Knowledge worker
Function            Day-to-day operations                   Decision support
DB design           Application-oriented                    Subject-oriented
Data                Current, up-to-date, detailed,          Historical, summarized, multidimensional,
                    flat relational, isolated               integrated, consolidated
Usage               Repetitive                              Ad-hoc
Access              Read/write, index/hash on primary key   Lots of scans
Unit of work        Short, simple transaction               Complex query
# Records accessed  Tens                                    Millions
# Users             Thousands                               Hundreds
DB size             100 MB-GB                               100 GB-TB
Metric              Transaction throughput                  Query throughput, response


OLAP Multidimensional Databases


Data Modeling

A data model is a conceptual representation of the data structures (tables) required for a database, and is very powerful in expressing and communicating the business requirements.

It visually represents:
Nature of data
Business rules governing the data
Organization in the database


Data Modeling

Types of data modeling:
Conceptual Data Modeling
Enterprise Data Modeling
Logical Data Modeling
Physical Data Modeling
Relational Data Modeling
Dimensional Data Modeling


Data Modeling


ETL

ETL stands for Extraction, Transformation, Loading.

Steps involved:
Mapping the data between source systems and the target database (data warehouse or data mart)
Cleansing of source data in a staging area
Transforming the cleansed source data and then loading it into the target system
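The three steps can be sketched end to end in Python. Everything here is an assumption for illustration: the source records, the column mapping, the single cleansing rule, and the use of an in-memory SQLite database as the target.

```python
import sqlite3
from typing import Dict, List

# Extract: raw records as they arrive from a hypothetical source system.
source_records: List[Dict[str, str]] = [
    {"CUST_NM": " alice ", "SALE_AMT": "120.50", "SALE_DT": "2019-08-01"},
    {"CUST_NM": "Bob", "SALE_AMT": "", "SALE_DT": "2019-08-02"},  # missing amount
    {"CUST_NM": "carol", "SALE_AMT": "75", "SALE_DT": "2019-08-02"},
]

# Mapping: source column name -> target column name.
COLUMN_MAP = {"CUST_NM": "customer", "SALE_AMT": "amount", "SALE_DT": "sale_date"}


def stage(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Map source columns onto target names in a staging list."""
    return [{COLUMN_MAP[k]: v for k, v in rec.items()} for rec in records]


def cleanse(staged: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop records that fail a simple quality rule (amount must be present)."""
    return [r for r in staged if r["amount"].strip()]


def transform(clean: List[Dict[str, str]]) -> List[tuple]:
    """Standardize names and convert types for the target schema."""
    return [(r["customer"].strip().title(), float(r["amount"]), r["sale_date"]) for r in clean]


def load(conn: sqlite3.Connection, rows: List[tuple]) -> None:
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO sales_fact (customer, amount, sale_date) VALUES (?, ?, ?)", rows)


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE sales_fact (customer TEXT, amount REAL, sale_date TEXT)")
load(target, transform(cleanse(stage(source_records))))
print(target.execute("SELECT * FROM sales_fact ORDER BY rowid").fetchall())
# [('Alice', 120.5, '2019-08-01'), ('Carol', 75.0, '2019-08-02')]
```

Real ETL tools add logging of rejected rows, incremental extraction, and restartable loads; the sketch only shows the order of the steps.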


ETL Tools


Reporting

Business Intelligence reporting tools provide different views of data by pivoting or rotating the data across several dimensions.

Nowadays all OLAP tools support reporting. Excel sheets and flat files are the standard reporting mediums.
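Pivoting turns the values of one dimension into columns; rotating swaps which dimension runs down and which runs across. A small pure-Python sketch, assuming hypothetical (region, quarter, amount) rows; a reporting or OLAP tool performs the same rotation interactively.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

rows: List[Tuple[str, str, float]] = [
    ("East", "Q1", 100.0),
    ("East", "Q2", 150.0),
    ("West", "Q1", 80.0),
    ("West", "Q1", 20.0),
    ("West", "Q2", 60.0),
]


def pivot(rows, row_dim: int, col_dim: int) -> Dict[str, Dict[str, float]]:
    """Sum the last field of each row, with row_dim down and col_dim across."""
    table: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[-1]
    return {r: dict(cols) for r, cols in table.items()}


by_region = pivot(rows, row_dim=0, col_dim=1)   # regions down, quarters across
by_quarter = pivot(rows, row_dim=1, col_dim=0)  # rotated: quarters down, regions across

print(by_region)   # {'East': {'Q1': 100.0, 'Q2': 150.0}, 'West': {'Q1': 100.0, 'Q2': 60.0}}
print(by_quarter)  # {'Q1': {'East': 100.0, 'West': 100.0}, 'Q2': {'East': 150.0, 'West': 60.0}}
```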


Data Mining

Data mining is a set of processes related to analyzing and discovering useful, actionable knowledge buried deep beneath large volumes of data stores or data sets.

This knowledge discovery involves finding patterns or behaviors within the data that lead to some profitable business action.

Data Mining Life Cycle:
Business Problem Analysis
Knowledge Discovery
Implementation
Results Analysis
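As one small example of finding patterns within the data, the sketch below counts which product pairs are frequently bought together, which is the counting step of market-basket analysis. The baskets and the support threshold are hypothetical; data mining tools use dedicated algorithms such as Apriori over far larger data sets.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def frequent_pairs(baskets: List[Set[str]], min_support: float) -> List[Tuple[Tuple[str, str], float]]:
    """Return item pairs whose support (share of baskets containing both) >= min_support."""
    counts: Counter = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) count together
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    result = [(pair, c / n) for pair, c in counts.items() if c / n >= min_support]
    return sorted(result, key=lambda item: (-item[1], item[0]))


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]
print(frequent_pairs(baskets, min_support=0.5))
# [(('bread', 'butter'), 0.5), (('bread', 'milk'), 0.5)]
```

A pattern like this only becomes knowledge in the sense of the life cycle above once it leads to an action (for example, a cross-selling campaign) whose results are analyzed.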


Typical Data Warehouse


Lecture II: Design and Implementation


Database design methodology for data warehouses

There are many approaches that offer alternative routes to the creation of a data warehouse.

Typical approach: decompose the design of the data warehouse into manageable parts (data marts). At a later stage, the integration of the smaller data marts leads to the creation of the enterprise-wide data warehouse.

The methodology specifies the steps required for the design of a data mart; however, it also ties together separate data marts so that over time they merge into a coherent overall data warehouse.


Step 1: Choosing the process

The process (function) refers to the subject matter of a particular data mart. The first data mart to be built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.

The best choice for the first data mart tends to be the one that is related to ‘sales’.


Step 2: Choosing the grain

Choosing the grain means deciding exactly what a fact table record represents. For example, the entity ‘Sales’ may represent the facts about each property sale. Therefore, the grain of the ‘Property_Sales’ fact table is an individual property sale.

Only when the grain for the fact table is chosen can we identify the dimensions of the fact table.

The grain decision for the fact table also determines the grain of each of the dimension tables. For example, if the grain for ‘Property_Sales’ is an individual property sale, then the grain of the ‘Client’ dimension is the detail of the client who bought a particular property.
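Declared in SQL, this grain decision means one fact row per property sale, with the Client dimension at the level of the individual buyer. A minimal sqlite3 sketch; Property_Sales and Client come from the text, while the Property and SaleDate tables, all column names, and the rule that a property sells at most once per day are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: one row per client, per property, per calendar day.
    CREATE TABLE Client (client_key INTEGER PRIMARY KEY, client_name TEXT, client_city TEXT);
    CREATE TABLE Property (property_key INTEGER PRIMARY KEY, property_type TEXT, city TEXT);
    CREATE TABLE SaleDate (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);

    -- Fact table at the grain of one individual property sale.
    CREATE TABLE Property_Sales (
        client_key   INTEGER NOT NULL REFERENCES Client(client_key),
        property_key INTEGER NOT NULL REFERENCES Property(property_key),
        date_key     INTEGER NOT NULL REFERENCES SaleDate(date_key),
        sale_price   REAL NOT NULL,
        -- assumption: a property is sold at most once per day, which pins down the grain
        PRIMARY KEY (property_key, date_key)
    );
    """
)

conn.execute("INSERT INTO Client VALUES (1, 'A. Smith', 'Leeds')")
conn.execute("INSERT INTO Property VALUES (10, 'Flat', 'Leeds')")
conn.execute("INSERT INTO SaleDate VALUES (20190801, '2019-08-01', '2019-08', 2019)")
conn.execute("INSERT INTO Property_Sales VALUES (1, 10, 20190801, 185000.0)")

# Every fact row answers: who bought which property, when, and for how much.
print(conn.execute(
    """SELECT c.client_name, p.property_type, d.full_date, f.sale_price
       FROM Property_Sales f
       JOIN Client c ON c.client_key = f.client_key
       JOIN Property p ON p.property_key = f.property_key
       JOIN SaleDate d ON d.date_key = f.date_key"""
).fetchall())
# [('A. Smith', 'Flat', '2019-08-01', 185000.0)]
```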


Step 3: Identifying and conforming the dimensions

Dimensions set the context for formulating queries about the facts in the fact table.

We identify dimensions in sufficient detail to describe things such as clients and properties at the correct grain.

If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a subset of the other (this is the only way that two data marts can share one or more dimensions in the same application).

When a dimension is used in more than one data mart, the dimension is referred to as being conformed.
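The conformance rule can be expressed as a simple check. This is a hypothetical helper, not part of any tool: it assumes a dimension can be described by its name, its set of attribute names, and its set of member keys, and it accepts two data marts' versions only if they are identical or one is a subset of the other on both counts.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Dimension:
    name: str
    attributes: FrozenSet[str]  # column names, e.g. {"client_key", "client_name"}
    members: FrozenSet[int]     # keys of the rows in the dimension


def is_conformed(a: Dimension, b: Dimension) -> bool:
    """True if a and b are the same dimension, or one is a subset of the other."""
    if a.name != b.name:
        return False
    a_in_b = a.attributes <= b.attributes and a.members <= b.members
    b_in_a = b.attributes <= a.attributes and b.members <= a.members
    return a_in_b or b_in_a


sales_client = Dimension("Client", frozenset({"client_key", "client_name", "city"}), frozenset({1, 2, 3}))
lease_client = Dimension("Client", frozenset({"client_key", "client_name"}), frozenset({1, 2}))
rogue_client = Dimension("Client", frozenset({"cust_id", "client_name"}), frozenset({1, 2}))

print(is_conformed(sales_client, lease_client))  # True
print(is_conformed(sales_client, rogue_client))  # False
```

The "rogue" version fails because it renames the key column, so facts from its data mart cannot be combined with the sales data mart on the client dimension.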


Step 4: Choosing the facts

The grain of the fact table determines which facts can be used in the data mart: all facts must be expressed at the level implied by the grain.

In other words, if the grain of the fact table is an individual property sale, then all the numerical facts must refer to this particular sale (the facts should be numeric and additive).


Step 5: Storing pre-calculations in the fact table

Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations.

Common example: a profit or loss statement. These types of facts are useful since they are additive quantities, from which we can derive valuable information.

This is particularly true for a value that is fundamental to an enterprise, or if there is any chance of a user calculating the value incorrectly.
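Storing a pre-calculated, additive fact such as profit means every user sums the same stored value instead of re-deriving it. A sqlite3 sketch, assuming hypothetical branch, month, sale_price and cost columns and defining profit as sale_price minus cost:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Property_Sales (
           branch TEXT, month TEXT,
           sale_price REAL, cost REAL,
           profit REAL  -- pre-calculated at load time: sale_price - cost
       )"""
)


def load_sale(branch: str, month: str, sale_price: float, cost: float) -> None:
    # The profit rule lives in one place (the load), not in every user's query.
    conn.execute(
        "INSERT INTO Property_Sales VALUES (?, ?, ?, ?, ?)",
        (branch, month, sale_price, cost, sale_price - cost),
    )


load_sale("B1", "2019-07", 200000.0, 185000.0)
load_sale("B1", "2019-08", 150000.0, 142000.0)
load_sale("B2", "2019-08", 300000.0, 310000.0)  # a loss is stored as negative profit

# Because profit is additive, it can be summed along any dimension.
print(conn.execute(
    "SELECT branch, SUM(profit) FROM Property_Sales GROUP BY branch ORDER BY branch"
).fetchall())
# [('B1', 23000.0), ('B2', -10000.0)]
print(conn.execute("SELECT SUM(profit) FROM Property_Sales").fetchone()[0])  # 13000.0
```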


Step 6: Rounding out the dimension tables

In this step we return to the dimension tables and add as many text descriptions to the dimensions as possible.

The text descriptions should be as intuitive and understandable to the users as possible.


Step 7: Choosing the duration of the data warehouse

The duration measures how far back in time the fact table goes. For some companies (e.g. insurance companies) there may be a legal requirement to retain data extending back five or more years.

Very large fact tables raise at least two very significant data warehouse design issues:
The older the data, the more likely there will be problems in reading and interpreting the old files
It is mandatory that the old versions of the important dimensions be used, not the most current versions (we will discuss this issue later on)


Step 8: Tracking slowly changing dimensions

The changing dimension problem means that the proper description of the old client and the old branch must be used with the old data warehouse schema.

Usually, the data warehouse must assign a generalized key to these important dimensions in order to distinguish multiple snapshots of clients and branches over a period of time.

There are different types of changes in dimensions:
A dimension attribute is overwritten
A dimension attribute causes a new dimension record to be created
etc.
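The two kinds of change listed above are commonly called Type 1 (overwrite) and Type 2 (new record) slowly changing dimensions. The sketch below uses an integer surrogate key in the role of the text's "generalized key"; the Client attributes, the validity-date columns, and the helper names are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ClientRow:
    client_key: int            # surrogate ("generalized") key, unique per version
    client_id: str             # natural key from the source system
    name: str
    city: str
    valid_from: date
    valid_to: Optional[date]   # None marks the current version


client_dim: List[ClientRow] = []


def current(client_id: str) -> ClientRow:
    return next(r for r in client_dim if r.client_id == client_id and r.valid_to is None)


def type1_overwrite_name(client_id: str, new_name: str) -> None:
    """Type 1: correct the attribute in place; no history is kept."""
    current(client_id).name = new_name


def type2_change_city(client_id: str, new_city: str, as_of: date) -> int:
    """Type 2: close the current version and add a new row with a new surrogate key."""
    old = current(client_id)
    old.valid_to = as_of
    new_key = max(r.client_key for r in client_dim) + 1
    client_dim.append(ClientRow(new_key, client_id, old.name, new_city, as_of, None))
    return new_key


client_dim.append(ClientRow(1, "C-42", "J. Smyth", "Glasgow", date(2015, 1, 1), None))
type1_overwrite_name("C-42", "J. Smith")  # fix a typo: no new row
new_key = type2_change_city("C-42", "London", date(2019, 6, 1))

# Facts loaded before June 2019 keep client_key 1 (Glasgow); later facts use new_key (London).
for row in client_dim:
    print(row.client_key, row.name, row.city, row.valid_from, row.valid_to)
```

Because old facts still point at the old surrogate key, queries over historical data automatically see the client's old city, which is the requirement stated in Step 7.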


Step 9: Deciding the query priorities and the query modes

In this step we consider physical design issues:
The presence of pre-stored summaries and aggregates
Indices
Materialized views
Security issues
Backup issues
Archive issues
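Two of the physical design options above can be tried with sqlite3: an index on a frequently filtered column, and a pre-stored aggregate. SQLite has no materialized views, so this sketch stores the aggregate in an ordinary summary table that must be rebuilt on each refresh; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (branch TEXT, month TEXT, amount REAL);
    INSERT INTO sales_fact VALUES
        ('B1', '2019-07', 100.0), ('B1', '2019-07', 40.0),
        ('B1', '2019-08', 60.0), ('B2', '2019-08', 90.0);

    -- Index to speed up queries that filter on month.
    CREATE INDEX idx_sales_month ON sales_fact (month);
    """
)


def refresh_summary(conn: sqlite3.Connection) -> None:
    """Rebuild the pre-stored aggregate (a stand-in for a materialized view)."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS sales_by_branch_month")
        conn.execute(
            """CREATE TABLE sales_by_branch_month AS
               SELECT branch, month, SUM(amount) AS total, COUNT(*) AS n_sales
               FROM sales_fact GROUP BY branch, month"""
        )


refresh_summary(conn)
# Queries at branch/month level read a handful of summary rows instead of scanning facts.
print(conn.execute(
    "SELECT * FROM sales_by_branch_month WHERE month = '2019-08' ORDER BY branch"
).fetchall())
# [('B1', '2019-08', 60.0, 1), ('B2', '2019-08', 90.0, 1)]
```

The price of the summary table is the refresh step: it is only as current as its last rebuild, which is the same trade-off a database with real materialized views manages for you.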


Database design methodology for data warehouses: summary

At the end of this methodology, we have a design for a data mart that supports the requirements of a particular business process and allows easy integration with other related data marts to ultimately form the enterprise-wide data warehouse.

A dimensional model that contains more than one fact table sharing one or more conformed dimension tables is referred to as a fact constellation.
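A fact constellation can be sketched with two fact tables (Property_Sales and a hypothetical Property_Lease) that share one conformed Branch dimension. Because the dimension is shared, results from the two data marts can be combined ("drilled across") on its attributes. Each fact table is aggregated on its own before the join, so rows from one table do not multiply the totals of the other. All table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Conformed dimension shared by both fact tables.
    CREATE TABLE Branch (branch_key INTEGER PRIMARY KEY, branch_city TEXT);
    INSERT INTO Branch VALUES (1, 'Leeds'), (2, 'York');

    -- Two fact tables (two data marts) referencing the same dimension.
    CREATE TABLE Property_Sales (branch_key INTEGER REFERENCES Branch, sale_price REAL);
    CREATE TABLE Property_Lease (branch_key INTEGER REFERENCES Branch, monthly_rent REAL);
    INSERT INTO Property_Sales VALUES (1, 200000.0), (1, 150000.0), (2, 300000.0);
    INSERT INTO Property_Lease VALUES (1, 900.0), (2, 700.0), (2, 650.0);
    """
)

# Drill across: aggregate each fact table separately, then join on the shared dimension.
query = """
    SELECT b.branch_city, s.total_sales, l.total_rent
    FROM Branch b
    JOIN (SELECT branch_key, SUM(sale_price) AS total_sales
          FROM Property_Sales GROUP BY branch_key) s ON s.branch_key = b.branch_key
    JOIN (SELECT branch_key, SUM(monthly_rent) AS total_rent
          FROM Property_Lease GROUP BY branch_key) l ON l.branch_key = b.branch_key
    ORDER BY b.branch_city
"""
print(conn.execute(query).fetchall())
# [('Leeds', 350000.0, 900.0), ('York', 300000.0, 1350.0)]
```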


Implementing a Warehouse


Implementing a Warehouse

Designing and rolling out a data warehouse is acomplex process, consisting of the followingactivities:

- Define the architecture, do capacity planning, and select the storage servers, database and OLAP servers (ROLAP vs MOLAP), and tools.
- Integrate the servers, storage, and client tools.
- Design the warehouse schema and views.


Implementing a Warehouse

- Define the physical warehouse organization, data placement, partitioning, and access methods.
- Connect the sources using gateways, ODBC drivers, or other wrappers.
- Design and implement scripts for data extraction, cleaning, transformation, load, and refresh.


Implementing a Warehouse

- Monitoring: sending data from sources
- Integrating: loading, cleansing, ...
- Processing: query processing, indexing, ...
- Managing: metadata, design, ...


Monitoring

Data Extraction
Data extraction from external sources is usually implemented via gateways and standard interfaces (such as Information Builders EDA/SQL, ODBC, JDBC, Oracle Open Connect, Sybase Enterprise Connect, Informix Enterprise Gateway, etc.).


Monitoring Techniques

- Detect changes to an information source that are of interest to the warehouse:
  - define triggers in a full-functionality DBMS
  - examine the updates in the log file
  - write programs for legacy systems
  - polling (queries to the source)
  - screen scraping
- Propagate the change in a generic form to the integrator.
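Of the techniques listed, polling is the easiest to sketch without DBMS support: the monitor periodically takes a snapshot of the source and compares it with the previous one. A minimal Python sketch, assuming each source row is identified by a key; `snapshot_diff` and the `(kind, key, row)` change records are hypothetical stand-ins for the "generic form" passed to the integrator.

```python
def snapshot_diff(old, new):
    """Compare two snapshots {key: row} and return generic change records."""
    changes = []
    for key in sorted(new.keys() - old.keys()):      # rows that appeared
        changes.append(("insert", key, new[key]))
    for key in sorted(old.keys() - new.keys()):      # rows that disappeared
        changes.append(("delete", key, old[key]))
    for key in sorted(old.keys() & new.keys()):      # rows whose values changed
        if old[key] != new[key]:
            changes.append(("update", key, new[key]))
    return changes

before = {1: ("p1", 12), 2: ("p2", 11), 3: ("p1", 50)}
after = {1: ("p1", 12), 2: ("p2", 15), 4: ("p2", 8)}
changes = snapshot_diff(before, after)
```

Polling costs a full scan of the source on every cycle, which is why triggers or log examination are preferred where the source system supports them.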


Integration

Integrator:
- Receive changes from the monitors and make the data conform to the conceptual schema used by the warehouse.
- Integrate the changes into the warehouse: merge the data with existing data already present and resolve possible update anomalies.
Integration covers data cleaning and data loading.


Data Cleaning

Data cleaning is important for a warehouse because there is a high probability of errors and anomalies in the data:
- inconsistent field lengths, inconsistent descriptions, inconsistent value assignments, missing entries, and violations of integrity constraints;
- optional fields in data entry are significant sources of inconsistent data.


Data Cleaning Techniques

- Data migration: allows simple data transformation rules to be specified, e.g. "replace the string gender by sex" (Warehouse Manager from Prism is an example of this kind of tool).
- Data scrubbing: uses domain-specific knowledge to scrub data (e.g. postal addresses); Integrity and Trillium fall into this category.
- Data auditing: discovers rules and relationships by scanning data (e.g. detecting outliers). Such tools may be considered variants of data mining tools.
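The three kinds of tools can be illustrated on toy data. A minimal Python sketch: the rename rule, the abbreviation table and the z-score threshold are hypothetical examples chosen for illustration, not the behavior of the products named above.

```python
import statistics

# Data migration: a simple transformation rule ("gender" -> "sex").
def migrate(record):
    rec = dict(record)
    if "gender" in rec:
        rec["sex"] = rec.pop("gender")
    return rec

# Data scrubbing: domain knowledge about postal addresses (hypothetical table).
ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}

def scrub_address(address):
    words = address.replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(w.lower(), w) for w in words)

# Data auditing: flag values whose z-score exceeds a threshold.
def outliers(values, threshold=2.0):
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    # If all values are equal (sd == 0), nothing is flagged.
    return [v for v in values if sd and abs(v - mean) / sd > threshold]
```

For example, `scrub_address("10 Main St.")` yields `"10 Main Street"`, and `outliers([10, 12, 11, 13, 12, 11, 10, 12, 95])` flags only `95`.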


Data Loading

After extracting, cleaning, and transforming, data must be loaded into the warehouse. Loading the warehouse includes some other processing tasks: checking integrity constraints, sorting, summarizing, etc.
Typically, batch load utilities are used for loading. A load utility must allow the administrator to monitor status; to cancel, suspend, and resume a load; and to restart after failure with no loss of data integrity.


Data Loading Issues

- The load utilities for data warehouses have to deal with very large data volumes.
- Sequential loads can take a very long time.
- A full load can be treated as a single long batch transaction that builds up a new database. Using checkpoints ensures that if a failure occurs during the load, the process can restart from the last checkpoint.
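The checkpointing idea can be sketched in Python with SQLite, assuming the input can be re-read in the same order on restart; the `load_checkpoint` table, the batch size and the `fail_after` switch used to simulate a crash are hypothetical. Each batch and its checkpoint are committed in one transaction, so a restart resumes after the last committed batch instead of reloading everything.

```python
import sqlite3

def load(conn, rows, batch_size=2, fail_after=None):
    """Load rows in batches; return the row position the load resumed from."""
    conn.execute("CREATE TABLE IF NOT EXISTS sale (id INTEGER PRIMARY KEY, amt INTEGER)")
    conn.execute("CREATE TABLE IF NOT EXISTS load_checkpoint (done INTEGER)")
    row = conn.execute("SELECT done FROM load_checkpoint").fetchone()
    if row is None:                      # first run: start from position 0
        conn.execute("INSERT INTO load_checkpoint VALUES (0)")
        conn.commit()
        done = 0
    else:                                # restart: resume after last checkpoint
        done = row[0]
    batches = 0
    for start in range(done, len(rows), batch_size):
        if fail_after is not None and batches == fail_after:
            raise RuntimeError("simulated failure")
        batch = rows[start:start + batch_size]
        with conn:  # one transaction: the batch and its checkpoint commit together
            conn.executemany("INSERT INTO sale VALUES (?, ?)", batch)
            conn.execute("UPDATE load_checkpoint SET done = ?", (start + len(batch),))
        batches += 1
    return done

rows = [(i, i * 10) for i in range(1, 6)]
conn = sqlite3.connect(":memory:")
try:
    load(conn, rows, fail_after=1)       # crashes after the first batch
except RuntimeError:
    pass
loaded_before_restart = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
resumed_from = load(conn, rows)          # restarts from the last checkpoint
```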


Data Refresh
Refreshing a warehouse means propagating updates on source data to the data stored in the warehouse.
When to refresh:
- periodically (e.g. daily or weekly): deferred refresh
- immediately: immediate refresh
The choice is determined by usage, the types of data source, etc.


Data Refresh

How to refresh:
- data shipping
- transaction shipping
Most commercial DBMSs provide replication servers that support incremental techniques for propagating updates from a primary database to one or more replicas. Such replication servers can be used to refresh a warehouse incrementally when sources change.


Data Shipping

Data shipping (e.g. Oracle Replication Server): a table in the warehouse is treated as a remote snapshot of a table in the source database. An after-row trigger is used to update a snapshot log table and propagate the updated data to the warehouse.
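A minimal sketch of the data-shipping idea using SQLite triggers from Python, assuming one in-memory database stands in for both source and warehouse; the tables `src_sale`, `sale_log` and `wh_sale` and the `refresh_snapshot` helper are hypothetical, and this is not Oracle's actual snapshot mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_sale (id INTEGER PRIMARY KEY, amt INTEGER);
CREATE TABLE sale_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER, op TEXT);
CREATE TABLE wh_sale (id INTEGER PRIMARY KEY, amt INTEGER);  -- warehouse snapshot

-- After-row triggers record which rows changed in the snapshot log.
CREATE TRIGGER sale_ins AFTER INSERT ON src_sale
BEGIN INSERT INTO sale_log (id, op) VALUES (NEW.id, 'I'); END;
CREATE TRIGGER sale_upd AFTER UPDATE ON src_sale
BEGIN INSERT INTO sale_log (id, op) VALUES (NEW.id, 'U'); END;
CREATE TRIGGER sale_del AFTER DELETE ON src_sale
BEGIN INSERT INTO sale_log (id, op) VALUES (OLD.id, 'D'); END;
""")

def refresh_snapshot(conn):
    """Ship current values of logged rows to the warehouse, then clear the log."""
    with conn:
        ids = [r[0] for r in conn.execute("SELECT DISTINCT id FROM sale_log")]
        for row_id in ids:
            row = conn.execute("SELECT id, amt FROM src_sale WHERE id = ?",
                               (row_id,)).fetchone()
            if row is None:   # row was deleted at the source
                conn.execute("DELETE FROM wh_sale WHERE id = ?", (row_id,))
            else:             # row was inserted or updated
                conn.execute("INSERT OR REPLACE INTO wh_sale VALUES (?, ?)", row)
        conn.execute("DELETE FROM sale_log")
    return len(ids)

with conn:
    conn.executemany("INSERT INTO src_sale VALUES (?, ?)", [(1, 12), (2, 11), (3, 50)])
shipped_first = refresh_snapshot(conn)
with conn:
    conn.execute("UPDATE src_sale SET amt = 15 WHERE id = 2")
    conn.execute("DELETE FROM src_sale WHERE id = 3")
shipped_second = refresh_snapshot(conn)
```

Shipping the current value of each logged key, rather than replaying every individual change, is one possible design choice here.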


Transaction Shipping

Transaction shipping (e.g. Sybase Replication Server, Microsoft SQL Server): the regular transaction log is used. The log is checked to detect updates on replicated tables, and those log records are transferred to a replication server, which packages up the corresponding transactions to update the replicas.
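A minimal Python sketch of log-based transaction shipping, assuming a hypothetical in-memory log of `(txn_id, table, op, key, value)` records with explicit COMMIT markers; real replication servers read the DBMS's own log format, which is not modeled here. Only committed transactions that touch replicated tables are packaged and replayed on the replica.

```python
from collections import defaultdict

def ship_transactions(log, replicated_tables, replica):
    """Group log records by transaction and apply committed ones to the replica."""
    pending = defaultdict(list)
    shipped = []
    for txn_id, table, op, key, value in log:
        if op == "COMMIT":
            records = pending.pop(txn_id, [])
            if records:                      # package the whole transaction
                shipped.append(txn_id)
                for tbl, kind, k, v in records:
                    target = replica.setdefault(tbl, {})
                    if kind == "DELETE":
                        target.pop(k, None)
                    else:                    # INSERT or UPDATE
                        target[k] = v
        elif table in replicated_tables:     # ignore non-replicated tables
            pending[txn_id].append((table, op, key, value))
    return shipped  # transactions without a COMMIT record are never applied

log = [
    (1, "sale", "INSERT", "o100", 12),
    (1, "audit", "INSERT", "a1", "x"),      # not replicated
    (1, None, "COMMIT", None, None),
    (2, "sale", "INSERT", "o102", 11),      # never committed
    (3, "sale", "UPDATE", "o100", 15),
    (3, None, "COMMIT", None, None),
]
replica = {}
shipped = ship_transactions(log, {"sale"}, replica)
```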


Derived Data

Derived warehouse data:
- indexes
- aggregates
- materialized views
When should derived data be updated? The most difficult problem is how to refresh the derived data. Constructing algorithms that update derived data incrementally has been the subject of much research.


Materialized Views

Define new warehouse relations using SQL expressions.

sale
prodId  clientId  date  amt
p1      c1        1     12
p2      c1        1     11
p1      c3        1     50
p2      c2        1     8
p1      c1        2     44
p1      c2        2     4

product
id  name  price
p1  bolt  10
p2  nut   5

joinTb (join of sale and product)
prodId  name  price  clientId  date  amt
p1      bolt  10     c1        1     12
p2      nut   5      c1        1     11
p1      bolt  10     c3        1     50
p2      nut   5      c2        1     8
p1      bolt  10     c1        2     44
p1      bolt  10     c2        2     4
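SQLite has no materialized views, so a minimal Python sketch can emulate one by storing the result of the join as a table with `CREATE TABLE ... AS SELECT`, using the `sale` and `product` rows above. The table name `joinTb` follows the slide; everything else is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (prodId TEXT, clientId TEXT, date INTEGER, amt INTEGER);
CREATE TABLE product (id TEXT PRIMARY KEY, name TEXT, price INTEGER);
""")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("p1", "bolt", 10), ("p2", "nut", 5)])

# Materialize the join: the tuples are stored, so later queries skip the join.
conn.execute("""
CREATE TABLE joinTb AS
SELECT s.prodId, p.name, p.price, s.clientId, s.date, s.amt
FROM sale AS s JOIN product AS p ON s.prodId = p.id
""")
rows = conn.execute("SELECT * FROM joinTb ORDER BY date, amt DESC").fetchall()
```

Unlike a real materialized view, this stored table is not refreshed automatically when `sale` or `product` changes.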


Processing

- Index structures
- What to materialize?
- Algorithms


Index Structures
Indexing principle: map key values to records for associative direct access.
The most popular indexing technique in relational databases is the B+-tree.
For multidimensional data, a large number of indexing techniques have been developed, e.g. R-trees.


Index Structures

Index structures applied in warehouses:
- inverted lists
- bitmap indexes
- join indexes
- text indexes
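Of the structures listed, the bitmap index is easy to sketch: for each distinct value of a column, keep a bit vector whose bit i is set when row i has that value, so a conjunctive predicate becomes a bitwise AND. A minimal Python sketch using integers as bit vectors over the six `sale` rows shown earlier (row 0 is p1/c1/12); `build_bitmap_index` and `rows_of` are hypothetical names.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << i)
    return index

def rows_of(bitmap):
    """Row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

# Columns of the sale table (rows 0..5).
prod = ["p1", "p2", "p1", "p2", "p1", "p1"]
client = ["c1", "c1", "c3", "c2", "c1", "c2"]
prod_idx = build_bitmap_index(prod)
client_idx = build_bitmap_index(client)

# WHERE prodId = 'p1' AND clientId = 'c1'  ->  AND of two bitmaps
match = rows_of(prod_idx["p1"] & client_idx["c1"])
```

Bitmaps suit low-cardinality warehouse columns such as product or client codes, where each bit vector is short to store and cheap to combine.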


What to Materialize?

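Store in the warehouse results that are useful for common queries. Example:

day 1:
      c1   c2   c3
p1    12        50
p2    11    8

day 2:
      c1   c2   c3
p1    44    4

Materialized total sale, summed over days:
      c1   c2   c3
p1    56    4   50
p2    11    8

summed over days and products:
      c1   c2   c3
sum   67   12   50

summed over days and clients:
      sum
p1    110
p2     19

grand total: 129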
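The benefit of materializing can be sketched with SQLite from Python: store the (product, client) totals once, then answer coarser queries such as total sale per product from that summary instead of the base table. A minimal sketch; the summary table name `sale_pc` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, clientId TEXT, date INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4),
])
# Materialize the summary over days (in practice far smaller than the fact table).
conn.execute("""CREATE TABLE sale_pc AS
                SELECT prodId, clientId, SUM(amt) AS amt
                FROM sale GROUP BY prodId, clientId""")
# Coarser queries are answered from the summary alone.
by_product = conn.execute(
    "SELECT prodId, SUM(amt) FROM sale_pc GROUP BY prodId ORDER BY prodId").fetchall()
by_client = conn.execute(
    "SELECT clientId, SUM(amt) FROM sale_pc GROUP BY clientId ORDER BY clientId").fetchall()
total = conn.execute("SELECT SUM(amt) FROM sale_pc").fetchone()[0]
```

This works because SUM is distributive: sums of partial sums give the same totals as summing the base rows.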


View and Materialized Views
View:
- a derived relation defined in terms of base (stored) relations
Materialized views:
- a view can be materialized by storing the tuples of the view in the database
- index structures can be built on the materialized view


View and Materialized Views
Maintenance is an issue for materialized views:
- recomputation
- incremental updating
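The two maintenance strategies can be contrasted on a SUM-by-product view. A minimal Python sketch, assuming inserts and deletes arrive as explicit row lists; the view keeps a row count per group (a common technique) so that incremental maintenance knows when a group becomes empty. Function names are hypothetical.

```python
def recompute(sale):
    """Full recomputation: scan the whole base table."""
    view = {}
    for prod, _client, _date, amt in sale:
        s, n = view.get(prod, (0, 0))
        view[prod] = (s + amt, n + 1)
    return view

def apply_delta(view, inserted=(), deleted=()):
    """Incremental maintenance: adjust only the groups touched by the delta."""
    new_view = dict(view)
    for sign, rows in ((1, inserted), (-1, deleted)):
        for prod, _client, _date, amt in rows:
            s, n = new_view.get(prod, (0, 0))
            s, n = s + sign * amt, n + sign
            if n == 0:                 # last row of the group was deleted
                new_view.pop(prod, None)
            else:
                new_view[prod] = (s, n)
    return new_view

sale = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]
view = recompute(sale)
ins = [("p3", "c1", 3, 7)]
dele = [("p2", "c1", 1, 11), ("p2", "c2", 1, 8)]
incremental = apply_delta(view, inserted=ins, deleted=dele)
new_sale = [r for r in sale if r not in dele] + ins
```

Both strategies give the same view; the incremental one touches three rows instead of rescanning the table. Views with MIN or MAX are harder, since deleting the current minimum can force a rescan.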


Managing


Metadata Repository

Administrative metadata:
- source databases and their contents
- gateway descriptions
- warehouse schema, view and derived data definitions
- dimensions and hierarchies
- pre-defined queries and reports
- data mart locations and contents


Metadata Repository

Administrative metadata (continued):
- data partitions
- data extraction, cleansing, and transformation rules, defaults
- data refresh and purge rules
- user profiles, user groups
- security: user authorization, access control


Metadata Repository

Business metadata:
- business terms and definitions
- data ownership, charging
Operational metadata:
- data layout
- data currency (e.g. active, archived, purged)
- usage statistics, error reports, audit trails


Importance of managing metadata
The integration of metadata, that is, "data about data":
- Metadata is used for a variety of purposes, and managing it is a critical issue in achieving a fully integrated data warehouse.
- The major purpose of metadata is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse.
- The metadata associated with data transformation and loading must describe the source data and any changes that were made to the data.
- The metadata associated with data management describes the data as it is stored in the warehouse.
- The query manager requires metadata to generate appropriate queries; metadata is also associated with the users of queries.


State of Commercial Practice
[Products and vendors: Datamation, May 15, 1996; R.C. Barquin, H.A. Edelstein: Planning and Designing the Data Warehouse. Prentice Hall, 1997]

Connectivity to sources: Apertus, CA-Ingres Gateway, Information Builders EDA/SQL, IBM Data Joiner, Informix Enterprise Gateway, Microsoft ODBC, Oracle Open Connect, Platinum InfoHub, SAS Connect, Software AG Entire, Sybase Enterprise Connect, Trinzic InfoHub

Data extract, clean, transform, refresh: CA-Ingres Replicator, Carleton Passport, Evolutionary Tech Inc. ETI-Extract, Harte-Hanks Trillium, IBM Data Joiner and Data Propagator, Oracle 7, Platinum InfoRefiner and InfoPump, Praxis OmniReplicator, Prism Warehouse Manager, Redbrick TMU, SAS Access, Software AG SourcePoint, Sybase Replication Server, Trinzic InfoPump


State of Commercial Practice
Multidimensional database engines: Arbor Essbase, Comshare Commander OLAP, Oracle IRI Express, SAS System

Warehouse data servers: CA-Ingres, IBM DB2, Information Builders Focus, Informix, Oracle, Praxis Model 204, Redbrick, Software AG ADABAS, Sybase MPP, Tandem, Teradata

ROLAP servers: HP Intelligent Warehouse, Information Advantage Axsys, Informix Metacube, MicroStrategy DSS Server


State of Commercial Practice
Query/reporting environments: Brio/Query, Business Objects, Cognos Impromptu, CA Visual Express, IBM DataGuide, Information Builders Focus Six, Informix ViewPoint, Platinum Forest & Trees, SAS Access, Software AG Esperant

Multidimensional analysis: Andyne Pablo, Arbor Essbase Analysis Server, Business Objects, Cognos PowerPlay, Dimensional Insight Cross Target, Holistic Systems HOLOS, Information Advantage Decision Suite, IQ Software IQ/Vision, Kenan System Acumate, Lotus 1-2-3, Microsoft Excel, MicroStrategy DSS, Pilot Lightship, Platinum Forest & Trees, Prodea Beacon, SAS OLAP++, Stanford Technology Group Metacube


State of Commercial Practice
Metadata management: HP Intelligent Warehouse, IBM DataGuide, Platinum Repository, Prism Directory Manager

System management: CA Unicenter, HP OpenView, IBM DataHub, NetView, Information Builders Site Analyzer, Prism Warehouse Manager, SAS CPE, Tivoli, Software AG SourcePoint, Redbrick Enterprise Control and Coordination

Process management: AT&T TOP END, HP Intelligent Warehouse, IBM FlowMark, Platinum Repository, Prism Warehouse Manager, Software AG SourcePoint

Systems integration and consulting


Research
Data cleaning:
- focus on data inconsistencies, not schema differences
- data mining techniques
Physical design:
- design of summary tables, partitions, indexes
- tradeoffs in the use of different indexes
Query processing:
- selecting appropriate summary tables
- dynamic optimization with feedback
- acid test for query optimization: cost estimation, use of transformations, search strategies
- partitioning query processing between the OLAP server and the backend server


Research

Warehouse management:
- detecting runaway queries
- resource management
- incremental refresh techniques
- computing summary tables during load
- failure recovery during load and refresh
- process management: scheduling queries, load, and refresh
- use of workflow technology for process management


Thank You

QUESTIONS?


APPENDIX A: Data Warehouse Schemas

Star schema


Fact table: sale(orderId, date, custId, prodId, storeId, qty, amt)
Dimension tables: customer(custId, name, address, city), product(prodId, name, price), store(storeId, city)

A single object (the fact table) in the middle is connected to a number of dimension tables.

Star schema


customer
custId  name   address    city
53      joe    10 main    sfo
81      fred   12 main    sfo
111     sally  80 willow  la

product
prodId  name  price
p1      bolt  10
p2      nut   5

store
storeId  city
c1       nyc
c2       sfo
c3       la

sale
orderId  date    custId  prodId  storeId  qty  amt
o100     1/7/97  53      p1      c1       1    12
o102     2/7/97  53      p2      c1       2    11
o105     3/8/97  111     p1      c3       5    50
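A typical star query joins the fact table with only the dimension tables it needs and aggregates a measure. A minimal SQLite sketch from Python over the sample rows above (dates kept as the text strings shown); the query itself is an illustrative example, not from the original.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custId INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT);
CREATE TABLE product (prodId TEXT PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE store (storeId TEXT PRIMARY KEY, city TEXT);
CREATE TABLE sale (orderId TEXT PRIMARY KEY, date TEXT, custId INTEGER,
                   prodId TEXT, storeId TEXT, qty INTEGER, amt INTEGER);
INSERT INTO customer VALUES (53, 'joe', '10 main', 'sfo'),
                            (81, 'fred', '12 main', 'sfo'),
                            (111, 'sally', '80 willow', 'la');
INSERT INTO product VALUES ('p1', 'bolt', 10), ('p2', 'nut', 5);
INSERT INTO store VALUES ('c1', 'nyc'), ('c2', 'sfo'), ('c3', 'la');
INSERT INTO sale VALUES ('o100', '1/7/97', 53, 'p1', 'c1', 1, 12),
                        ('o102', '2/7/97', 53, 'p2', 'c1', 2, 11),
                        ('o105', '3/8/97', 111, 'p1', 'c3', 5, 50);
""")
# Star join: fact table in the middle, one join per dimension used.
result = conn.execute("""
SELECT st.city, p.name, SUM(s.qty), SUM(s.amt)
FROM sale AS s
JOIN store AS st ON s.storeId = st.storeId
JOIN product AS p ON s.prodId = p.prodId
GROUP BY st.city, p.name
ORDER BY st.city, p.name
""").fetchall()
```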


Terms
Basic notion: a measure (e.g. sales, qty, etc.)
Given a collection of numeric measures, each measure depends on a set of dimensions (e.g. sales volume as a function of product, time, and location).

Terms


The relation that relates the dimensions to the measure of interest is called the fact table (e.g. sale). Information about dimensions can be represented as a collection of relations, called the dimension tables (product, customer, store). Each dimension can have a set of associated attributes.

Example of Star Schema


Sales fact table: Date, Product, Store, Customer; measurements: unit_sales, dollar_sales, schilling_sales
Date dimension: Date, Month, Year
Customer dimension: CustId, CustName, CustCity, CustCountry
Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
Store dimension: StoreID, City, State, Country, Region


Dimension Hierarchies
For each dimension, the set of associated attributes can be structured as a hierarchy:
store -> sType
store -> city -> region
customer -> city -> state -> country


Dimension Hierarchies

store
storeId  cityId  tId  mgr
s5       sfo     t1   joe
s7       sfo     t2   fred
s9       la      t1   nancy

city
cityId  pop  regId
sfo     1M   north
la      5M   south

region
regId  name
north  cold region
south  warm region

sType
tId  size   location
t1   small  downtown
t2   large  suburbs
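With the hierarchy normalized into separate tables, rolling stores up to regions takes one join per level. A minimal SQLite sketch from Python over the tables above; both queries are illustrative examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (regId TEXT PRIMARY KEY, name TEXT);
CREATE TABLE city (cityId TEXT PRIMARY KEY, pop TEXT, regId TEXT REFERENCES region);
CREATE TABLE sType (tId TEXT PRIMARY KEY, size TEXT, location TEXT);
CREATE TABLE store (storeId TEXT PRIMARY KEY, cityId TEXT REFERENCES city,
                    tId TEXT REFERENCES sType, mgr TEXT);
INSERT INTO region VALUES ('north', 'cold region'), ('south', 'warm region');
INSERT INTO city VALUES ('sfo', '1M', 'north'), ('la', '5M', 'south');
INSERT INTO sType VALUES ('t1', 'small', 'downtown'), ('t2', 'large', 'suburbs');
INSERT INTO store VALUES ('s5', 'sfo', 't1', 'joe'), ('s7', 'sfo', 't2', 'fred'),
                         ('s9', 'la', 't1', 'nancy');
""")
# store -> city -> region: one join per hierarchy level.
stores_per_region = conn.execute("""
SELECT r.name, COUNT(*)
FROM store AS s
JOIN city AS c ON s.cityId = c.cityId
JOIN region AS r ON c.regId = r.regId
GROUP BY r.name ORDER BY r.name
""").fetchall()
# store -> sType: the other branch of the store hierarchy.
small_store_mgrs = [m for (m,) in conn.execute("""
SELECT s.mgr FROM store AS s JOIN sType AS t ON s.tId = t.tId
WHERE t.size = 'small' ORDER BY s.mgr""")]
```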

Snowflake Schema


A refinement of the star schema in which the dimensional hierarchy is represented explicitly by normalizing the dimension tables.

Example of Snowflake Schema


Sales fact table: Date, Product, Store, Customer; measurements: unit_sales, dollar_sales, schilling_sales
Product(ProductNo, ProdName, ProdDesc, Category, QOH)
Cust(CustId, CustName, CustCity, CustCountry)
Date(Date, Month) -> Month(Month, Year) -> Year(Year)
Store(StoreID, City) -> City(City, State) -> State(State, Country) -> Country(Country, Region)


Fact constellations

Fact constellations: multiple fact tables share dimension tables.


APPENDIX B: Data Modeling & OLAP

Multidimensional Data Model

sale (fact relation)
Product  Client  Amt
p1       c1      12
p2       c1      11
p1       c3      50
p2       c2      8

Two-dimensional cube
      c1   c2   c3
p1    12        50
p2    11    8

Sales of products may be represented in one dimension (as a fact relation) or in two dimensions, e.g. clients and products.

Multidimensional Data Model

sale (fact relation)
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

3-dimensional cube
day 1:
      c1   c2   c3
p1    12        50
p2    11    8
day 2:
      c1   c2   c3
p1    44    4

Multidimensional Data Model and Aggregates

Add up amounts for day 1.
In SQL: SELECT sum(Amt) FROM SALE WHERE Date = 1

sale
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

Result: 81

Multidimensional Data Model and Aggregates
Add up amounts by day.
In SQL: SELECT Date, sum(Amt) FROM SALE GROUP BY Date

sale
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

Result:
Date  sum
1     81
2     48

Multidimensional Data Model and Aggregates
Add up amounts by client and product.
In SQL: SELECT client, product, sum(amt) FROM SALE GROUP BY client, product

Multidimensional Data Model and Aggregates

sale
Product  Client  Date  Amt
p1       c1      1     12
p2       c1      1     11
p1       c3      1     50
p2       c2      1     8
p1       c1      2     44
p1       c2      2     4

Result:
Product  Client  Sum
p1       c1      56
p1       c2      4
p1       c3      50
p2       c1      11
p2       c2      8

Multidimensional Data Model and Aggregates
In the multidimensional data model, together with the measure values we usually store summarizing information (aggregates):

      c1   c2   c3   Sum
p1    56    4   50   110
p2    11    8         19
Sum   67   12   50   129

Aggregates


Operators: sum, count, max, min, median, average
"Having" clause
Using dimension hierarchies:
- average by region (within store)
- maximum by month (within date)

Cube Aggregation


Example: computing sums

day 1:
      c1   c2   c3
p1    12        50
p2    11    8

day 2:
      c1   c2   c3
p1    44    4

summed over days:
      c1   c2   c3
p1    56    4   50
p2    11    8

summed over days and products:
      c1   c2   c3
sum   67   12   50

summed over days and clients:
      sum
p1    110
p2     19

grand total: 129

Cube Operators


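day 1:
      c1   c2   c3
p1    12        50
p2    11    8

day 2:
      c1   c2   c3
p1    44    4

summed over days:
      c1   c2   c3
p1    56    4   50
p2    11    8

summed over days and products:
      c1   c2   c3
sum   67   12   50

summed over days and clients:
      sum
p1    110
p2     19

grand total: 129

Aggregated cells are written with * for each aggregated dimension, in the order (client, product, date): sale(c1,*,*) = 67, sale(c2,p2,*) = 8, sale(*,*,*) = 129.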

Cube


day 1:
      c1   c2   c3   *
p1    12        50   62
p2    11    8        19
*     23    8   50   81

day 2:
      c1   c2   c3   *
p1    44    4        48
p2
*     44    4        48

* (all days):
      c1   c2   c3   *
p1    56    4   50  110
p2    11    8        19
*     67   12   50  129

sale(*,p2,*) = 19

Aggregation Using Hierarchies

day 1:
      c1   c2   c3
p1    12        50
p2    11    8

day 2:
      c1   c2   c3
p1    44    4

day 1 aggregated by region:
      region A   region B
p1    12         50
p2    11         8

Hierarchy: customer -> region -> country
(customer c1 in region A; customers c2, c3 in region B)
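Aggregating along the customer hierarchy amounts to replacing each client by its region and summing. A minimal Python sketch for the day-1 slice, using the mapping given above (c1 in region A; c2, c3 in region B); `roll_up` is a hypothetical helper.

```python
from collections import defaultdict

day1 = [("p1", "c1", 12), ("p2", "c1", 11), ("p1", "c3", 50), ("p2", "c2", 8)]
region_of = {"c1": "A", "c2": "B", "c3": "B"}  # customer -> region level

def roll_up(facts, parent_of):
    """Sum amounts after replacing each client by its parent in the hierarchy."""
    totals = defaultdict(int)
    for prod, client, amt in facts:
        totals[(prod, parent_of[client])] += amt
    return dict(totals)

by_region = roll_up(day1, region_of)
```

Rolling up one more level (region -> country) would apply the same function with a region-to-country mapping.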

Aggregation Using Hierarchies
Figure: a sales cube over client (c1, c2, c3, c4), product (video, camera, CD) and date of sale; the client dimension rolls up along the hierarchy client -> city -> region, with the cities Chennai and Bangalore.

Aggregation with respect to city (CH = Chennai, BN = Bangalore):
      Video  Camera  CD
CH    22     8       30
BN    23     18      22

A Sample Data Cube


Figure: a sample data cube with dimensions Product (CD, video, camera), Date (1Q, 2Q, 3Q, 4Q) and Country (USA, Canada, Mexico), with a sum along each dimension.

OLAP Servers


Relational OLAP (ROLAP): an extended relational DBMS that maps operations on multidimensional data to standard relational operations.
- Stores all information, including fact tables, as relations.

Multidimensional OLAP (MOLAP): a special-purpose server that directly implements multidimensional data and operations.
- Stores multidimensional datasets as arrays.
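The difference in storage can be sketched in Python: the same three-dimensional sales data kept as a relation of tuples (ROLAP style) and as a dense array addressed by dimension positions (MOLAP style). This is a toy illustration under those assumptions, not how any particular server lays out data; the helper names are hypothetical.

```python
# ROLAP style: the fact table is a relation (a list of tuples).
sale = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]

# MOLAP style: map each dimension value to an array position ...
products, clients, days = ["p1", "p2"], ["c1", "c2", "c3"], [1, 2]
pos = {d: {v: i for i, v in enumerate(vals)}
       for d, vals in (("prod", products), ("client", clients), ("day", days))}

# ... and store the measure in a dense array cube[prod][client][day].
cube = [[[0] * len(days) for _ in clients] for _ in products]
for p, c, d, amt in sale:
    cube[pos["prod"][p]][pos["client"][c]][pos["day"][d]] += amt

def rolap_cell(p, c, d):
    """Relational lookup: scan the tuples (an index would avoid the scan)."""
    return sum(a for (pp, cc, dd, a) in sale if (pp, cc, dd) == (p, c, d))

def molap_cell(p, c, d):
    """Array lookup: direct addressing by position."""
    return cube[pos["prod"][p]][pos["client"][c]][pos["day"][d]]
```

The array reserves a cell for every combination of dimension values, including empty ones such as (p2, c3, day 1), which is why sparsity matters for MOLAP storage, while the relation stores only the rows that exist.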

OLAP Servers


Hybrid OLAP (HOLAP): gives users and system administrators the freedom to select different partitions.

OLAP Queries


Roll up: summarize data along a dimension hierarchy.
If we are given total sales volume per city, we can aggregate on the location to obtain sales per state.

OLAP Queries


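Figure: a sales cube over client (c1, c2, c3, c4), product (video, camera, CD) and date of sale; the client dimension rolls up along the hierarchy client -> city -> region, with the cities Chennai and Bangalore.

Aggregation with respect to city (CH = Chennai, BN = Bangalore):
      Video  Camera  CD
CH    22     8       30
BN    23     18      22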

OLAP Queries


Roll down, drill down: go from a higher-level summary to a lower-level summary or to detailed data.
- For a particular product category, find the detailed sales data for each salesperson by date.
- Given total sales by state, we can ask for sales per city, or just sales by city for a selected state.

OLAP Queries

Day 1   c1   c2   c3
p1      12        50
p2      11    8

Day 2   c1   c2   c3
p1      44    4
p2

Rollup over date (drill-down goes the other way):
        c1   c2   c3
p1      56    4   50
p2      11    8

Rollup over product:
        c1   c2   c3
Sum     67   12   50

Rollup over client: p1 110, p2 19

Grand total: 129
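The rollup totals in this example can be recomputed from the six underlying sale rows (the same rows listed in the sale table on the pivoting slide). A minimal sketch; `group_sum` is a hypothetical helper that sums `amt` over whatever key it is given, so every dimension left out of the key is rolled up.

```python
from collections import defaultdict

# SALE rows: (prodId, clientId, date, amt)
SALE = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]


def group_sum(rows, key):
    """SUM(amt) grouped by key(row); dimensions not in the key are rolled up."""
    out = defaultdict(int)
    for row in rows:
        out[key(row)] += row[3]
    return dict(out)


by_prod_client = group_sum(SALE, lambda r: (r[0], r[1]))  # roll up over date
by_client = group_sum(SALE, lambda r: r[1])               # then over product
by_prod = group_sum(SALE, lambda r: r[0])                 # or over client
total = sum(r[3] for r in SALE)
print(by_client, by_prod, total)  # {'c1': 67, 'c3': 50, 'c2': 12} {'p1': 110, 'p2': 19} 129
```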

OLAP Queries

Pivoting can be combined with aggregation

sale (prodId, clientId, date, amt):
p1   c1   1   12
p2   c1   1   11
p1   c3   1   50
p2   c2   1    8
p1   c1   2   44
p1   c2   2    4

Day 1   c1   c2   c3
p1      12        50
p2      11    8

Day 2   c1   c2   c3
p1      44    4
p2

Pivot product x client, with totals:
        c1   c2   c3   Sum
p1      56    4   50   110
p2      11    8         19
Sum     67   12   50   129

Pivot date x client, with totals:
date    c1   c2   c3   Sum
1       23    8   50    81
2       44    4         48
Sum     67   12   50   129
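Pivoting with aggregation amounts to choosing which attribute becomes the rows, which becomes the columns, and summing the measure in each cell plus the margins. A minimal sketch over the sale table above; `pivot` is a hypothetical helper, and empty cells are reported as 0 rather than left blank.

```python
from collections import defaultdict

# SALE rows: (prodId, clientId, date, amt)
SALE = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]


def pivot(rows, row_key, col_key):
    """Cross-tabulate SUM(amt) with a 'Sum' column and a 'Sum' row as margins."""
    cells = defaultdict(lambda: defaultdict(int))
    for r in rows:
        cells[row_key(r)][col_key(r)] += r[3]
    cols = sorted({c for row in cells.values() for c in row})
    table = {rk: {c: row.get(c, 0) for c in cols} for rk, row in cells.items()}
    for row in table.values():
        row["Sum"] = sum(row[c] for c in cols)
    table["Sum"] = {c: sum(table[rk][c] for rk in cells) for c in cols + ["Sum"]}
    return table


by_product = pivot(SALE, lambda r: r[0], lambda r: r[1])  # product x client
by_date = pivot(SALE, lambda r: r[2], lambda r: r[1])     # date x client
print(by_date[1])  # {'c1': 23, 'c2': 8, 'c3': 50, 'Sum': 81}
```

Swapping `row_key` from product to date is the pivot itself; the aggregation and the margins stay the same.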

OLAP Queries

Ranking: selection of the first n elements (e.g., select the 5 best-selling products in July)

Time functions: e.g., time average

Others: stored procedures, selection, etc.
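A minimal sketch of a ranking query and of one possible time function. The July figures are hypothetical, and the slide does not say which "time average" is meant; a trailing moving average is assumed here as one common example.

```python
import heapq

# Hypothetical July sales per product.
july_sales = {"tv": 340, "cd": 90, "camera": 210, "phone": 480, "radio": 35, "dvd": 150}


def top_n(measure_by_item, n):
    """Ranking: the n items with the largest measure, best first."""
    return heapq.nlargest(n, measure_by_item.items(), key=lambda kv: kv[1])


def moving_average(series, window):
    """Time function: average over each run of `window` consecutive periods."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


print(top_n(july_sales, 5))
print(moving_average([10, 20, 30, 40], 2))  # [15.0, 25.0, 35.0]
```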

Cube Operation

SELECT date, product, customer, SUM(amount)
FROM SALES
GROUP BY CUBE (date, product, customer)

This requires computing the following 2^3 = 8 group-bys: (date, product, customer), (date, product), (date, customer), (product, customer), (date), (product), (customer), and () for the grand total
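The list of group-bys is simply every subset of the grouping attributes. A minimal, deliberately naive sketch of what the CUBE query computes, assuming rows of the form (date, product, customer, amount) taken from the sale table in these notes; `cube` is an illustrative name. It rescans the base rows for each group-by, which is exactly the cost that the lattice and materialization ideas below try to avoid.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("date", "product", "customer")
# (date, product, customer, amount) rows from the sale table in these notes.
SALES = [(1, "p1", "c1", 12), (1, "p2", "c1", 11), (1, "p1", "c3", 50),
         (1, "p2", "c2", 8), (2, "p1", "c1", 44), (2, "p1", "c2", 4)]


def cube(rows, dims):
    """SUM(amount) for every subset of dims: 2 ** len(dims) group-bys,
    keyed by the tuple of grouping attributes (() is the grand total)."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group in combinations(range(len(dims)), k):
            agg = defaultdict(int)
            for row in rows:
                agg[tuple(row[i] for i in group)] += row[-1]
            result[tuple(dims[i] for i in group)] = dict(agg)
    return result


c = cube(SALES, DIMS)
print(len(c), c[()])  # 8 {(): 129}
```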

Cuboid Lattice

A data cube can be viewed as a lattice of cuboids.

The bottom-most cuboid is the base cuboid.

The top-most cuboid (apex) contains only one cell.

( all )

(A) (B) (C) (D)

(A,B) (A,C) (A,D) (B,C) (B,D) (C,D)

(A,B,C) (A,B,D) (A,C,D) (B,C,D)

(A,B,C,D)

Cuboid Lattice

city, product, date

city, product | city, date | product, date

city | product | date

all

[Figure: the lattice is annotated with the day 1 and day 2 sales tables (base data), the product x client totals (p1: 56, 4, 50; p2: 11, 8), the per-client totals (67, 12, 50) and the grand total 129.]

Use a greedy algorithm to decide what to materialize

Efficient Data Cube Computation

Materialization of the data cube

Materialize every cuboid (full materialization), none (no materialization), or some (partial materialization).

Algorithms for selecting which cuboids to materialize consider:

size, sharing, and access frequency
type/frequency of queries
query response time
storage cost
update cost
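The slides mention a greedy algorithm without giving its details. The sketch below assumes the benefit-based greedy heuristic in the style of Harinarayan, Rajaraman and Ullman: a query on a cuboid costs the size of the smallest materialized cuboid that contains its dimensions, and each round materializes the cuboid that reduces total query cost the most. The cuboid sizes are hypothetical, and storage and update costs are ignored.

```python
DIMS = frozenset({"city", "product", "date"})

# Hypothetical cuboid sizes in rows (not taken from the slides).
SIZES = {
    DIMS: 6_000_000,
    frozenset({"city", "product"}): 6_000,
    frozenset({"city", "date"}): 300_000,
    frozenset({"product", "date"}): 500_000,
    frozenset({"city"}): 100,
    frozenset({"product"}): 50,
    frozenset({"date"}): 365,
    frozenset(): 1,
}


def cost(query, chosen):
    """Rows scanned to answer `query`: the smallest chosen cuboid whose
    dimensions include the query's (a finer cuboid can be rolled up)."""
    return min(SIZES[v] for v in chosen if query <= v)


def benefit(view, chosen):
    """Total cost reduction over all cuboids that `view` could answer."""
    return sum(max(0, cost(w, chosen) - SIZES[view]) for w in SIZES if w <= view)


def greedy_select(k):
    chosen = {DIMS}  # the base cuboid is always materialized
    for _ in range(k):
        candidates = [v for v in SIZES if v not in chosen]
        chosen.add(max(candidates, key=lambda v: benefit(v, chosen)))
    return chosen


picked = greedy_select(2)
```

With these sizes the first round picks (city, product) and the second picks (city, date): once (city, product) exists, (city) and ( all ) are already cheap, so (product, date) gains less.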

Dimension Hierarchies

Client hierarchy: city rolls up to state, which rolls up to region

cities   city   state   region
         c1     CA      East
         c2     NY      East
         c3     SF      West

Dimension Hierarchies: Computation

city, product, date

city, product | city, date | product, date

city | product | date

all

Roll-up along the client hierarchy (city to state) adds the cuboids:

state, product, date

state, product | state, date

state
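Because state and region are functions of city, a coarser cuboid such as (state, product) can be derived from the already computed (city, product) cuboid instead of from the base data. A minimal sketch using the (city, product) totals and the cities table from these slides; `roll_up_city` is an illustrative name.

```python
from collections import defaultdict

# (city, product) cuboid from the slides: SUM(amt) per client city and product.
CITY_PRODUCT = {("c1", "p1"): 56, ("c2", "p1"): 4, ("c3", "p1"): 50,
                ("c1", "p2"): 11, ("c2", "p2"): 8}
# The 'cities' dimension table from the slides: city -> (state, region).
CITIES = {"c1": ("CA", "East"), "c2": ("NY", "East"), "c3": ("SF", "West")}


def roll_up_city(cuboid, level):
    """Derive a coarser cuboid from (city, product); level 0 = state, 1 = region."""
    out = defaultdict(int)
    for (city, product), amt in cuboid.items():
        out[(CITIES[city][level], product)] += amt
    return dict(out)


state_product = roll_up_city(CITY_PRODUCT, 0)
region_product = roll_up_city(CITY_PRODUCT, 1)
print(region_product)  # {('East', 'p1'): 60, ('West', 'p1'): 50, ('East', 'p2'): 19}
```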

Cube Computation - Array-Based Algorithm

A MOLAP approach: the base cuboid is stored as a multidimensional array.

Read in a number of cells at a time to compute partial cuboids
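A simplified, in-memory sketch of the array-based idea, assuming a small hypothetical 2 x 3 x 2 base array: each cell is read once and added to all three 2-D cuboids (AB, AC, BC) during the same pass, and the 1-D cuboids and the apex are then computed from the smaller 2-D results rather than from the base array. The chunked reading that real array-based algorithms use for large arrays is not modelled.

```python
# Hypothetical base cuboid stored as a dense nested list: BASE[a][b][c].
BASE = [[[1, 2], [3, 4], [5, 6]],
        [[7, 8], [9, 10], [11, 12]]]


def aggregate_2d(base):
    """One scan of the base array; each cell feeds AB, AC and BC at once."""
    na, nb, nc = len(base), len(base[0]), len(base[0][0])
    ab = [[0] * nb for _ in range(na)]
    ac = [[0] * nc for _ in range(na)]
    bc = [[0] * nc for _ in range(nb)]
    for a in range(na):
        for b in range(nb):
            for c in range(nc):
                v = base[a][b][c]
                ab[a][b] += v
                ac[a][c] += v
                bc[b][c] += v
    return ab, ac, bc


ab, ac, bc = aggregate_2d(BASE)
# Lower cuboids come from the smaller 2-D results, not from the base array.
a_cub = [sum(row) for row in ab]
b_cub = [sum(row) for row in bc]
c_cub = [sum(col) for col in zip(*bc)]
total = sum(a_cub)  # the ALL cuboid
```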

Cube computations

[Figure: a 3-D array with dimensions A, B and C, and its cuboid lattice: {ABC}; {AB}, {AC}, {BC}; {A}, {B}, {C}; { } (ALL).]

APPENDIX C: Index Structures

Inverted Lists

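[Figure: an age index (keys 18, 19, 20, 21, 22, 23, 25, 26) points to inverted lists, which point to the data records.]

Inverted lists:
20: r4, r18, r34, r35
21: r5, r19, r37, r40

Data records:
rId   name    age
r4    joe     20
r18   fred    20
r19   sally   21
r34   nancy   20
r35   tom     20
r36   pat     25
r5    dave    21
r41   jeff    26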

Inverted Lists

Query: get people with age = 20 and name = “fred”

List for age = 20: r4, r18, r34, r35

List for name = “fred”: r18, r52

Answer is intersection: r18
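Intersecting inverted lists is cheap when each list is kept sorted by record id: a single merge pass suffices. A minimal sketch, assuming the record ids r4, r18, ... are stored as the integers 4, 18, ... so that they sort numerically; `intersect` is an illustrative name.

```python
def intersect(list_a, list_b):
    """Merge-intersect two sorted record-id lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(list_a) and j < len(list_b):
        if list_a[i] == list_b[j]:
            out.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1
        else:
            j += 1
    return out


age_20 = [4, 18, 34, 35]  # r4, r18, r34, r35
name_fred = [18, 52]      # r18, r52
print(intersect(age_20, name_fred))  # [18], i.e. r18
```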

Bitmap Indexes

Bitmap index: an indexing technique that has attracted attention in multidimensional database implementations

Customer   City      Car
c1         Detroit   Ford
c2         Chicago   Honda
c3         Detroit   Honda
c4         Poznan    Ford
c5         Paris     BMW
c6         Paris     Nissan

Bitmap Indexes

The index consists of bitmaps:

Index on City:

rec   Chicago   Detroit   Paris   Poznan
1     0         1         0       0
2     1         0         0       0
3     0         1         0       0
4     0         0         0       1
5     0         0         1       0
6     0         0         1       0

Each column is one bitmap.

Bitmap Indexes

Index on Car:

rec   BMW   Ford   Honda   Nissan
1     0     1      0       0
2     0     0      1       0
3     0     0      1       0
4     0     1      0       0
5     1     0      0       0
6     0     0      0       1

Each column is one bitmap.

Bitmap Indexes

Index on a particular column

The index consists of a number of bit vectors (bitmaps)
Each value in the indexed column has its own bit vector
The length of each bit vector is the number of records in the base table
The i-th bit is set if the i-th row of the base table has that value in the indexed column
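The definition above translates almost line by line into code. A minimal sketch that builds one bit vector per distinct value, using the City and Car columns of the customer table; bit vectors are plain Python lists of 0/1 here, and `build_bitmap_index` is an illustrative name.

```python
def build_bitmap_index(column):
    """One bit vector per distinct value; bit i is 1 iff row i holds that value."""
    index = {}
    for i, value in enumerate(column):
        bits = index.setdefault(value, [0] * len(column))
        bits[i] = 1
    return index


# City and Car columns of the customer table, rows c1..c6.
city = ["Detroit", "Chicago", "Detroit", "Poznan", "Paris", "Paris"]
car = ["Ford", "Honda", "Honda", "Ford", "BMW", "Nissan"]
city_index = build_bitmap_index(city)
car_index = build_bitmap_index(car)
print(city_index["Detroit"])  # [1, 0, 1, 0, 0, 0]
```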

Bitmap Index

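[Figure: an age index (keys 18, 19, 20, 21, 22, 23, 25, 26) points to one bitmap per age value over the data records below; the bitmap for age 20 has bits set for records 1, 2, 4 and 5.]

Data records:
id   name    age
1    joe     20
2    fred    20
3    sally   21
4    nancy   20
5    tom     20
6    pat     25
7    dave    21
8    jeff    26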

Using Bitmap indexes

Query: get people with age = 20 and name = “fred”

List for age = 20: 1101100000

List for name = “fred”: 0100000001

Answer is intersection: 0100000000

Good if the domain cardinality is small
Bit vectors can be compressed
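The intersection step is a single bitwise AND. A minimal sketch, assuming the bitmaps are written as bit strings with the leftmost bit standing for record 1; the helper names are illustrative.

```python
def bitmap_and(a, b):
    """AND two equal-length bit strings (leftmost bit = record 1)."""
    return format(int(a, 2) & int(b, 2), f"0{len(a)}b")


def set_records(bits):
    """1-based positions of the set bits, i.e. the matching record ids."""
    return [i + 1 for i, bit in enumerate(bits) if bit == "1"]


age_20 = "1101100000"
name_fred = "0100000001"
answer = bitmap_and(age_20, name_fred)
print(answer, set_records(answer))  # 0100000000 [2]
```

Record 2 is fred, aged 20, in the data records table.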

Using Bitmap indexes

They allow the use of efficient bit operations to answer some queries:

“How many customers from Detroit have car ‘Ford’?” Perform a bitwise AND of the two bitmaps: answer 1 (c1)

“How many customers have a car ‘Honda’?” Count the 1s in the Honda bitmap: answer 2
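Both queries reduce to an AND and a population count. A minimal sketch that stores the Detroit, Ford and Honda bitmaps from the City and Car index tables as Python integers, with the leftmost of the six bits standing for c1; the helper names are illustrative.

```python
N = 6  # records c1..c6
# Bitmaps from the City and Car index tables; leftmost bit = c1.
DETROIT = 0b101000
FORD = 0b100100
HONDA = 0b011000


def count_ones(bitmap):
    return bin(bitmap).count("1")


def customers(bitmap):
    """Customer ids whose bit is set."""
    return [f"c{i + 1}" for i in range(N) if (bitmap >> (N - 1 - i)) & 1]


detroit_ford = DETROIT & FORD
print(customers(detroit_ford), count_ones(detroit_ford))  # ['c1'] 1
print(count_ones(HONDA))  # 2
```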

Compression: bit vectors are usually sparse for large databases, so they compress well; the drawback is the need for decompression
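Run-length encoding is one simple way to exploit sparsity; it is used here only as an illustration, since the slide does not name a particular scheme. A minimal sketch with a hypothetical 62-bit vector: long runs of zeros collapse to single pairs, but a bitwise AND on the encoded form first needs `rle_decode` (or a run-aware AND).

```python
from itertools import groupby


def rle_encode(bits):
    """Run-length encode a bit string as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs):
    return "".join(bit * length for bit, length in runs)


# Hypothetical sparse bit vector: 62 bits, only 2 of them set.
sparse = "0" * 20 + "1" + "0" * 30 + "1" + "0" * 10
encoded = rle_encode(sparse)
print(encoded)  # [('0', 20), ('1', 1), ('0', 30), ('1', 1), ('0', 10)]
```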

Bitmap Index – Summary

With efficient hardware support for bitmap operations (AND, OR, XOR, NOT), bitmap indexes offer better access methods for certain queries

e.g., selection on two attributes

Some commercial products have implemented bitmap indexes

Works poorly for high-cardinality domains, since the number of bitmaps grows with the number of distinct values

Difficult to maintain: reorganization is needed when relation sizes change (new bitmaps)

Join

“Combine” SALE, PRODUCT relations

In SQL: SELECT * FROM SALE, PRODUCT WHERE SALE.prodId = PRODUCT.id

sale   prodId   storeId   date   amt
       p1       c1        1      12
       p2       c1        1      11
       p1       c3        1      50
       p2       c2        1       8
       p1       c1        2      44
       p1       c2        2       4

product   id   name   price
          p1   bolt   10
          p2   nut     5

joinTb   prodId   name   price   storeId   date   amt
         p1       bolt   10      c1        1      12
         p2       nut     5      c1        1      11
         p1       bolt   10      c3        1      50
         p2       nut     5      c2        1       8
         p1       bolt   10      c1        2      44
         p1       bolt   10      c2        2       4
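The join on SALE.prodId = PRODUCT.id can be sketched as a hash join: PRODUCT is loaded into a dictionary keyed by id and each sale row probes it. A minimal illustration over the two tables above; `join_sale_product` is an illustrative name.

```python
# SALE rows: (prodId, storeId, date, amt); PRODUCT: id -> (name, price).
SALE = [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
        ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)]
PRODUCT = {"p1": ("bolt", 10), "p2": ("nut", 5)}


def join_sale_product(sales, products):
    """Equi-join on prodId = id; output rows follow the joinTb column order."""
    return [(pid, *products[pid], store, date, amt)
            for pid, store, date, amt in sales if pid in products]


joinTb = join_sale_product(SALE, PRODUCT)
print(joinTb[0])  # ('p1', 'bolt', 10, 'c1', 1, 12)
```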

Join Indexes

product   id   name   price   jIndex
          p1   bolt   10      r1,r3,r5,r6
          p2   nut     5      r2,r4

sale   rId   prodId   storeId   date   amt
       r1    p1       c1        1      12
       r2    p2       c1        1      11
       r3    p1       c3        1      50
       r4    p2       c2        1       8
       r5    p1       c1        2      44
       r6    p1       c2        2       4

The jIndex column is the join index.

Join Indexes

Traditional indexes map a value to a list of record ids. Join indexes map the tuples in the join result of two relations to the source tables.

In a data warehouse, join indexes relate the values of the dimensions of a star schema to rows in the fact table.

For a warehouse with a Sales fact table and dimension city, a join index on city maintains for each distinct city a list of RIDs of the tuples recording the sales in the city

Join indexes can span multiple dimensions
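A minimal sketch of building join indexes over the sale fact table from the join index slide: each dimension key, or combination of keys for a multi-dimensional join index, is mapped to the RIDs of the fact rows it joins with. The RIDs and rows are the ones shown on that slide; `build_join_index` and the key-position convention are illustrative assumptions.

```python
# sale fact rows keyed by RID: (prodId, storeId, date, amt)
SALE = {"r1": ("p1", "c1", 1, 12), "r2": ("p2", "c1", 1, 11),
        "r3": ("p1", "c3", 1, 50), "r4": ("p2", "c2", 1, 8),
        "r5": ("p1", "c1", 2, 44), "r6": ("p1", "c2", 2, 4)}


def build_join_index(fact_rows, key_positions):
    """Map each combination of dimension keys to the RIDs of matching fact rows."""
    index = {}
    for rid, row in fact_rows.items():
        key = tuple(row[i] for i in key_positions)
        index.setdefault(key, []).append(rid)
    return index


product_jindex = build_join_index(SALE, (0,))              # one dimension
product_store_jindex = build_join_index(SALE, (0, 1))      # spans two dimensions
print(product_jindex[("p1",)])                 # ['r1', 'r3', 'r5', 'r6']
print(product_store_jindex[("p1", "c1")])      # ['r1', 'r5']
```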