24
Decision Support Systems 1201311 Data Warehousing Chattrakul Sombattheera

Decision Support Systems 1201311 Data Warehousing

  • Upload
    clara

  • View
    29

  • Download
    0

Embed Size (px)

DESCRIPTION

Decision Support Systems 1201311 Data Warehousing. Chattrakul Sombattheera. Agenda. Definitions and Concepts Process Overview Architecture Data Integration and the ETL Processes Development Real-Time Data Warehousing Administration and Security Issues. Definitions and Concepts. - PowerPoint PPT Presentation

Citation preview

Page 1: Decision Support Systems 1201311 Data Warehousing

Decision Support Systems1201311

Data WarehousingChattrakul Sombattheera

Page 2: Decision Support Systems 1201311 Data Warehousing

Agenda

• Definitions and Concepts

• Process Overview

• Architecture

• Data Integration and the ETL Processes

• Development

• Real-Time Data Warehousing

• Administration and Security Issues

Page 3: Decision Support Systems 1201311 Data Warehousing

Definitions and Concepts

• A data warehouse (DW) is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization.

• Data are usually structured to be available in form ready for analytical processing activities (e.g., online analytical processing [OLAP], data mining, querying, reporting, other decision support applications).

• A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.

Page 4: Decision Support Systems 1201311 Data Warehousing

Characteristics of DW• Subject-oriented. Data are organized by detailed subject, such as

sales, products, or customers, containing only information relevant for decision support that allows decision makers to both know how their business performs and why.– A DW differs from typical DBs that they are more product-oriented and

are tuned to handle transactions of DBs. Subject orientation allows a more comprehensive view of organization

• Integrated. Data are from different sources and must be in a consistent form. DWs must deal with naming conflicts and discrepancies among units of measure.

• Time variant (time series). A DW usually contains historical data (containing multiple time points, e.g. daily, weekly, monthly), except in real-time systems. They detect trends, deviations, long-term relationships for forecasting and comparisons, leading to decision making.

• Nonvolatile. Users cannot change data in DWs. Obsolete data are discarded, and changes are recorded as new data.

Page 5: Decision Support Systems 1201311 Data Warehousing

Additional DW Characteristics

• Web-based. DWs are typically web-based application.

• Relational/Multidimensional. DWs use either a relational structure or multidimensional structure.

• Client/server. DWs use client/server architecture to provide easy access to end users.

• Real-time. Newer DWs provide real-time or active, data access and analysis capabilities.

• Include metadata. DWs contain metadata (data about data) about how data organized and how to effectively use them.

Page 6: Decision Support Systems 1201311 Data Warehousing

Parts of DWs.• Data Marts. A data mart is a subset of a DW, typically consisting of

a single subject area (e.g. marketing, operations).• Operational Data Stores (ODS). An ODS provides a fairly recent

form of customer information file (CIF) and is used for short-term decision involving mission-critical applications. The contents of an ODS are updated through the course of business operations.

• Enterprise Data Warehouses (EDW). An EDW is a large scale DW that is used across the enterprise for decision support. EDW are used to provide data for many types of DSS, including customer relation management (CRM), supply chain management (SCM), business performance management (BPM), business activity monitoring (BAM), product lifecycle management (PLM), revenue management, knowledge management systems (KMS), etc.

Page 7: Decision Support Systems 1201311 Data Warehousing

DW Process Overview• Data sources. Data come from various sources including legacy systems,

external data providers, online transaction processing (OLTP), enterprise resource planning (ERP) system, Web logs, etc.

• Data extractions. Data are extracted using custom-written or commercial software (ETL).

• Data loading. Data are loaded into staging area, where they are transformed and cleansed.

• Comprehensive database. EDWs support all decision analysis by providing relevant summarised and detailed information originating from many different sources.

• Metadata. Medata are maintained so that they can be accessed by IT personnel and users. Metadata include software programs about data and rules for organizing data summaries that are easy to index and search.

• Middleware tools. Middleware tools enable access to the data warehouse. Middleware tools include SQL, Business Object, applications (data mining, OLAP, reporting tools, data visualize tools), etc.

Page 8: Decision Support Systems 1201311 Data Warehousing

DW Process Overview

Operational systems/data

Legacy

OLTP

External

Select

Extract

Transform

Integrate

Maintain

Preparation

Enterprise data warehouse

Target database [HDB, MDDB]

Metadatareports

Replication

Data mart

Data mart

Data mart

AP

I Midd

lew

are

AccessCustom-suit applications

Production reporting

tools

Relational query tools

OLAP/hOLAP

Web browsers

Data mining

Applications

Information visualization

Page 9: Decision Support Systems 1201311 Data Warehousing

DW Architectures

Three-tier DW

Application Server Database Server

Client Workstation

Page 10: Decision Support Systems 1201311 Data Warehousing

DW Architectures

Two-tier DW

Application & Database Server

Client Workstation

Database

Page 11: Decision Support Systems 1201311 Data Warehousing

DW Architectures

Web browser

Web pages

Web server

Application Server

Data warehouse

ClientInternet, intranet, and/or extranet

Web-based DW

Page 12: Decision Support Systems 1201311 Data Warehousing

DW Architectures

ETL

Central metadata

Central data warehouse

RDBMS

Data analysis

Data martData

analysis

Source Systems

Enterprise Data Warehousing Architecture

Page 13: Decision Support Systems 1201311 Data Warehousing

DW Architecture

ETL

RDBMS

Sales Data mart

Source Systems

Data Mart Architecture

RDBMS MDB

Financial Data mart

Human Resource Data mart

Local Metadata Local Metadata Local Metadata

Page 14: Decision Support Systems 1201311 Data Warehousing

DW Architectures

ETL

RDBMS

Data analysis

Source Systems

Hub and Spoke Data Mart Architecture

RDBMS MDB

Data analysis

Data analysis

Local Metadata Local Metadata Local Metadata

Central metadata

Page 15: Decision Support Systems 1201311 Data Warehousing

DW Architectures

ETL

Central data warehouse

Data mart

Source Systems

Enterprise Warehouse and Operational Data Store

RDBMSOperational data store

Data analysis OLTP tools

Central metadata

Data analysis

Page 16: Decision Support Systems 1201311 Data Warehousing

DW Architectures

ETL

Central data warehouse

Data mart

Source Systems

Enterprise Warehouse and Operational Data Store

RDBMSOperational data store

Data analysis OLTP tools

Central metadata

Data analysis

Page 17: Decision Support Systems 1201311 Data Warehousing

DW Architectures

ETL

Source Systems

Distributed Data Warehouse Architecture

Data analysis

Data analysis

ETL ETL

Data mart Central DW

Page 18: Decision Support Systems 1201311 Data Warehousing

10 Factors for choosing DW architecture

• Information interdependence between organizational units

• Upper management’s information needs• Urgency of need for a data warehouse• Nature of end-user tasks• Constraints on resources• Strategic view of the data warehouse prior to

implementation• Compatibility with existing systems• Perceived ability of the in-house IT staff• Technical issues• Social/Political factors

Page 19: Decision Support Systems 1201311 Data Warehousing

Data Integration

• Data integration comprises three major processes that, when correctly implemented, permit data to be accessed and made accessible to an array of ETL and analysis tools and data warehousing environment: – Data access (i.e., the ability to access and extract

data from any data source),– Data federation (i.e., the integration of business views

across multiple data stores), and– Change capture (i.e., based on the indentification,

cpature, and delivery of the changes made to enterprise data sources).

Page 20: Decision Support Systems 1201311 Data Warehousing

Data Integration Techniques• Enterprise application integration (EAI) provides a vehicle for

pushing data from source systems into the data warehouse.• EAI involves integrating application functionality and is focused on

sharing functionality (rather than data) across systems, thereby enabling flexibility and reuse.

• Traditional EAP focuses on enabling application reuse at the programming level, whereas modern EAP uses services-oriented architecture (SOA).

• Enterprise information integration (EII) proposes real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.

• EII tools use predefined metadata to populate views that make integrated data appear relational to end users. XML seems to be the most appropriate tool to define metadata.

Page 21: Decision Support Systems 1201311 Data Warehousing

Extraction, Transformation, and Load (ETL)

• ETL is the heart of DW.• ETL is composed of

– Extraction: reading data from one or more databases,– Transformation: converting the extracted data from its previous form into the form

in which it needs to be so that it can be placed into a data warehouse or simply another database, and

– Load: putting the data into datawarehouse.• Transformation occurs by using rules or lookup tables or by combining the

data with other data.• The three database functions are integrated into one tool to pull data out of

one or more databases and place them into another, consolidated database or a data warehouse.

• ETL tools transport data between sources and targets, document how data elements (e.g. metadata) change as the move between source and target, exchange metadata with other application as needed, and administer all runtime processes and operations (e.g., scheduling, error management, audit logs, statistics).

Page 22: Decision Support Systems 1201311 Data Warehousing

ETL Process

Transient data source

Extract

ETL Process

Transform

Data warehouse

Data mart

Cleanse

Packaged application

Legacy system

Other internal applications

Load

Page 23: Decision Support Systems 1201311 Data Warehousing

DW Development Approaches

• The inmon Model: The EDW Approach

• The Kimbell Model: The Data Mart Approach

Page 24: Decision Support Systems 1201311 Data Warehousing

DW Structure: Star Schema

Star Shema

Driver

Dimension: How data will be accessed (e.g. by location,

time period, type of automobile or driver

Automotive

Claim Information

Location Time

Facts: Central table that contains summarized

(usually) information; also contains foreign keys to

access each dimension table