Data Warehousing Basics



Healthia Consulting 2

Presentation Outline

1) Data Warehousing Overview
   • The purpose of Data Warehousing
   • The history of Data Warehousing

2) The Techniques, Technology and Tools

3) Terms and Definitions


Section 1

Data Warehousing Overview


The Purpose of Data Warehousing

Realize the value of data!!
• Data / information is an asset
• Data / information can be sold
• Methods to realize the value (Reporting, Analysis, Data Mining, etc.)

Make better decisions!!
• Turn data into information
• Create competitive advantage
• Methods to support the decision making process (EIS, DSS, etc.)


The Bill Inmon Definition

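In 1993, the "father of data warehousing", Bill Inmon, gave this definition of a data warehouse:

"A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions."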


Huh - What does this mean?

Subject oriented: Data is arranged by subject area rather than by application, which is more intuitive for users to navigate.

Integrated: Data is collected and consistently stored from multiple, diverse sources.

Nonvolatile: The data is static, one version of the truth regardless of when the question is asked.

Time variant: Allows for access to and analysis of data over time, rather than typical systems which generally provide just detailed current information.
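The nonvolatile and time variant properties can be illustrated with a small sketch. It assumes a hypothetical history table (`member_status_history`) in which each load appends a dated row and never changes earlier rows; the names and data are illustrative only and are not from the presentation.

```python
from datetime import date

# Hypothetical warehouse table: each load appends a dated snapshot row
# and never updates or deletes earlier rows (nonvolatile).
member_status_history = []

def load_snapshot(as_of, member_id, status):
    """Append one dated row; existing rows are left untouched."""
    member_status_history.append(
        {"as_of": as_of, "member_id": member_id, "status": status}
    )

def status_as_of(member_id, when):
    """Return the latest status recorded on or before `when` (time variant)."""
    rows = [
        r for r in member_status_history
        if r["member_id"] == member_id and r["as_of"] <= when
    ]
    return max(rows, key=lambda r: r["as_of"])["status"] if rows else None

load_snapshot(date(2023, 1, 1), 42, "active")
load_snapshot(date(2023, 6, 1), 42, "suspended")

print(status_as_of(42, date(2023, 3, 15)))   # status as it stood in March
print(status_as_of(42, date(2023, 12, 31)))  # status as it stood in December
```

An operational database would typically overwrite the status in place, so the March question could no longer be answered once the June change arrived.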


Data Warehouse vs. Operational Database

Data Warehouse        Operational Database
--------------        --------------------
subject oriented      application oriented
integrated            multiple diverse sources
nonvolatile           updateable
time-variant          real-time, current


The History

• We’ve always managed historical data . . .

• Technology advancements in the 90’s:

• Hardware - processing and storage advancements

• Tools - client server, web, and other desktop tools

• Evolution of the techniques:
• Inmon, Kimball, etc.
• The buzz words and hype
• The evolution of tools
• Success stories


Section 2

The Techniques, Technology and Tools


The Techniques

Multi-Dimensional Modeling
• Dimension and Fact tables
• Star Schema / Snowflake Schema

Specialized tools and knowledge
• DBMS
• OLAP (ROLAP, MOLAP, DOLAP)
• Data Movement

Data Quality

Meta data management


The Technology & Tools

OLAP - Online Analytical Processing

ROLAP/MOLAP/DOLAP
• ROLAP - Relational OLAP
• MOLAP - Multi Dimensional OLAP
• DOLAP - Desktop OLAP

Hardware
• MVS, Unix, NT . . .

DBMS
• DB2, IDMS, Oracle, Informix, Sybase, Red Brick, SQL Server . . .

Extract, Transform, Load (ETL)
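As a rough illustration of the three ETL steps, the sketch below reads a small CSV extract, cleans it, and loads it into an in-memory SQLite table. The source data, the `orders` table, and the cleaning rules are assumptions made for the example, not part of the presentation.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an operational system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,alice,
"""

def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize customer names and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # a simple data-quality rule, chosen only for illustration
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().lower(), float(amount))
        )
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors how dedicated ETL tools separate source connectors, cleansing rules, and target loaders.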


Section 3

Terms and Definitions


Terms and Definitions

Bitmapped Indexing - A family of advanced indexing algorithms that optimize RDBMS query performance by maximizing the search capability of the index per unit of memory and per CPU instruction. Properly implemented, bitmapped indices can eliminate many table scans in query and join processing.
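The idea behind a bitmapped index can be sketched in a few lines. This is a toy illustration that uses Python integers as bitmaps, not how any particular RDBMS implements the feature; the rows and column names are hypothetical.

```python
from collections import defaultdict

def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set
    when row i holds that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)

def matching_rows(bitmap):
    """Decode a bitmap back into a list of row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

rows = [
    {"region": "north", "plan": "gold"},
    {"region": "south", "plan": "gold"},
    {"region": "north", "plan": "basic"},
    {"region": "north", "plan": "gold"},
]
region_idx = build_bitmap_index(rows, "region")
plan_idx = build_bitmap_index(rows, "plan")

# WHERE region = 'north' AND plan = 'gold' becomes a single bitwise AND,
# evaluated without reading the rows themselves.
hits = region_idx["north"] & plan_idx["gold"]
print(matching_rows(hits))
```

Because each predicate is answered from the index bitmaps, the table is only touched to fetch the rows that actually match.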

Business Model - An object-oriented model that captures the kinds of things in a business or a business area and the relationships associated with those things (and sometimes associated business rules, too). Note that a business model exists independently of any data or database. A data warehouse should be designed to match the underlying business models or else no tools will fully unlock the data in the warehouse.


Terms and Definitions - continued

Corporate Data - All the databases of the company. This includes legacy systems, old and new transaction systems, general business systems, client/server databases, data warehouses and data marts.

Data Dictionary - A collection of Meta Data. Many kinds of products in the data warehouse arena use a data dictionary, including database management systems, modeling tools, middleware, and query tools.


Terms and Definitions - continued

Data Mart - A subset of a data warehouse that focuses on one or more specific subject areas. The data usually is extracted from the data warehouse and further de-normalized and indexed to support intense usage by targeted customers.

Data Mining - Techniques for finding patterns and trends in large data sets. See also Data Visualization.

Data Model - The road map to the data in a database. This includes the source of tables and columns, the meanings of the keys, and the relationships between the tables.


Terms and Definitions - continued

Data Visualization - Techniques for turning data into information by using the high capacity of the human brain to visually recognize patterns and trends. There are many specialized techniques designed to make particular kinds of visualization easy.

Data Warehouse - A database built to support information access. Typically a data warehouse is fed from one or more transaction databases. The data needs to be cleaned and restructured to support queries, summaries, and analyses.

Decision Support - Data access targeted to provide the information needed by business decision makers. Examples include pricing, purchasing, human resources, management, manufacturing, etc.


Terms and Definitions - continued

Decision Support System (DSS) - Database(s), warehouse(s), and/or mart(s) in conjunction with reporting and analysis software optimized to support timely business decision making.

Meta Data - Literally, "data about data." More usefully, descriptions of what kind of information is stored where, how it is encoded, how it is related to other information, where it comes from, and how it is related to your business. A hot topic right now is standardizing meta data across products from different vendors.

Methodology - The steps followed to guarantee repeatability of success. A good methodology is built on top of real world experience. For example, see The Hughes-Vollum Methodology.


Terms and Definitions - continued

Middleware - Hardware and software used to connect clients and servers, to move and structure data, and/or to pre-summarize data for use by queries and reports.

Multidimensional database (MDD) - A DBMS optimized to support multidimensional data. The best systems support standard RDBMS functionality and add high-bandwidth support for multidimensional data and queries. Users that need a lot of slices and dices might appreciate a multidimensional database.
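"Slices and dices" can be made concrete with a toy cube. The sketch below assumes a hypothetical three-dimensional cube (year, region, product) stored as a plain dictionary; a real multidimensional database would use far more efficient storage.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (year, region, product) holding a sales measure.
DIMS = ("year", "region", "product")
cube = {
    (2022, "north", "widgets"): 100,
    (2022, "south", "widgets"): 80,
    (2023, "north", "widgets"): 120,
    (2023, "north", "gadgets"): 50,
    (2023, "south", "gadgets"): 70,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    pos = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[pos] == value}

def dice_cube(cube, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed set for each named dimension."""
    def keep(key):
        return all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}

def roll_up(cube, dim):
    """Aggregate the measure along every dimension except `dim`."""
    pos = DIMS.index(dim)
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[pos]] += value
    return dict(totals)

sales_2023 = slice_cube(cube, "year", 2023)
north_widgets = dice_cube(cube, region={"north"}, product={"widgets"})
print(roll_up(sales_2023, "region"))
print(sum(north_widgets.values()))
```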


Terms and Definitions - continued

Object Oriented Analysis (OOA) - A process of abstracting a problem by identifying the kinds of entities in the problem domain, the is-a relationships between the kinds (kinds are known as classes, is-a relationships as subtype/supertype, subclass/superclass, or less commonly, specialization/generalization), and the has-a relationships between the classes. Also identified for each class are its attributes (e.g. class Person has attribute Hair Color) and its conventional relationships to other classes (e.g. class Order has a relationship Customer to class Customer).

Object Oriented Design (OOD) - A design methodology that uses Object Oriented Analysis to promote object reusability and interface clarity.


Terms and Definitions - continued

OLAP - An acronym for On Line Analytical Processing. A common use of a data warehouse that involves real time access and analysis of multidimensional data such as order information.

Performance - Data, summaries, and analyses need to be delivered in a timely fashion. Performance is often a key issue with data warehouses: the right answer isn't worth much if it shows up after the decisions have been made.

Query - A specific atomic request for information from a database.


Terms and Definitions - continued

Relational On-Line Analytic Processing (ROLAP) - OLAP based on conventional relational databases rather than specialized multidimensional databases.

Replication - A standard technique in data warehousing. For performance and reliability, several independent copies of each data warehouse are often created. Even data marts can require replication on multiple servers to meet performance and reliability standards.

Replicator - Any of a class of products that support replication. Often these tools use special load and unload database APIs and have scripting languages that support automation.


Terms and Definitions - continued

Report - A repeatable, formatted, nonatomic request for information from a database. Usually a report formats and combines several related queries.
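The difference between a query and a report can be sketched as follows. The example assumes a hypothetical `orders` table in an in-memory SQLite database: each function issues one atomic query, and the report combines and formats them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 100.0), ("north", 50.0), ("south", 70.0)],
)

def total_sales(conn):
    """One atomic query returning a single value."""
    return conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

def sales_by_region(conn):
    """A second, related atomic query."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()

def sales_report(conn):
    """A report: repeatable, formatted, and built from several queries."""
    lines = ["Sales report", "------------"]
    for region, amount in sales_by_region(conn):
        lines.append(f"{region:<8}{amount:>10.2f}")
    lines.append(f"{'total':<8}{total_sales(conn):>10.2f}")
    return "\n".join(lines)

report = sales_report(conn)
print(report)
```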

Security - The right data for the right person. Note that a business analyst may need access to summaries of data s/he should not see. Security systems need to make this easy to implement while making sure outsiders or rogue employees do not see data they should not see.

Snowflake Schema - A layering of Star Schema that scales that technique to handle an entire warehouse, typically by normalizing dimension tables into further related tables.


Terms and Definitions - continued

Star Schema - A standard technique for designing the summary tables of a data warehouse. "Fact" tables each join to a larger number of independent "dimension" tables. The tables may be partially de-normalized for performance, but most queries will still need to join in one or more of the star tables.
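A minimal star schema can be sketched with SQLite. The table and column names below (`fact_sales`, `dim_date`, `dim_product`) are hypothetical and chosen only to show one fact table joining to its dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);

INSERT INTO dim_date    VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO dim_product VALUES (1, 'widget', 'hardware'), (2, 'manual', 'books');
INSERT INTO fact_sales  VALUES (1, 1, 100.0), (1, 2, 20.0), (2, 1, 150.0);
""")

# A typical star query: join the fact table to its dimensions,
# then group by dimension attributes.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
print(rows)
```

In a snowflake variant, `category` would typically move out of `dim_product` into its own table, at the cost of one more join per query.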


Healthia Consulting
701 Xenia Ave. South, Suite 170
Minneapolis, MN 55416
(763) 923-7900
(763) 923-7901 fax
www.healthiaconsulting.com