Data Warehousing Design

Program Pelatihan Tenaga Informasi dan Informatika Sistem Informasi Kesehatan (Training Program for Health Information System Information and Informatics Personnel)

Ari Cahyono


Learning Objectives

The issues associated with designing a data warehouse database

A technique for designing a data warehouse database called dimensionality modeling

How a dimensionality model differs from an Entity-Relationship (ER) model.

A step-by-step methodology for designing a data warehouse database.

Criteria for assessing the degree of dimensionality provided by a data warehouse.


Designing a Data Warehouse Database

Designing a data warehouse database is highly complex. It begins with answering questions such as:

▪ Which user requirements are most important, and which data should be considered first?

▪ Should the project be scaled down into something more manageable, yet at the same time provide an infrastructure capable of ultimately delivering a full-scale enterprise-wide data warehouse?

Common solution: data marts.


Dimensionality Modeling

A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.

Dimensionality modeling uses the concepts of ER modeling with some important restrictions. Every dimensional model (DM) is composed of:

▪ Fact table: one table with a composite primary key.

▪ Dimension tables: each has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table.

The resulting structure is called a star schema.


Star Schema

The star schema exploits the characteristics of factual data: facts are generated by events that occurred in the past and are unlikely to change, regardless of how they are analyzed.

Also known as a star join.

“A logical structure that has a fact table containing factual data in the centre, surrounded by dimension tables containing reference data (which can be denormalized)”
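As a rough illustration of the definition above, the following sketch builds a tiny star schema in an in-memory SQLite database and runs a star join. The table names (`fact_sales`, `dim_product`, `dim_store`, `dim_date`), the columns, and all rows are made up for illustration; they are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: each has a simple (non-composite) primary key.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- Fact table: composite primary key made of the dimension keys.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    PRIMARY KEY (product_id, store_id, date_id)
);

-- Hypothetical sample rows.
INSERT INTO dim_product VALUES (1, 'Aspirin', 'Pharmacy'), (2, 'Bandage', 'Supplies');
INSERT INTO dim_store   VALUES (10, 'North'), (20, 'South');
INSERT INTO dim_date    VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales  VALUES (1, 10, 20240101, 5), (2, 10, 20240101, 3), (1, 20, 20240201, 7);
""")

# Star join: the fact table in the centre joined to the surrounding dimensions.
star_join = """
SELECT p.category, s.region, SUM(f.units_sold)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_store   s ON s.store_id   = f.store_id
GROUP BY p.category, s.region
ORDER BY p.category, s.region
"""
rows = conn.execute(star_join).fetchall()
for row in rows:
    print(row)
```

Every query against the schema follows the same shape: start from the fact table and join outward to whichever dimensions supply the context.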


Snowflake schema

A variant of the star schema where dimension tables do not contain denormalized data

Starflake schema

A hybrid structure that contains a mixture of star and snowflake schemas
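The difference between a star and a snowflake dimension can be sketched with plain Python data. The product dimension, its attributes, and the `snowflake` helper below are hypothetical; the point is only that the repeated category attribute is moved into its own table.

```python
# Denormalized (star) product dimension: the category name repeats on every row.
star_product_dim = [
    {"product_id": 1, "product_name": "Aspirin",   "category_id": 100, "category_name": "Pharmacy"},
    {"product_id": 2, "product_name": "Ibuprofen", "category_id": 100, "category_name": "Pharmacy"},
    {"product_id": 3, "product_name": "Bandage",   "category_id": 200, "category_name": "Supplies"},
]

def snowflake(dim_rows):
    """Split category attributes out of the product dimension (hypothetical helper)."""
    products = [
        {"product_id": r["product_id"],
         "product_name": r["product_name"],
         "category_id": r["category_id"]}
        for r in dim_rows
    ]
    # One entry per category instead of one copy per product.
    categories = {r["category_id"]: r["category_name"] for r in dim_rows}
    return products, categories

products, categories = snowflake(star_product_dim)
print(products)
print(categories)
```

A starflake schema would keep some dimensions in the first form and others in the second.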


Nine-Step Methodology by Kimball (1996)

1. Choosing the process

2. Choosing the grain

3. Identifying and conforming the dimensions

4. Choosing the facts

5. Storing pre-calculations in the fact table

6. Rounding out the dimension tables

7. Choosing the duration of the database

8. Tracking slowly changing dimensions

9. Deciding the query priorities and the query modes


Comparison of DM and ER Model

ER modeling is a technique for identifying relationships among entities.

▪ Goal: to remove redundancy in the data.

▪ Inefficient for ad hoc end-user queries.

Traditional ER modeling does not support the main attraction of data warehousing, namely intuitive and high-performance retrieval of data.

A single ER model normally decomposes into multiple DMs.

The multiple DMs are then associated through ‘shared’ dimension tables.


Step 1: Choosing the process

The process (function) refers to the subject of a particular data mart.

Choose the main entities and relationships.


Step 2: Choosing the grain

Deciding exactly what a fact table record represents, e.g. for a ProductSales fact table, each record represents an individual product sale.

Only when the grain for the fact table is chosen can we identify the dimensions of the fact table.
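One way to make the grain concrete is to treat it as the set of columns that must uniquely identify a fact record. The sketch below assumes hypothetical ProductSales rows and a hypothetical helper `grain_violations`; it reports key combinations that appear on more than one row for a candidate grain.

```python
from collections import Counter

# Hypothetical ProductSales rows: one row per product per sales transaction.
product_sales = [
    {"transaction_id": "T1", "date": "2024-01-01", "product": "A", "units": 2},
    {"transaction_id": "T1", "date": "2024-01-01", "product": "B", "units": 1},
    {"transaction_id": "T2", "date": "2024-01-01", "product": "A", "units": 4},
    {"transaction_id": "T3", "date": "2024-01-02", "product": "A", "units": 1},
]

def grain_violations(rows, grain):
    """Return the grain-key combinations that occur on more than one row."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# The declared grain identifies each record uniquely.
fine_grain = grain_violations(product_sales, ["transaction_id", "product"])
# A coarser candidate grain does not: two rows share the same date and product.
coarse_grain = grain_violations(product_sales, ["date", "product"])
print(fine_grain, coarse_grain)
```

If the coarser grain were chosen instead, the two colliding rows would have to be summarized into one record, and the transaction could no longer serve as a dimension.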


Step 3: Identifying and conforming the dimensions

Dimensions set the context for asking questions about the facts in the table.

A well-built set of dimensions makes the data mart understandable and easy to use.

Identify dimensions in sufficient detail to describe things. A poorly presented or incomplete set of dimensions will reduce the usefulness of a data mart to an enterprise.

If any dimension occurs in two data marts, the two must be exactly the same dimension, or one must be a mathematical subset of the other.
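The conformance rule can be checked mechanically. This is a minimal sketch, assuming each dimension is represented as a dict from dimension key to a tuple of attributes; the function name `is_conformed` and the sample data are hypothetical.

```python
def is_conformed(dim_a, dim_b):
    """True if the two dimensions are identical or one is a subset of the other.

    Each dimension maps a dimension key to a tuple of attribute values, so a row
    only counts as shared when both the key and its attributes agree.
    """
    rows_a, rows_b = set(dim_a.items()), set(dim_b.items())
    return rows_a <= rows_b or rows_b <= rows_a

# Hypothetical product dimensions from three data marts.
sales_products     = {1: ("Aspirin", "Pharmacy"), 2: ("Bandage", "Supplies")}
inventory_products = {1: ("Aspirin", "Pharmacy")}   # subset of the sales dimension
billing_products   = {1: ("Aspirin", "Drugs")}      # same key, conflicting attribute

print(is_conformed(sales_products, inventory_products))
print(is_conformed(sales_products, billing_products))
```

The third dimension fails the check because the same key describes the product differently, which would make results from the two data marts impossible to combine.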


Step 4: Choosing the facts

The grain of the fact table determines which facts can be used in the data mart.

All the facts must be expressed at the level implied by the grain.

Additional facts can be added to a fact table at any time provided they are consistent with the grain of the table.


Step 5: Storing pre-calculations in the fact table

Add valuable derived information that can be calculated from the other facts.
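A small sketch of a pre-calculation, assuming a fact row that carries hypothetical `revenue` and `cost` facts: the derived `profit` is computed once at load time and stored alongside them, so queries do not have to recompute it.

```python
def add_precalculated_facts(fact_row):
    """Return a copy of the fact row with a stored, derived 'profit' fact.

    'revenue', 'cost' and 'profit' are illustrative names, not from the slides.
    """
    enriched = dict(fact_row)
    enriched["profit"] = fact_row["revenue"] - fact_row["cost"]
    return enriched

loaded_row = add_precalculated_facts({"product_id": 1, "date_id": 20240101, "revenue": 1200, "cost": 900})
print(loaded_row)
```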


Step 6: Rounding out the dimension tables

Add as many text descriptions to the dimensions as possible.

The text descriptions should be as intuitive and understandable to users as possible.

The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension table.


Step 7: Choosing the duration of the database

The duration measures how far back in time the fact table goes.


Step 8: Tracking slowly changing dimensions

Three types of slowly changing dimensions (SCD):

1. Type 1: a changed dimension attribute is overwritten.

2. Type 2: a changed dimension attribute causes a new dimension record to be created.

3. Type 3: a changed dimension attribute causes an alternate attribute to be created, so that both the old and new values of the attribute are simultaneously accessible in the same dimension record.
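The three ways of handling a changed attribute can be sketched on a tiny in-memory customer dimension. Everything here is an assumption for illustration: the column names, the `is_current` flag and the surrogate key used for the second type, and the `previous_` prefix used for the third.

```python
def make_dim():
    """Hypothetical customer dimension with a single record."""
    return [{"surrogate_key": 1, "customer_id": 7, "city": "A", "is_current": True}]

def scd_type1(dim, customer_id, attr, new_value):
    """Overwrite the attribute in place; the old value is lost."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row[attr] = new_value

def scd_type2(dim, customer_id, attr, new_value, new_surrogate_key):
    """Keep the old record and create a new dimension record for the new value."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            row["is_current"] = False
            new_row = dict(row)
            new_row.update({attr: new_value,
                            "surrogate_key": new_surrogate_key,
                            "is_current": True})
            dim.append(new_row)
            return

def scd_type3(dim, customer_id, attr, new_value):
    """Keep old and new values side by side in the same record."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value

dim1, dim2, dim3 = make_dim(), make_dim(), make_dim()
scd_type1(dim1, 7, "city", "B")
scd_type2(dim2, 7, "city", "B", new_surrogate_key=2)
scd_type3(dim3, 7, "city", "B")
print(dim1, dim2, dim3, sep="\n")
```

Only the second approach preserves full history, at the cost of more dimension rows; the third keeps exactly one previous value.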


Step 9: Deciding the query priorities and the query modes

Consider physical design issues:

▪ Physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations.

▪ Administration, backup, indexing performance, and security.


Inside Fact Tables

The fact table is where we keep the measurements.

We may keep the details at the lowest possible level.

▪ In the department store fact table for sales analysis, we may keep the units sold by individual transactions at the cashier’s checkout.

▪ Some fact tables may just contain summary data; these are called aggregate fact tables.
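To illustrate the relationship between the two kinds of fact table, the sketch below derives a monthly aggregate fact table from transaction-level rows. The rows, the column names, and the choice of month by product as the summary level are all hypothetical.

```python
from collections import defaultdict

# Hypothetical detail rows at the lowest level: one per checkout transaction line.
detail_facts = [
    {"date": "2024-01-03", "store": "S1", "product": "A", "units_sold": 2},
    {"date": "2024-01-17", "store": "S1", "product": "A", "units_sold": 3},
    {"date": "2024-01-17", "store": "S2", "product": "B", "units_sold": 1},
    {"date": "2024-02-02", "store": "S1", "product": "A", "units_sold": 4},
]

def build_monthly_aggregate(rows):
    """Summarize detail rows into one row per (month, product)."""
    totals = defaultdict(int)
    for row in rows:
        month = row["date"][:7]          # 'YYYY-MM' prefix of an ISO date string
        totals[(month, row["product"])] += row["units_sold"]
    return [
        {"month": month, "product": product, "units_sold": units}
        for (month, product), units in sorted(totals.items())
    ]

monthly_aggregate = build_monthly_aggregate(detail_facts)
for row in monthly_aggregate:
    print(row)
```

The aggregate answers monthly questions with fewer rows, but the store and the individual transaction can no longer be recovered from it.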


Fact Tables’ Characteristics

▪ Concatenated fact table key

▪ Grain or level of data identified: the data grain is the level of detail for the measurements or metrics

▪ Fully additive measures

▪ Semi-additive measures

▪ Large number of records

▪ Table deep, not wide

▪ Only a few attributes

▪ Sparsity of data

▪ Degenerate dimensions: a degenerate dimension doesn’t have a dimension key
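The distinction between fully additive and semi-additive measures is easiest to see with a stock level, which can be summed across stores on one day but not across days. The snapshot rows and function names below are hypothetical, and averaging over time is just one common way to summarize such a measure.

```python
# Hypothetical daily inventory snapshots; 'on_hand' is a semi-additive measure.
snapshots = [
    {"date": "2024-01-01", "store": "S1", "on_hand": 10},
    {"date": "2024-01-01", "store": "S2", "on_hand": 5},
    {"date": "2024-01-02", "store": "S1", "on_hand": 8},
    {"date": "2024-01-02", "store": "S2", "on_hand": 7},
]

def total_on_hand(rows, date):
    """Summing across stores for a single date is meaningful."""
    return sum(r["on_hand"] for r in rows if r["date"] == date)

def average_on_hand(rows, store):
    """Across time the values are averaged, not summed."""
    values = [r["on_hand"] for r in rows if r["store"] == store]
    return sum(values) / len(values)

stock_on_jan_2 = total_on_hand(snapshots, "2024-01-02")
s1_average = average_on_hand(snapshots, "S1")
print(stock_on_jan_2, s1_average)
```

A fully additive measure such as units sold needs no such care: it can be summed along every dimension, including time.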


Degenerate Dimensions

Look closely at the attributes order_number and order_line. These are not measures, metrics, or facts.

They are attributes that are neither facts nor strictly dimension attributes, e.g. reference numbers such as order numbers, invoice numbers, and order line numbers.

Example usage: finding the average number of products per order.
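The example usage can be sketched as follows. The attributes `order_number` and `order_line` come from the slide; the remaining columns, the sample rows, and the function name are assumptions. The degenerate dimension is used purely to group fact rows that belong to the same order.

```python
from collections import defaultdict

# Hypothetical order-line fact rows; order_number has no dimension table of its own.
order_line_facts = [
    {"order_number": "O-1", "order_line": 1, "product_id": 1, "quantity": 2},
    {"order_number": "O-1", "order_line": 2, "product_id": 2, "quantity": 1},
    {"order_number": "O-2", "order_line": 1, "product_id": 1, "quantity": 5},
    {"order_number": "O-3", "order_line": 1, "product_id": 3, "quantity": 1},
    {"order_number": "O-3", "order_line": 2, "product_id": 1, "quantity": 1},
    {"order_number": "O-3", "order_line": 3, "product_id": 2, "quantity": 4},
]

def average_products_per_order(rows):
    """Average count of distinct products per order, grouped by order_number."""
    products_by_order = defaultdict(set)
    for row in rows:
        products_by_order[row["order_number"]].add(row["product_id"])
    return sum(len(p) for p in products_by_order.values()) / len(products_by_order)

avg_products = average_products_per_order(order_line_facts)
print(avg_products)
```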


Factless Fact Tables

Some fact tables do not need to contain any facts. They are “factless” fact tables.

e.g. analyzing student attendance:
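A minimal sketch of the attendance case, with made-up dates, courses, and students: each row holds only dimension keys and records that an event happened, and the analysis consists of counting rows.

```python
from collections import Counter

# Hypothetical factless fact rows: (date, course, student). No numeric measure is stored.
attendance_facts = [
    ("2024-03-01", "CS101", "alice"),
    ("2024-03-01", "CS101", "bob"),
    ("2024-03-01", "MA201", "alice"),
    ("2024-03-08", "CS101", "alice"),
]

def attendance_per_course(rows):
    """Count attendance events per course; the row itself is the fact."""
    return Counter(course for _date, course, _student in rows)

def attendance_per_student(rows):
    """Count attendance events per student."""
    return Counter(student for _date, _course, student in rows)

by_course = attendance_per_course(attendance_facts)
by_student = attendance_per_student(attendance_facts)
print(by_course, by_student)
```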


Moving a rapidly changing dimension attribute to the fact table as a degenerate dimension column


Date Dimensions


Aggregate Fact Tables


Aggregating Fact Tables

Dimension Hierarchies


Forming Aggregate Fact Tables


Hierarchies of the store, customer, and product dimensions


Example: Inpatient Service