The Data Warehouse and Design


Summary

• The design of the data warehouse begins with the data model

• The primary concern of the data warehouse developer is managing volume

• The data warehouse is fed data from the legacy operational environment. The data goes through a complex process of conversion, reformatting, and integration as it passes into the data warehouse environment

• The data model exists at three levels – high level, mid level, and low level

• The creation of a data warehouse record is triggered by an activity or an event that has occurred in the operational environment

• A profile record is a composite record made up of many different historical activities.

• The star join is a database design technique that is sometimes mistakenly applied to the data warehouse environment

Beginning with Operational Data

• Three types of loads are made into the data warehouse from the operational environment:

– Archival data

– Data currently contained in the operational environment

– Ongoing changes to the data warehouse environment from the changes (updates) that have occurred in the operational environment since the last refresh

Beginning with Operational Data (cont’d)

• Five common techniques are used to limit the amount of operational data scanned

1. Scan data that has been timestamped

2. Scan a ‘delta’ file

3. Scan a log file or an audit file

4. Modify application code

5. Rub a ‘before’ and an ‘after’ image of the operational file together
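
As a rough illustration of techniques 1 and 5 above, the following minimal sketch filters records by a timestamp and compares a ‘before’ image with an ‘after’ image. The record layout (`id`, `balance`, `updated_at`) and both function names are assumptions made for this example, not part of the original material.

```python
from datetime import datetime


def scan_by_timestamp(records, last_refresh):
    """Technique 1: keep only records stamped after the last refresh."""
    return [r for r in records if r["updated_at"] > last_refresh]


def diff_images(before, after):
    """Technique 5: compare 'before' and 'after' images, both keyed by record key."""
    inserted = {k: after[k] for k in after.keys() - before.keys()}
    deleted = {k: before[k] for k in before.keys() - after.keys()}
    updated = {
        k: after[k]
        for k in after.keys() & before.keys()
        if after[k] != before[k]
    }
    return inserted, updated, deleted


# Hypothetical operational records with an update timestamp.
records = [
    {"id": 1, "balance": 100, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "balance": 250, "updated_at": datetime(2024, 1, 5)},
]
changed = scan_by_timestamp(records, last_refresh=datetime(2024, 1, 3))

# Hypothetical file images taken before and after a processing cycle.
before = {1: {"balance": 100}, 2: {"balance": 250}}
after = {1: {"balance": 100}, 2: {"balance": 300}, 3: {"balance": 50}}
inserted, updated, deleted = diff_images(before, after)
```

The timestamp scan only works when the operational application actually stamps its records. The image comparison needs no change to the application, but it has to read both full images, so it is usually the most expensive of the five options.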

Data/Process Model and the Architected Environment

• The process model applies only to the operational environment

• The data model applies to both the operational environment and the data warehouse environment

• A process model typically consists of the following (in whole or in part):

– Functional decomposition

– Context-level zero diagram

– Data Flow Diagram

– Structure Chart

– State Transition Diagram

– HIPO chart

– Pseudocode

The Data Warehouse and Data Models

The Data Warehouse data model

• There are three levels of data modeling:

– High-level modeling (ERD)

– Mid-level modeling (DIS = Data Item Set)

– Low-level modeling (physical model)

Snapshots in the Data Warehouse

• Snapshots are created as a result of some event occurring.

• The snapshot triggered by an event has four basic components:

– A key

– A unit of time

– Primary data that relates only to the key

– Secondary data captured as part of the snapshot process that has no direct relationship to the primary data or key
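
The four components can be pictured as the fields of a single record. The sketch below is illustrative only: the `Snapshot` class, the `take_snapshot` helper, and the order-related field names are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Snapshot:
    key: str                 # identifies the subject of the snapshot
    as_of: date              # unit of time at which the event occurred
    primary: dict            # data that relates only to the key
    secondary: dict = field(default_factory=dict)  # incidental data captured alongside


def take_snapshot(event):
    """Build a snapshot from a hypothetical operational event."""
    return Snapshot(
        key=event["order_id"],
        as_of=event["event_date"],
        primary={"amount": event["amount"], "status": event["status"]},
        # Captured at the same moment, but not an attribute of the order itself.
        secondary={"stock_level": event.get("stock_level")},
    )


event = {
    "order_id": "ORD-1001",
    "event_date": date(2024, 3, 1),
    "amount": 120,
    "status": "SHIPPED",
    "stock_level": 37,
}
snap = take_snapshot(event)
```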

Complexity of Transformation and Integration

• At first glance, when data is moved from the legacy environment to the data warehouse environment, it appears that nothing more is going on than simple extraction of data from one place to the next

Complexity of Transformation and Integration (cont’d)

• Some of the functionality required as data passes from the operational, legacy environment to the data warehouse environment:

– The extraction of data from the operational environment to the data warehouse environment requires a change in technology (DBMS technology)

– The selection of data may be very complex

– Operational input keys need to be restructured and converted

– Nonkey data is reformatted

– Data is cleansed

– Multiple input sources of data exist and must be merged

– Key resolution must be done

– Input files need to be resequenced

– Default values must be supplied

– And many others
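
A few of the steps listed above (key conversion, reformatting of nonkey data, cleansing, default values, merging, and resequencing) can be sketched as follows. This is a simplified illustration under stated assumptions: the two source layouts, the `CUST-` key format, the `DEFAULT_REGION` value, and the "first source wins" rule for key resolution are all invented for the example.

```python
from datetime import datetime

DEFAULT_REGION = "UNKNOWN"  # hypothetical default supplied when a source omits the field


def convert_key(raw_key):
    """Restructure a source-specific key into one assumed warehouse key format."""
    digits = "".join(ch for ch in str(raw_key) if ch.isdigit())
    return f"CUST-{int(digits):08d}"


def transform(record, date_format):
    """Convert the key, cleanse the name, reformat the date, supply defaults."""
    return {
        "key": convert_key(record["cust_id"]),
        "name": record["name"].strip().title(),
        "opened": datetime.strptime(record["opened"], date_format).date().isoformat(),
        "region": record.get("region") or DEFAULT_REGION,
    }


def merge_sources(*sources):
    """Merge several (records, date_format) inputs and resequence by warehouse key."""
    merged = {}
    for records, date_format in sources:
        for record in records:
            row = transform(record, date_format)
            merged.setdefault(row["key"], row)  # key resolution: first source wins
    return [merged[k] for k in sorted(merged)]


# Two hypothetical legacy sources with different key and date conventions.
source_a = [{"cust_id": "C42", "name": "  ada LOVELACE ", "opened": "03/15/2020", "region": "EMEA"}]
source_b = [
    {"cust_id": 42, "name": "Ada Lovelace", "opened": "20200315"},
    {"cust_id": 7, "name": "alan turing", "opened": "20190102"},
]
rows = merge_sources((source_a, "%m/%d/%Y"), (source_b, "%Y%m%d"))
```

Real key resolution is rarely as simple as letting the first source win. The rule is used here only to keep the example short.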

Profile records

• Profile records represent snapshots of data, just like individual activity records

• A profile record is created from the grouping of many detailed records
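
To make the grouping concrete, here is a minimal sketch that rolls detailed activity records up into one profile record per customer per month. The call-record layout (`customer`, `date`, `minutes`) and the chosen aggregates are assumptions for illustration.

```python
from collections import defaultdict


def build_profiles(activities):
    """Group detailed records by customer and month, then summarize each group."""
    grouped = defaultdict(list)
    for activity in activities:
        month = activity["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        grouped[(activity["customer"], month)].append(activity)

    profiles = []
    for (customer, month), rows in sorted(grouped.items()):
        minutes = [r["minutes"] for r in rows]
        profiles.append({
            "customer": customer,
            "month": month,
            "call_count": len(rows),
            "total_minutes": sum(minutes),
            "longest_call": max(minutes),
        })
    return profiles


# Hypothetical detailed call records.
activities = [
    {"customer": "A", "date": "2024-01-03", "minutes": 5},
    {"customer": "A", "date": "2024-01-20", "minutes": 12},
    {"customer": "A", "date": "2024-02-01", "minutes": 3},
    {"customer": "B", "date": "2024-01-09", "minutes": 7},
]
profiles = build_profiles(activities)
```

Four detailed records collapse into three profile records here. The trade-off is that the individual calls can no longer be recovered from the profile alone.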