The Data Warehouse and Technology - Building the Data Warehouse


  • 7/29/2019 The Data Warehouse and Technology - Building the Data Warehouse


    Building the Data Warehouse by Inmon

    Chapter 5: The Data Warehouse and Technology

    http://it-slideshares.blogspot.com/


    5.0 Overview

    The data warehouse requires a simpler set of technological features than its operational predecessors:
      Online updating: not needed.
      Locking and integrity: needs are minimal.
      Teleprocessing interface: only a very basic one is required.

    This chapter outlines some of the technological requirements for the data warehouse.


    MANAGING LARGE AMOUNTS OF DATA

    1. Manage volumes
    2. Manage multiple media technologies
    3. Index and monitor data
    4. Interface to retrieve and pass data


    Managing Multiple Media

    Following is a hierarchy of storage of data in terms of speed of access and cost of storage:

      Medium            Speed of access   Cost of storage
      Main memory       Very fast         Very expensive
      Expanded memory   Very fast         Expensive
      Cache             Very fast         Expensive
      DASD              Fast              Moderate
      Magnetic tape     Not fast          Not expensive
      Near line         Not fast*         Not expensive
      Optical disk      Not slow          Not expensive
      Fiche             Slow              Cheap

    *Not fast to find first record sought; very fast to find all other records in the block.


    Indexing and Monitoring Data

    Monitoring data warehouse data determines such factors as the following:
      If a reorganization needs to be done
      If an index is poorly structured
      If too much or not enough data is in overflow
      The statistical composition of the access of the data
      Available remaining space
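
As a rough illustration of the monitoring factors listed above, the following Python sketch turns a few table statistics into findings. The `TableStats` fields and every threshold value are hypothetical assumptions made for this example; they are not figures from the chapter.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical statistics a warehouse monitor might collect for one table."""
    rows_in_primary: int      # rows stored in their primary location
    rows_in_overflow: int     # rows that spilled into overflow
    index_levels: int         # depth of the main index
    space_total_mb: float
    space_used_mb: float


def monitoring_findings(stats, max_overflow_ratio=0.10,
                        max_index_levels=4, min_free_ratio=0.15):
    """Return a list of findings; all thresholds are illustrative assumptions."""
    findings = []
    total_rows = stats.rows_in_primary + stats.rows_in_overflow
    if total_rows and stats.rows_in_overflow / total_rows > max_overflow_ratio:
        findings.append("too much data in overflow: consider reorganization")
    if stats.index_levels > max_index_levels:
        findings.append("index may be poorly structured")
    free_ratio = 1 - stats.space_used_mb / stats.space_total_mb
    if free_ratio < min_free_ratio:
        findings.append("little remaining space available")
    return findings


stats = TableStats(rows_in_primary=900_000, rows_in_overflow=200_000,
                   index_levels=3, space_total_mb=1000.0, space_used_mb=900.0)
for finding in monitoring_findings(stats):
    print(finding)
```

In practice the statistics would come from the DBMS catalog or a monitoring utility, and the thresholds would be tuned per table.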


    Interfaces to Many Technologies

    The interface to different technologies requires several considerations:
      Does the data pass from one DBMS to another easily?
      Does it pass from one operating system to another easily?
      Does it change its basic format in passage (EBCDIC, ASCII, and so forth)?
      Can passage into multidimensional processing be done easily?
      Can selected increments of data, such as changed data capture (CDC), be passed rather than entire tables?
      Is the context of data lost in translation as data is moved to other environments?
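
To make the format-change question concrete, here is a minimal Python sketch of converting a character field between EBCDIC and ASCII. It assumes the source system uses code page `cp037`, which is only one of several EBCDIC code pages; the function names and the sample field are hypothetical.

```python
# Convert a text field between EBCDIC and ASCII with Python's built-in codecs.
# "cp037" is one common EBCDIC code page; the right one depends on the source system.
EBCDIC_CODEC = "cp037"


def ebcdic_to_ascii(raw: bytes) -> bytes:
    """Decode EBCDIC bytes and re-encode them as ASCII."""
    return raw.decode(EBCDIC_CODEC).encode("ascii")


def ascii_to_ebcdic(raw: bytes) -> bytes:
    """Decode ASCII bytes and re-encode them as EBCDIC."""
    return raw.decode("ascii").encode(EBCDIC_CODEC)


ebcdic_field = ascii_to_ebcdic(b"CUSTOMER 0042")
print(ebcdic_field.hex())
print(ebcdic_to_ascii(ebcdic_field))
```

This kind of translation applies to character fields only. Packed-decimal and binary fields would be corrupted by a character-set conversion and need their own handling.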


    PROGRAMMER OR DESIGNER CONTROL OF DATA PLACEMENT

      Place data at block/page level
      Manage data in parallel
      Solid metadata control
      Rich language interface


    Parallel Storage and Management of Data

    Metadata Management

      Data warehouse table structures
      Data warehouse table attribution
      Data warehouse source data (the system of record)
      Mapping from the system of record to the data warehouse
      Data model specification
      Extract logging
      Common routines for access of data
      Definitions and/or descriptions of data
      Relationships of one unit of data to another
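
One way to picture several of the metadata items above (table structure, mapping from the system of record, extract logging) is as a small record per warehouse table. The sketch below is an assumption-laden illustration: the class names, field names, and the sample `billing` source system are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMapping:
    """Maps one system-of-record field to one warehouse column (names are hypothetical)."""
    source_system: str
    source_field: str
    target_column: str
    description: str = ""


@dataclass
class TableMetadata:
    """Structure, source mapping, and extract log for one warehouse table."""
    table_name: str
    key_columns: list
    mappings: list = field(default_factory=list)
    extract_log: list = field(default_factory=list)

    def source_of(self, target_column):
        """Return (system, field) pairs that feed a warehouse column."""
        return [(m.source_system, m.source_field)
                for m in self.mappings if m.target_column == target_column]


meta = TableMetadata(
    table_name="customer_balance",
    key_columns=["customer_id", "snapshot_date"],
    mappings=[
        ColumnMapping("billing", "CUST_NO", "customer_id", "customer identifier"),
        ColumnMapping("billing", "BAL_AMT", "balance", "end-of-day balance"),
    ],
)
meta.extract_log.append({"run_date": "2019-07-29", "rows_loaded": 1200})
print(meta.source_of("balance"))
```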


    Language Interface

    Typically, the language interface to the data warehouse should do the following:
      Be able to access data a set at a time
      Be able to access data a record at a time
      Specifically ensure that one or more indexes will be used in the satisfaction of a query
      Have an SQL interface
      Be able to insert, delete, or update data
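
The difference between set-at-a-time and record-at-a-time access can be sketched with Python's built-in `sqlite3` module. SQLite stands in for the warehouse DBMS purely for illustration, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 75.0)],
)

# Set at a time: one SQL statement operates on every qualifying row.
set_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = ?", ("east",)
).fetchone()[0]

# Record at a time: the program walks a cursor and handles each row itself.
record_total = 0.0
cursor = conn.execute("SELECT amount FROM sales WHERE region = ?", ("east",))
for (amount,) in cursor:
    record_total += amount

# Insert, update, and delete go through the same interface.
conn.execute("UPDATE sales SET amount = amount + 5 WHERE sale_id = 1")
conn.execute("DELETE FROM sales WHERE sale_id = 2")
remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(set_total, record_total, remaining)
```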


    EFFICIENT LOADING OF DATA

      Load efficiently
      Use indexes efficiently
      Store data in a compact way
      Support compound keys


    Efficient Index Utilization

    Technology can support efficient index access in several ways:
      Using bit maps
      Having multileveled indexes
      Storing all or parts of an index in main memory
      Compacting the index entries when the order of the data being indexed allows such compaction
      Creating selective indexes and range indexes
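
As an illustration of the first item, a bit map index keeps one bit string per distinct column value, so a multi-condition lookup becomes a bitwise operation. The sketch below uses plain Python integers as bit strings; the column data and function names are hypothetical, and a real DBMS would use compressed, disk-resident structures.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    bitmaps = defaultdict(int)
    for row_number, value in enumerate(values):
        bitmaps[value] |= 1 << row_number
    return dict(bitmaps)


def rows_from_bitmap(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    rows = []
    row_number = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_number)
        bitmap >>= 1
        row_number += 1
    return rows


# Hypothetical low-cardinality columns for six rows.
region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_index = build_bitmap_index(region)
status_index = build_bitmap_index(status)

# region = 'east' AND status = 'open' becomes a single bitwise AND.
matches = rows_from_bitmap(region_index["east"] & status_index["open"])
print(matches)
```

Bit maps pay off mainly on columns with few distinct values, where each bit string stays dense enough to be worth scanning.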


    Compaction of Data

    Compaction helps manage large amounts of data.

    The programmer gets the most out of a given I/O when data is stored compactly.


    Compound Keys

    Compound keys are needed in the data warehouse for two reasons:
      The time variancy of data warehouse data
      Key-foreign key relationships are quite common in the atomic data
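
A compound key that reflects both points (time variancy and a key-foreign key relationship) might look like the following sketch, again using SQLite through Python's `sqlite3` module as a stand-in. The `customer` and `customer_balance` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
# The key combines the customer (a foreign key) with the snapshot date,
# so the same customer can appear once per moment in time.
conn.execute(
    """
    CREATE TABLE customer_balance (
        customer_id   INTEGER NOT NULL REFERENCES customer (customer_id),
        snapshot_date TEXT    NOT NULL,
        balance       REAL    NOT NULL,
        PRIMARY KEY (customer_id, snapshot_date)
    )
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO customer_balance VALUES (?, ?, ?)",
    [(1, "2019-06-30", 500.0), (1, "2019-07-31", 650.0)],
)

# A second row for the same customer and date violates the compound key.
try:
    conn.execute("INSERT INTO customer_balance VALUES (1, '2019-07-31', 700.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

history = conn.execute(
    "SELECT snapshot_date, balance FROM customer_balance "
    "WHERE customer_id = 1 ORDER BY snapshot_date"
).fetchall()
print(duplicate_rejected, history)
```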


    VARIABLE-LENGTH DATA

      Manage variable-length data efficiently
      Lock manager with explicit control at the programmer level
      Able to do index-only processing
      Restore data in bulk efficiently


    Lock Management

    The lock manager ensures that two or more people are not updating the same record at the same time.

    The ability to turn the lock manager off and on is necessary.


    Index-Only Processing

    Looking in an index (or indexes) without going to the primary source of data
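
Index-only processing can be observed in SQLite, used here only as a convenient stand-in, by asking for the query plan. In this sketch the hypothetical `orders` table has an index holding every column the query needs, so the plan reports a covering index rather than a table read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, notes TEXT)"
)
conn.execute("CREATE INDEX idx_region_amount ON orders (region, amount)")
conn.executemany(
    "INSERT INTO orders (region, amount, notes) VALUES (?, ?, ?)",
    [("east", 10.0, "a"), ("west", 20.0, "b"), ("east", 30.0, "c")],
)

query = "SELECT amount FROM orders WHERE region = 'east'"

# Every column the query needs is in the index, so SQLite can answer it
# from the index without reading the table rows.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
amounts = [row[0] for row in conn.execute(query)]
print(plan)
print(amounts)
```

Selecting `notes` as well would force a visit to the table, because that column is not part of the index.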


    Fast Restore

    The capability to quickly restore a data warehouse table from non-DASD storage


    Other Technological Features

    Some of those features include the following:
      Transaction integrity
      High-speed buffering
      Row- or page-level locking
      Referential integrity
      VIEWs of data
      Partial block loading


    DBMS Types and the Data Warehouse

    Data warehouses manage massive amounts of data because they contain:
      Granular, atomic detail
      Historical information
      Summary as well as detailed data

    Because record-level, transaction-based updates are a regular feature of the general-purpose DBMS, it must offer facilities for the following:
      Locking
      COMMITs
      Checkpoints
      Log tape processing
      Deadlock
      Backout


    Changing DBMS Technology

    Such a change may be in order for several reasons:
      New DBMS technologies may be available.
      The size of the warehouse has grown.
      Use of the warehouse has escalated and changed.
      The basic DBMS decision must be revisited from time to time.

    Should the decision be made to go to a new DBMS technology, what are the considerations?
      Will the new DBMS technology meet the foreseeable requirements?
      How will the conversion from the older DBMS technology to the newer DBMS technology be done?


    Multidimensional DBMS and the Data Warehouse


    The multidimensional DBMS:
      1. holds at least an order of magnitude less data.
      2. is geared for very heavy and unpredictable access and analysis of data.
      3. holds a much shorter time horizon of data.
      4. allows unfettered access.
      5. enjoys a complementary relationship with the data warehouse.

    The data warehouse:
      1. holds massive amounts of data
      2. is geared for a limited amount of flexible access
      3. contains data with a very lengthy time horizon (from 5 to 10 years)
      4. allows analysts to access its data in a constrained fashion
      5. being housed in a multidimensional DBMS


    Following is the relational foundation for multidimensional DBMS data marts:

    Strengths:
      Can support a lot of data.
      Can support dynamic joining of data.
      Has proven technology.
      Is capable of supporting general-purpose update processing.
      If there is no known pattern of usage of data, then the relational structure is as good as any other.

    Weaknesses:
      Has performance that is less than optimal.
      Cannot be purely optimized for access


    Following is the cube foundation for multidimensional DBMS data marts:

    Strengths:
      Performance that is optimal for DSS processing.
      Can be optimized for very fast access of data.
      If pattern of access of data is known, then the structure of data can be optimized.
      Can easily be sliced and diced.
      Can be examined in many ways.

    Weaknesses:
      Cannot handle nearly as much data as a standard relational format.
      Does not support general-purpose update processing.
      May take a long time to load.
      If access is desired on a path not supported by the
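
To show what slicing and dicing a cube means in the simplest terms, the sketch below aggregates a handful of fact rows into cells keyed by three dimensions and then filters and totals them. It is a toy in-memory model under assumed data; the dimension names, fact rows, and function names are hypothetical, and it does not represent how any particular multidimensional DBMS stores its data.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, revenue).
facts = [
    ("east", "widget", "2019-06", 100.0),
    ("east", "gadget", "2019-06", 40.0),
    ("west", "widget", "2019-06", 70.0),
    ("east", "widget", "2019-07", 120.0),
]

DIMENSIONS = ("region", "product", "month")


def build_cube(rows):
    """Aggregate revenue into one cell per (region, product, month) combination."""
    cube = defaultdict(float)
    for region, product, month, revenue in rows:
        cube[(region, product, month)] += revenue
    return dict(cube)


def slice_cube(cube, **fixed):
    """Keep only cells whose dimensions match the fixed values (a slice or dice)."""
    positions = {name: DIMENSIONS.index(name) for name in fixed}
    return {
        cell: value
        for cell, value in cube.items()
        if all(cell[positions[name]] == wanted for name, wanted in fixed.items())
    }


def roll_up(cube, dimension):
    """Total the measure along one dimension."""
    position = DIMENSIONS.index(dimension)
    totals = defaultdict(float)
    for cell, value in cube.items():
        totals[cell[position]] += value
    return dict(totals)


cube = build_cube(facts)
june = slice_cube(cube, month="2019-06")
print(roll_up(june, "region"))
print(roll_up(cube, "product"))
```

The cube answers questions quickly along the dimensions it was built with, which mirrors the trade-off above: access on a path the structure was not designed for is not supported as well.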


    Data Warehousing across Multiple Storage Media

    A large amount of data is spread across more than one storage medium.
      One processing environment is the DASD environment where online, interactive processing is done.
      The other processing environment is often a tape or mass store environment


    The Role of Metadata in the Data Warehouse Environment


    Context and Content

    The context of the reports is explained for the contents


    Three Types of Contextual Information

    Three levels of contextual information must be managed:
      Simple contextual information
      Complex contextual information
      External contextual information

    Simple contextual information relates to the basic structure of data itself, and includes such things as these:
      The structure of data
      The encoding of data
      The naming conventions used for data
      The metrics describing the data, such as:
        How much data there is
        How fast the data is growing
        What sectors of the data are growing


    Complex contextual information addresses such aspects of data as these:
      Product definitions
      Marketing territories
      Pricing
      Packaging
      Organization structure
      Distribution


    Some examples of external contextual information include the following:
      Economic forecasts:
        Inflation
        Financial trends
        Taxation
        Economic growth
      Political information
      Competitive information
      Technological advancements
      Consumer demographic movements


    Capturing and Managing Contextual Information

    Complex and external contextual types of information are hard to capture and quantify because they are so unstructured.


    Looking at the Past

    Some of these shortcomings are as follows:
      The information management attempts were aimed at the information systems developer, not the end user.
      Attempts at contextual management were passive.
      Attempts at contextual information management were in many cases


    Refreshing the Data Warehouse

    Reading a log tape is no small matter, however. Many obstacles are in the way, including the following:
      The log tape contains much extraneous data.
      The log tape format is often arcane.
      The log tape contains spanned records.
      The log tape often contains addresses instead of data values.


    Testing

    It is very unusual to find a similar test environment in the world of the data warehouse, for the following reasons:
      Data warehouses are so large that a corporation has a hard time justifying one of them, much less two of them.
      The nature of the development life cycle for the data warehouse is iterative.
      For the most part, programs are run in


    Summary

    Some technological features are required:
      Robust language interface
      Compound keys
      Variable-length data
      The abilities to do the following:
        Manage large amounts of data
        Manage data on diverse media
        Easily index and monitor data
        Interface with a wide number of technologies
        Allow the programmer to place the data directly on the physical device
        Store and access data in parallel
        Have metadata control of the warehouse
        Efficiently load the warehouse
        Efficiently use indexes
        Store data in a compact way
        Support compound keys
        Selectively turn off the lock manager
        Do index-only processing
        Quickly restore from bulk storage


    The data architect must recognize the differences between a transaction-based DBMS and a data warehouse-based DBMS.


    Multidimensional OLAP technology is suited for data mart processing and not data warehouse processing.

    When the data mart approach is used, many problems become evident:
      The number of extract programs grows large.
      Each new multidimensional database must return to the legacy operational environment for its own data.
      There is no basis for reconciliation of differences in analysis.
      A tremendous amount of redundant data among different multidimensional DBMS environments exists.


    Metadata in the data warehouse environment plays a very different role than metadata in the operational legacy environment.