Data Warehousing - 8 ETL

  • Data Warehousing ETL

  • Outline

    The ETL Process

    General ETL issues

    Building dimensions

    Building fact tables

    Extract

    Transformations/cleansing

    Load

    IBM InfoSphere DataStage

  • ETL

    When should we ETL?

    Periodically (e.g., every night, every week) or after significant events

    Refresh policy set by administrator based on user needs and traffic

    Possibly different policies for different sources

    ETL is used to integrate heterogeneous systems

    With different DBMSs, operating systems, hardware, and communication protocols

    ETL challenges

    Getting the data from the source to the target as fast as possible

    Allowing recovery from failure without restarting the whole process
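
    The recovery challenge above can be illustrated with a minimal sketch, assuming the ETL run is split into named steps and progress is recorded in a small checkpoint file. The function name run_etl, the step names, and the JSON checkpoint format are hypothetical choices for illustration and do not come from the slides or from any particular ETL tool.

```python
import json
import os


def run_etl(steps, checkpoint_path):
    """Run (name, function) steps in order, skipping steps already completed.

    Hypothetical helper: completed step names are stored as a JSON list in
    checkpoint_path, so a rerun after a failure resumes at the failed step.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    executed = []
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run, do not repeat it
        func()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        executed.append(name)
        # Write to a temporary file first, then rename, so a crash during
        # the write cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)
    return executed
```

    If a step raises an exception, the next call with the same checkpoint file starts at that step instead of at the beginning. The sketch assumes each step is safe to rerun from its own start, for example by loading into a staging table that is cleared first.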

  • Data Integration

  • Schema Integration

  • Schema conflicts

  • IBM InfoSphere DataStage

  • data extractions (reads), data flows, data combinations, data transformations, data constraints, data aggregations, and data loads (writes)
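
    The operation types listed above can be illustrated with a small, tool-independent sketch. This is plain Python, not DataStage code: every function name, the sample rows, and the in-memory "sources" and "target" are assumptions made up for illustration.

```python
from collections import defaultdict


def extract():
    # Extraction (read): two hypothetical in-memory sources.
    sales = [
        {"product_id": 1, "qty": 2, "price": "10.00"},
        {"product_id": 2, "qty": 0, "price": "5.50"},
        {"product_id": 1, "qty": 1, "price": "10.00"},
    ]
    products = {1: "widget", 2: "gadget"}
    return sales, products


def combine(sales, products):
    # Combination: look up the product name for each sales row.
    return [dict(row, product=products[row["product_id"]]) for row in sales]


def transform(rows):
    # Transformation: derive a numeric amount from qty and the price string.
    return [dict(row, amount=row["qty"] * float(row["price"])) for row in rows]


def constrain(rows):
    # Constraint: only rows with a positive quantity pass through.
    return [row for row in rows if row["qty"] > 0]


def aggregate(rows):
    # Aggregation: total amount per product.
    totals = defaultdict(float)
    for row in rows:
        totals[row["product"]] += row["amount"]
    return dict(totals)


def load(totals, target):
    # Load (write): store the totals in a hypothetical in-memory target.
    target.update(totals)
    return target


def run_job():
    # Data flow: the output of each operation feeds the next one.
    sales, products = extract()
    rows = constrain(transform(combine(sales, products)))
    return load(aggregate(rows), {})
```

    Calling run_job() returns the loaded target as a dictionary of totals per product. In a graphical ETL tool, each function would roughly correspond to one stage and each function call to a link between stages.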
