DS Preparation

  • 8/12/2019 DS Preparation

    Parallel Processing:

    There are two types of parallelism in DS PX:

    1. Pipeline Parallelism

    2. Partitioning Parallelism

    1. Pipeline Parallelism:

    All the stages in the job run concurrently. No stage is idle.

    Take, for example, a job that loads data from an Oracle source to an Oracle target with a transformer in between. Even in a single-node configuration, the job does not wait for all the source data to be read: as soon as source data is present at the input, it is passed to the subsequent stages. This method is called pipeline parallelism.

    If you ran the same job on a system with multiple processors, the reading stage would start on one processor and start filling a pipeline with the data it had read. The transformer stage would start running as soon as there was data in the pipeline, process it and start filling another pipeline. The stage writing the transformed data to the target would similarly start writing as soon as there was data available. Thus all three stages operate simultaneously.
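    The three-stage flow can be imitated outside DataStage. The following is a minimal Python sketch, not DataStage code: each stage runs in its own thread and hands rows to the next stage through a queue as soon as they are available. The stage functions, the DONE sentinel and the upper-casing transform are all assumptions made for illustration.

```python
import queue
import threading

DONE = object()  # hypothetical sentinel marking the end of the data stream


def source_stage(rows, out_q):
    # Pass each row downstream as soon as it is read.
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def transformer_stage(in_q, out_q):
    # Start working as soon as the first row arrives in the pipeline.
    for row in iter(in_q.get, DONE):
        out_q.put(row.upper())
    out_q.put(DONE)


def target_stage(in_q, target):
    # Write rows as soon as transformed data is available.
    for row in iter(in_q.get, DONE):
        target.append(row)


def run_pipeline(rows):
    q1, q2, target = queue.Queue(), queue.Queue(), []
    stages = [
        threading.Thread(target=source_stage, args=(rows, q1)),
        threading.Thread(target=transformer_stage, args=(q1, q2)),
        threading.Thread(target=target_stage, args=(q2, target)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return target
```

    Calling run_pipeline(["a", "b", "c"]) starts all three stages at once; none of them waits for the previous stage to finish its whole input.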

    2. Partitioning Parallelism:

    In partitioning parallelism, the data is partitioned into a number of separate sets, with each partition being handled by a separate instance of the job stages.

    Using partitioning parallelism, the job is effectively run simultaneously by several processors, each handling a separate subset of the total data. At the end of the job the data partitions can be collected back together again and written to a single target.
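    As a rough illustration of the partition, process and collect cycle, here is a small Python sketch. It assumes a hypothetical stage that doubles each value and uses a thread pool to stand in for the separate stage instances; none of these names come from DataStage.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_partitions):
    # Split the data into separate subsets (every n-th row per subset).
    return [rows[i::num_partitions] for i in range(num_partitions)]


def stage(rows):
    # Hypothetical stage logic, applied independently to one partition.
    return [row * 2 for row in rows]


def run_partitioned(rows, num_partitions):
    parts = partition(rows, num_partitions)
    # One worker per partition, each running its own instance of the stage.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(stage, parts))
    # Collect the partitions back together into a single output.
    return [row for part in results for row in part]
```

    The collected output contains every processed row, but not necessarily in the original order, which is why the choice of collecting method matters.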

    Configuration file:

    The configuration file describes the available processing power in terms of processing nodes. The number of nodes you define in the configuration file determines how many instances of a process will be produced when you compile a parallel job.

    When you run a DataStage job, DataStage first reads the configuration file to determine the available system resources.

    When you modify your system by adding or removing processing nodes or by reconfiguring nodes, you do not need to alter or even recompile your DataStage job. Just edit the configuration file.

    You can create as many configuration files as you want, but only one can be used at a time. A configuration file can be created from the Manager. The configuration file has the following structure:

    {
        node node1 {
            fastname servername
            pool node1 servername
            resource disk path
            resource scratchdisk path
        }
    }
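    Because the number of node entries sets the degree of parallelism, it can be handy to count them. The sketch below is plain Python, not a DataStage utility: it pulls the node names out of a configuration text with a regular expression. The two-node sample, its paths and the node_names helper are hypothetical and only follow the structure shown above.

```python
import re

# Hypothetical two-node configuration; names and paths are made up.
SAMPLE_CONFIG = """
{
    node node1 {
        fastname servername
        pool node1 servername
        resource disk /data/node1
        resource scratchdisk /scratch/node1
    }
    node node2 {
        fastname servername
        pool node2 servername
        resource disk /data/node2
        resource scratchdisk /scratch/node2
    }
}
"""


def node_names(config_text):
    # A node entry is a line starting with the keyword "node" followed by
    # the node name; quotes around the name are tolerated.
    return re.findall(r'^\s*node\s+"?(\w+)"?', config_text, flags=re.MULTILINE)
```

    For the sample above, len(node_names(SAMPLE_CONFIG)) is 2, so a job run with this file would get two instances of each parallel stage.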

    Partition Types:

    Data is partitioned to improve performance. The data is split into separate sets, and these sets are placed on the different processing nodes available.

    There are 9 partition methods. They are:

    1. Auto

    2. Same

    3. Entire

    4. Round Robin

    5. Random

    6. Modulus

    7. Hash

    8. DB2

    9. Range

    1. Auto:

    This is the default partitioning method in DS. It selects the best partitioning method depending on the type of stage. Typically DS uses the round robin method when it first partitions the data.

    2. Same:

    DS keeps the partitioning used by the previous stage. The records stay on the node they are already on and are not redistributed, so partitioning is faster. Same is the fastest partitioning method. It can be used when passing data between stages.

    3. Entire:

    In this method, the complete data set is present on each processing node, so every node has access to all the data. This is generally used for lookups.

    4. Round Robin:

    In this method, the records are dealt out across the available nodes. The first record goes to the first node, the second record goes to the second node and so on. After the last node gets a record, the process starts again from the first node. This gives an equal share of data on each node.
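    A minimal Python sketch of this dealing-out scheme, assuming the records are held in a list and the nodes are represented by plain lists (round_robin_partition is a hypothetical helper, not a DataStage function):

```python
def round_robin_partition(records, num_nodes):
    partitions = [[] for _ in range(num_nodes)]
    for index, record in enumerate(records):
        # Record 0 goes to node 0, record 1 to node 1, ...; after the
        # last node the cycle starts again at node 0.
        partitions[index % num_nodes].append(record)
    return partitions
```

    With seven records and three nodes the partition sizes are 3, 2 and 2, so they never differ by more than one record.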

    5. Random:

    In the random method, the records are distributed randomly, based on a randomly generated value. This method also gives a roughly equal share of data on all the nodes, but it needs extra time to calculate the random number.

    6. Modulus:

    In this method, the partitioning is based on the key column value modulo the number of nodes. The result decides which node each record is moved to.
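    The rule can be written down in a few lines. This Python sketch assumes records are dictionaries with an integer key column; modulus_partition and the "id" column are hypothetical names used only for illustration.

```python
def modulus_partition(records, key, num_nodes):
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        # Node number = key column value mod number of nodes.
        partitions[record[key] % num_nodes].append(record)
    return partitions
```

    For example, with three nodes the keys 10, 11, 12 and 13 go to nodes 1, 2, 0 and 1 respectively.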

    7. Hash:

    In hash partitioning, the data is partitioned based on a combination of key columns. The key values are distributed among the available nodes by their hash value, so the partitions will not necessarily be of equal size. Records with the same key values are put on the same node.
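    The sketch below shows the idea in Python. It assumes dictionary records and uses zlib.crc32 as a stand-in hash; the hash function DataStage actually uses is not stated in these notes, and hash_partition is a hypothetical helper.

```python
import zlib


def hash_partition(records, keys, num_nodes):
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        # Build one value from the key column combination and hash it.
        key_bytes = repr(tuple(record[k] for k in keys)).encode("utf-8")
        node = zlib.crc32(key_bytes) % num_nodes
        # Equal key values always hash to the same node.
        partitions[node].append(record)
    return partitions
```

    Partition sizes depend on how often each key value occurs, which is why a skewed key column leads to uneven partitions.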

    8. DB2:

    In this method, the data is partitioned in the same way that DB2 would partition it.

    9. Range:

    In this method, the data is partitioned into sets based on ranges of values, and each set is assigned to one of the available nodes. This gives partitions of approximately equal size.
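    A minimal Python sketch of range partitioning, assuming dictionary records and a sorted list of boundary values supplied by the caller (range_partition and the boundaries argument are hypothetical). The boundaries here are picked by hand; partitions come out roughly equal only when the boundaries reflect how the key values are actually distributed.

```python
import bisect


def range_partition(records, key, boundaries):
    # N boundaries define N + 1 ranges, one per node.
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        # Find the range that the key value falls into.
        node = bisect.bisect_right(boundaries, record[key])
        partitions[node].append(record)
    return partitions
```

    With boundaries [10, 20], values below 10 go to the first node, values from 10 up to but not including 20 go to the second, and values of 20 or more go to the third.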

    Collecting methods:

    Collecting methods gather the data that was partitioned into different sets back into a single set, which is then moved to the target stage.

    There are 4 types of collecting method. They are:

    1. Round robin

    2. Auto

    3. Ordered

    4. Sorted Merge

    1. Round robin:

    In this method, the first record is read from the first partition, the second record from the second partition and so on.
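    A small Python sketch of this collector, assuming the partitions are plain lists. round_robin_collect is a hypothetical helper, and skipping partitions that have run out of records is an assumption of the sketch, not documented DataStage behaviour.

```python
from itertools import zip_longest

_MISSING = object()  # marks a partition that has no more records


def round_robin_collect(partitions):
    collected = []
    # Take one record from each partition in turn.
    for group in zip_longest(*partitions, fillvalue=_MISSING):
        collected.extend(r for r in group if r is not _MISSING)
    return collected
```

    Applied to partitions that were produced by round robin partitioning, this restores the original record order.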

    2. Auto:

    In this method, DS chooses the best collecting method based on the stage and uses it to collect the data. Auto is the fastest collecting method in DS.

    3. Ordered:

    In this method, all the records are read from the first partition, then from the second and so on until the last partition has been read.
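    In Python terms this is a simple concatenation of the partitions. The sketch assumes list partitions, and ordered_collect is a hypothetical name.

```python
from itertools import chain


def ordered_collect(partitions):
    # Exhaust partition 0, then partition 1, ..., then the last partition.
    return list(chain.from_iterable(partitions))
```

    If the data was range partitioned and each partition is sorted, this collector yields a fully sorted result.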

    4. Sorted Merge:

    Records are read in an order based on one or more columns of the record. The columns used to define the record order are called collecting keys. Typically, you use the sorted merge collector with a partition-sorted data set (as created by a sort stage). In this case, you specify as the collecting key fields those fields you specified as sorting key fields to the sort stage.
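    The merge step can be sketched with Python's heapq.merge, which combines inputs that are already sorted. The sketch assumes dictionary records and partitions that are each sorted on the collecting keys; sorted_merge_collect is a hypothetical helper, not a DataStage API.

```python
import heapq
from operator import itemgetter


def sorted_merge_collect(partitions, *collecting_keys):
    # Each partition must already be sorted on the collecting keys;
    # the merge repeatedly takes the smallest record across partitions.
    return list(heapq.merge(*partitions, key=itemgetter(*collecting_keys)))
```

    If a partition is not sorted on the same keys, the merged output is not guaranteed to be sorted, which is why the collecting keys should match the sort keys.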
