Ab-Initio Interview Ques

  • Upload
    hem777

  • 8/20/2019 Ab-Initio Interview Ques

    What is the relation between the EME, the GDE and the Co>Operating System?

    The EME is the Enterprise Meta Environment, the GDE is the Graphical Development Environment, and the Co>Operating System can be thought of as the Ab Initio server. They relate as follows.

    The Co>Operating System is the Ab Initio server. It is installed on a particular operating system platform, which is called the native OS. The EME is a repository, much like the repository in Informatica: it holds metadata, transformations, database configuration files, and source and target information. The GDE is the end-user environment in which graphs are developed (a graph is comparable to a mapping in Informatica).

    A designer works in the GDE, on the user side, to design graphs and saves them to the EME or to a sandbox. The EME is on the server side.

    What is the use of Aggregate when we have Rollup? The Rollup component in Ab Initio is used to summarize groups of data records, so where would we use Aggregate?

    Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use, and a Rollup transform makes it much clearer how a particular summarization is performed. Rollup also offers extra functionality, such as input and output filtering of records.

    Aggregate and Rollup perform the same basic action, but Rollup exposes intermediate results held in main memory, whereas Aggregate does not support intermediate results.

    What kinds of layouts does Ab Initio support?

    Ab Initio supports serial and parallel layouts, and a graph can use both at the same time. A parallel layout depends on the degree of data parallelism: if the multifile system is 4-way parallel, a component in a graph can run 4 ways parallel when its layout is defined to match that degree of parallelism.

    How can you run a graph infinitely?

    To run a graph infinitely, the end script of the graph should call the graph's own .ksh file. For example, if the graph is named abc.mp, its end script should contain a call to abc.ksh. Each run then launches the next one as it finishes, so the graph runs indefinitely.

    How do you add default rules in the transformer?

    Double-click the transform parameter on the Parameters tab of the component properties to open the Transform Editor. In the Transform Editor, click the Edit menu and select Add Default Rules from the dropdown. It shows two options: 1) Match Names 2) Wildcard.

    Do you know what a local lookup is?

    If your lookup file is a multifile that is partitioned and sorted on a particular key, the local lookup function can be used in place of the ordinary lookup function call. The lookup is then local to a particular partition, which is determined by the key.

    A lookup file consists of data records that can be held in main memory. This lets a transform function retrieve records much faster than it could from disk, and it lets the transform component process data records from multiple files quickly.

    What is the difference between a lookup file and a lookup, with a relevant example?

    A lookup file generally represents one or more serial (flat) files whose data is small enough to be held in memory. This allows transform functions to retrieve records much more quickly than they could from disk.

    A lookup is a component of an Ab Initio graph in which data can be stored and then retrieved by using a key parameter.

    A lookup file is the physical file in which the data for the lookup is stored.

    How many components are in your most complicated graph?

    It depends on the types of components you use. As a rule, avoid using very complicated transform functions in a graph.

    Explain what a lookup is.

    A lookup is a keyed dataset. It can be used to map values according to the data present in a particular file (serial or multifile). The dataset can be static or dynamic (dynamic when the lookup file is generated in an earlier phase and used as a lookup file in the current phase). Sometimes a hash join can be replaced by a Reformat plus a lookup, if one of the inputs to the join has few records and a short record length.

    Ab Initio has built-in functions to retrieve values from a lookup by key.
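The idea of a keyed lookup used from a reformat-style transform can be sketched in Python. This is only an illustration, not Ab Initio code: the dictionary `COUNTRY_LOOKUP`, the functions `lookup` and `reformat`, and the sample records are all hypothetical names and data chosen for the example.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a small keyed lookup file
# (illustrative data only).
COUNTRY_LOOKUP = {
    "IN": {"code": "IN", "name": "India"},
    "US": {"code": "US", "name": "United States"},
}


def lookup(key: str) -> Optional[dict]:
    """Return the record stored under key, or None when nothing matches."""
    return COUNTRY_LOOKUP.get(key)


def reformat(record: dict) -> dict:
    """Enrich one input record with a value fetched from the lookup."""
    match = lookup(record["country_code"])
    return {**record, "country_name": match["name"] if match else None}


enriched = [
    reformat(r)
    for r in [
        {"id": 1, "country_code": "IN"},
        {"id": 2, "country_code": "ZZ"},  # no matching key in the lookup
    ]
]
```

Because the whole dataset sits in memory and is indexed by key, each retrieval is a single dictionary access, which is the property that makes a lookup attractive compared with a join when one input is small.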

    What is a ramp limit?

    The limit parameter contains an integer that represents a number of reject events.

    The ramp parameter contains a real number that represents a rate of reject events per record processed.

    Number of bad records allowed = limit + (number of records processed * ramp).

    The ramp is expressed as a fraction from 0 to 1, not as a percentage figure.

    Together, these two parameters give the threshold of bad records that is tolerated.
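The threshold formula above can be worked through in a few lines of Python. This is a sketch of the arithmetic only: the function names are hypothetical, and treating "more rejects than the threshold" as the abort condition is an assumption made for the example.

```python
def reject_threshold(limit: int, ramp: float, records_processed: int) -> float:
    """Bad records allowed = limit + records processed * ramp."""
    return limit + records_processed * ramp


def should_abort(rejects: int, limit: int, ramp: float, records_processed: int) -> bool:
    """Assumed rule: stop once rejects exceed the allowed threshold."""
    return rejects > reject_threshold(limit, ramp, records_processed)


# With limit = 50 and ramp = 0.01, processing 10,000 records allows
# 50 + 10,000 * 0.01 = 150 rejected records.
allowed = reject_threshold(50, 0.01, 10_000)
```

With a ramp of 0 the threshold is simply the fixed limit, and with a limit of 0 the tolerance grows purely in proportion to the number of records processed.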

    Have you worked with packages?

    Multistage transform components use packages by default. However, a user can also create their own set of functions in a transform function and include it in other transform functions.

    Have you used the Rollup component? Describe how.

    If you want to group records on particular field values, Rollup is the best way to do it. Rollup is a multistage transform, and it contains the following mandatory functions.

    1. initialize

    2. rollup

    3. finalize

    You also need to declare a temporary variable if you want to get counts for a particular group.

    For each group, it first calls the initialize function once, then calls the rollup function for each record in the group, and finally calls the finalize function once after the last rollup call.
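The calling sequence described above (one initialize per group, one rollup per record, one finalize per group) can be imitated in Python. This is a sketch under stated assumptions, not Ab Initio DML: the record fields `customer` and `amount`, the temporary fields `count` and `total`, and the driver `run_rollup` are all hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def initialize(first_record: dict) -> dict:
    """Called once per group: set up the temporary variable."""
    return {"count": 0, "total": 0}


def rollup(temp: dict, record: dict) -> dict:
    """Called once per record in the group: update the temporary variable."""
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp


def finalize(temp: dict, key) -> dict:
    """Called once per group after the last rollup call: build the output."""
    return {"customer": key, "count": temp["count"], "total": temp["total"]}


def run_rollup(records: list, key_field: str) -> list:
    """Group records on key_field and drive the three stages for each group."""
    output = []
    ordered = sorted(records, key=itemgetter(key_field))
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        group = list(group)
        temp = initialize(group[0])
        for record in group:
            temp = rollup(temp, record)
        output.append(finalize(temp, key))
    return output


summary = run_rollup(
    [
        {"customer": "a", "amount": 10},
        {"customer": "b", "amount": 5},
        {"customer": "a", "amount": 7},
    ],
    "customer",
)
```

The temporary dictionary plays the role of the temporary variable mentioned above: it is what carries the running count from one record of the group to the next.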

    How do you add default rules in the transformer?

    Add Default Rules opens the Add Default Rules dialog. Select one of the following:

    Match Names: generates a set of rules that copies input fields to output fields with the same name.

    Use Wildcard (.*) Rule: generates one rule that copies input fields to output fields with the same name.

    1) If it is not already displayed, display the Transform Editor Grid.

    2) Click the Business Rules tab if it is not already displayed.

    3) Select Edit > Add Default Rules.

    In the case of Reformat, if the destination field names are the same as, or a subset of, the source field names, there is no need to write anything in the reformat xfr, unless you want a real transformation beyond reducing the set of fields or splitting the flow into several flows.

    What is the difference between partitioning by key and round robin?

    Partition by Key (hash partitioning) is used to partition data when the key values are diverse. If a single key value occurs in large volume, there can be large data skew. Even so, this method is the one used more often for parallel data processing.

    Round-robin partitioning distributes the data uniformly across the destination data partitions. The skew is zero when the number of records is divisible by the number of partitions. A real-life example is dealing a pack of 52 cards to 4 players in a round-robin manner.
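The two partitioning schemes can be contrasted with a short Python sketch. The function names and the card records are hypothetical, and `zlib.crc32` is used only as a convenient stable hash; it is not claimed to be the hash that Ab Initio uses.

```python
import zlib


def partition_by_key(records: list, key_field: str, n: int) -> list:
    """Send every record with the same key value to the same partition."""
    partitions = [[] for _ in range(n)]
    for record in records:
        hashed = zlib.crc32(str(record[key_field]).encode("utf-8"))
        partitions[hashed % n].append(record)
    return partitions


def partition_round_robin(records: list, n: int) -> list:
    """Deal records out one at a time, like cards to players."""
    partitions = [[] for _ in range(n)]
    for position, record in enumerate(records):
        partitions[position % n].append(record)
    return partitions


# The card-dealing example: 52 cards, 4 players, 13 cards each, zero skew.
cards = [{"card": i, "suit": "SHDC"[i % 4]} for i in range(52)]
hands = partition_round_robin(cards, 4)

# Partitioning on suit keeps each suit together, but the partition sizes
# depend on how the key values happen to hash.
by_suit = partition_by_key(cards, "suit", 4)
```

Round robin guarantees balance but scatters equal keys across partitions, so it suits work that does not depend on a key. Key partitioning guarantees that equal keys meet in one partition, which key-based components such as Join and Rollup need, at the cost of possible skew.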

    How do you improve the performance of a graph?

    There are many ways to improve the performance of a graph.

    1) Use a limited number of components in any one phase

    2) Use an optimal max-core value for sort and join components

    3) Minimise the number of sort components

    4) Minimise sorted join components and, if possible, replace them with in-memory (hash) joins

    5) Use only the required fields in sort, reformat and join components

    6) Use phasing or flow buffers with merges and sorted joins

    7) If both inputs are huge, use a sorted join; otherwise use a hash join with the proper driving port

    8) For large datasets, don't use broadcast as the partitioner

    9) Minimise the use of regular expression functions such as re_index in transform functions

    10) Avoid repartitioning data unnecessarily

    Try to keep the graph running in the MFS for as long as possible. For this, the input files should be partitioned and, if possible, the output file should be partitioned as well.

    How do you truncate a table?

    From Ab Initio, use the Run SQL component with the DDL statement “truncate table”.

    Alternatively, use the Truncate Table component in Ab Initio.

    Have you ever encountered an error called “depth not equal”?

    This problem can occur during compilation of the graph when two components are linked together and their layouts do not match. A solution is to place a partitioning component between them wherever the layout changes.

    What function would you use to convert a string into a decimal?

    No specific function is required if the string and the decimal are the same size. A decimal cast with the size, used in the transform function, will suffice. For example, if the source field is defined as string(8) and the destination as decimal(8) (say the field name is field1):

    out.field1 :: (decimal(8)) in.field1;

    If the destination field is smaller than the input, the string_substring function can be used as follows. Say the destination field is decimal(5):

    out.field1 :: (decimal(5)) string_lrtrim(string_substring(in.field1, 1, 5)); /* string_lrtrim trims leading and trailing spaces */
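For readers who do not have DML to hand, the second cast can be mimicked in Python. This is a rough analogue rather than an equivalent: the helper `to_decimal` is a hypothetical name, Python's `Decimal` stands in for the DML decimal type, and Python slices are 0-based whereas string_substring counts positions from 1.

```python
from decimal import Decimal


def to_decimal(text: str, width: int) -> Decimal:
    """Keep the first `width` characters, trim spaces, then convert."""
    truncated = text[:width]          # analogue of string_substring(text, 1, width)
    trimmed = truncated.strip()       # analogue of string_lrtrim
    return Decimal(trimmed)


same_size = to_decimal("00012345", 8)   # string(8) to decimal(8)
shortened = to_decimal(" 4567890", 5)   # string(8) to decimal(5)
```

Note that truncating to a narrower field silently discards the trailing characters, so this form is only appropriate when the significant digits are known to sit in the leading positions.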

    What are primary keys and foreign keys?

    In an RDBMS, the relationship between two tables is represented as a primary key and foreign key relationship. The table holding the primary key is the parent table and the table holding the foreign key is the child table. The requirement is that the two tables share a matching column.

    What is the difference between clustered and non-clustered indices? …and why do you use a clustered index?

    What is an outer join?

    An outer join is used when one wants to select all the records from a port, whether or not they satisfy the join criteria.
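A left outer join over two small lists of records can be sketched in Python to show the behaviour described above. The function name, the field names and the sample data are hypothetical, and unmatched records are padded with `None`, which is an assumption standing in for NULL.

```python
def left_outer_join(left: list, right: list, key: str) -> list:
    """Keep every left record; attach right-side fields when the key matches."""
    right_fields = {field for rec in right for field in rec if field != key}
    index = {}
    for rec in right:
        index.setdefault(rec[key], []).append(rec)

    joined = []
    for rec in left:
        matches = index.get(rec[key], [])
        if matches:
            joined.extend({**rec, **match} for match in matches)
        else:
            # No partner on the right: keep the record, pad with None.
            joined.append({**rec, **{field: None for field in right_fields}})
    return joined


employees = [{"dept_id": 10, "name": "Asha"}, {"dept_id": 99, "name": "Ben"}]
departments = [{"dept_id": 10, "dept_name": "Finance"}]
result = left_outer_join(employees, departments, "dept_id")
```

An inner join would drop the record for department 99; the outer join keeps it, which is exactly the "whether or not it satisfies the join criteria" part of the definition.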

    What are Cartesian joins?

    A Cartesian join joins two tables without a join key. The key should be {}.

    What is the purpose of having stored procedures in a database?

    The main purpose of a stored procedure is to reduce network traffic. Because all of its SQL statements execute inside the database (in a cursor), execution is fast.

    Why might you create a stored procedure with the ‘with recompile’ option?

    Recompile is useful when the tables referenced by the stored procedure undergo a lot of modification, deletion or addition of data. With such heavy activity the execution plan becomes outdated, and the stored procedure's performance degrades. If the stored procedure is created with the recompile option, SQL Server does not cache a plan for it, and it is recompiled every time it runs.

    What is a cursor? Within a cursor, how would you update fields on the row just fetched?

    The Oracle engine uses work areas for its internal processing in order to execute SQL statements; such a work area is called a cursor. There are two types of cursor: implicit and explicit. An implicit cursor is used for internal processing, and an explicit cursor is one that the user opens to retrieve the data required. To update the row just fetched, declare the cursor FOR UPDATE and issue the update with a WHERE CURRENT OF clause naming the cursor.

    How would you find out whether a SQL query is using the indices you expect?

    The explain plan can be reviewed to check the execution plan of the query. It shows whether or not the expected indexes are used.
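The same check can be tried on a laptop with SQLite, whose `EXPLAIN QUERY PLAN` statement is used here as a stand-in for an Oracle explain plan. The table `orders`, the index `idx_orders_customer` and the helper `plan_details` are hypothetical names for this sketch, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")


def plan_details(sql: str, params: tuple = ()) -> list:
    """Return the human-readable detail column of the query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


details = plan_details("SELECT * FROM orders WHERE customer_id = ?", (42,))

# The plan names the index when the optimizer chooses to use it.
uses_index = any("idx_orders_customer" in detail for detail in details)
```

Checking the plan text for the index name, rather than for an exact string, keeps the check tolerant of differences in how each database version words its plan.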

    How can you force the optimizer to use a particular index?

    Use hints, /*+ */, which act as directives to the optimizer:

    select /*+ index(a index_name) full(b) */ * from table1 a, table2 b where b.col1 = a.col1 and b.col2 = 'sid' and b.col3 = 1;

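    When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?

    Explicit transactions are preferable. Wrapping the statements in one explicit transaction makes them succeed or fail as a single unit, so a failure part-way through can be rolled back. With implicit transactions, each statement may be committed on its own, which can leave the unit of work half done.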
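The value of an explicit transaction around several DML statements can be demonstrated with SQLite from Python. This is a sketch only: the `accounts` table, the `transfer` function and the amounts are hypothetical, and SQLite stands in for whichever database is actually in use.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")


def transfer(src: int, dst: int, amount: int) -> bool:
    """Credit dst and debit src as one unit of work."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint: undo the credit as well.
        conn.execute("ROLLBACK")
        return False


ok = transfer(1, 2, 30)       # succeeds
failed = transfer(1, 2, 500)  # debit would go negative, whole unit rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

Without the explicit transaction, the credit in the failed transfer would already have been committed by the time the debit was rejected, leaving the two accounts inconsistent.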

    Describe the elements you would review to ensure multiple scheduled “batch” jobs do not “collide” with each other.

    Review the dependencies between the jobs, because each job may depend on another: for example, a job should start only if the job before it completed successfully; otherwise it should not run.

    Describe the process steps you would perform when defragmenting a data table. This table contains mission-critical data.

    There are several ways to do this:

    1) Move the table within the same tablespace or to another tablespace, and rebuild all the indexes on the table.

    The alter table move operation reclaims the fragmented space in the table.

    Afterwards, run analyze table table_name compute statistics to capture the updated statistics.

    2) Reorganise the table by taking a dump of it, truncating the table, and importing the dump back into the table.

    Explain the difference between the “truncate” and “delete” commands.

    TRUNCATE is a DDL command, whereas DELETE is a DML command. In Oracle, a TRUNCATE cannot be rolled back, whereas a DELETE can. A WHERE clause cannot be used with TRUNCATE, but it can be used with DELETE.

    What is the difference between a DB config and a CFG file?

    A .dbc file holds the information Ab Initio needs to connect to the database in order to extract or load tables or views. A .cfg file is the table configuration file created by db_config when using components such as Load DB Table.

    Describe the “Grant/Revoke” DDL facility and how it is implemented.

    This is part of the DBA's responsibilities. GRANT gives permissions, for example GRANT CREATE TABLE, CREATE VIEW and many more.

    REVOKE cancels permissions that were granted. Both commands are therefore normally issued by the DBA.

    What is a destructor?

    What is XML-RPC?

    What is new about Web services?

    What is a Web service?

    What kind of services does an operating system provide?

    What is logic?

    What is an algorithm?

    What is a constant?

    What is a variable?

    What is an assignment statement used for?

    What are the four basic types of data?

    What is a conditional loop best suited for?

    What is an incremented loop best suited for?

    What are relational operators used for?

    What relational operators do you know? (C)

    What does grep() stand for? (Unix interview question)

    What does RPG stand for?

    What does Lisp stand for?

    What does HTML stand for?

    What does Fortran stand for?

    What does DOS stand for?

    What does CGI stand for?

    What does CORBA stand for?

    What does Cobol stand for?

    What does Case stand for?

    What does BASIC stand for?

    What does ASCII stand for?

    What does Algol stand for?

    What does SQL stand for?

    What is the latest version that is available in Ab Initio?

    How to take the input data from an Excel sheet?

    How will you test a dbc file from the command prompt?

    Which one is faster for processing, fixed-length DMLs or delimited DMLs, and why?

    What are the continuous components in Ab Initio?

    What is meant by fencing in Ab Initio?

    What is the relation between EME, GDE and Co-operating system?

    What is the use of aggregation when we have rollup? As we know, the rollup component in Ab Initio is used to summarize groups of data records, so where will we use aggregation?

    Describe the process steps you would perform when defragmenting a data table. This table contains mission critical data.

    Explain the difference between the “truncate” and “delete” commands.

    When running a stored procedure definition script, how would you guarantee the definition could be “rolled back” in the event of problems?

    Describe the “Grant/Revoke” DDL facility and how it is implemented.

    Describe how you would ensure that database object definitions (Tables, Indices, Constraints, Triggers, Users, Logins, Connection Options, and Server Options etc.) are consistent and repeatable between multiple database instances (i.e. a test and production copy of a database).

    What is the difference between a DB config and a CFG file?

    What about DML changes dynamically?

    What is backward compatibility in Ab Initio?

    What kinds of layouts does Ab Initio support?

    How do you add default rules in transformer?

    Have you used rollup component? Describe how.

    What are primary keys and foreign keys?

    What is an outer join?

    What are Cartesian joins?

    What is the purpose of having stored procedures in a database?

    What is a cursor? Within a cursor, how would you update fields on the row just fetched?

    How would you find out whether a SQL query is using the indices you expect?

    How can you force the optimizer to use a particular index?

    When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?

    Describe the elements you would review to ensure multiple scheduled “batch” jobs do not “collide” with each other.

    What is semi-join?

    How to get DML using Utilities in UNIX?

    What is local and formal parameter?

    What is BROADCASTING and REPLICATE?

    Explain what is lookup?

    Have you worked with packages?

    How to create repository in Ab Initio for stand alone system (LOCAL NT)?

    What is the difference between dbc and cfg file?

    What does dependency analysis mean in Ab Initio?

    What do you have to give the value for the Record Required parameter for a natural join?

    When do you use Partition by Expression?

    What is Adhoc File System? Give me a scenario where you used it.

    What are the different commands that you used when writing wrappers?

    What do the hidden files in a sandbox represent and what does start.ksh represent?

    How can we test Ab Initio manually and with automation?

    What is the difference between sandbox and EME, can we perform checkin and checkout through sandbox? Can anybody explain checkin and checkout?

    What does layout mean in terms of Ab Initio?

    What are different things that you have to consider when loading data into a table?

    How to Create Surrogate Key using Ab Initio?

    Can anyone give me an example of realtime start script in the graph?

    What are differences between different GDE versions (>>, >>>, >>@, >>A and >>B)? What are differences between different versions of Co-op?

    Do you know what a local lookup is?

    How many components in your most complicated graph?

    How to handle if DML changes dynamically in Ab Initio?

    Explain what is lookup?

    Have you worked with packages?

    How to run the graph without GDE?

    What are the different versions and releases of Ab Initio (GDE and Co-op version)?

    What is the Difference between DML Expression and

    Explain the differences between api and utility mode?

    Please let me know whether we have Ab Initio GDE version >> and what is the latest GDE version and Co-op version?

    What are the Graph parameters?

    How to find the number of arguments defined in graph?

    What is the difference between rollup and scan?

    How to work with parameterized graphs?

    Please give us insight on Enterprise Meta Environment, and some possible questions on that.

    What are delta table and master table?

    What error would you get when you use Partition by Round Robin and Join?

    Do you know what a local lookup is?

    How many components in your most complicated graph?

    How to handle if DML changes dynamically in Ab Initio?

    How do you count the number of records in a flat file?

    How do you connect EME to Ab Initio Server?

    Have you ever encountered an error called “depth not equal”? (This occurs when you extensively create graphs; it is a trick question.)

    What is the difference between a DB config and a CFG file?

    Do you know what a local lookup is?

    What is the difference between look-up file and look-up, with a relevant example?

    Have you worked with packages?

    In which scenarios would you use Partition by Key and also, Partition by Round Robin and differences between the both?

    What are the different dimension tables that you used and some columns in the fact table?

    What is the difference between a Scan component and a Rollup component?

    How do we handle if DML changing dynamically?

    What is m_dump?

    What is the syntax of m_dump command?

    Have you used rollup component? Describe how.

    How do you improve the performance of a graph?

    How many components are there in your most complicated graph?

    What is the function you would use to transfer a string into a decimal?

    For data parallelism, we can use partition components. For component parallelism, we can use replicate component. Like this, which component(s) can we use for pipeline parallelism?

    What is AB_LOCAL expression, where do you use it in Ab Initio?

    What is meant by Co>Operating system and why is it special for Ab Initio?

    How to retrieve data from database to source, in that case which component is used for this?

    How can you run a graph infinitely?

    What is the syntax of m_dump command?

    How do we run sequences of jobs, like output of A JOB is Input to B? How do we co-ordinate the jobs?

    How do you truncate a table?

    What is a ramp limit?

    What is the difference between dbc and cfg? When do you use these two?

    What are the compilation errors you came across while executing your graphs?

    What is depth_error?

    Difference between conventional loading and direct loading? When is it used in real time?

    During the execution of graph, let us say you lost the network connection, would you have to start the process all over again or does it start from where it stopped?

    What are the different types of partitions and scenarios?

    What does dependency analysis mean in Ab Initio?

    What does unused port in join component do?

    Define Multi file system. Can you create multifile system on the same server? Also, if you have a table that has Name, Address, Status, Position attributes, can Name and Address be on one partition and Status and Position in the other partition?

    What is a sandbox? Did the co-operating system version @E have sandbox, if not how would you store the respective files?

    How did you do version control? Which tool did you use?

    How do you troubleshoot performance issues in graph?

    What are the usual errors that you encounter during ETL process apart from compilation process?

    Were you involved in production support? What were the different kinds of problems that you encountered?

    How do you count the number of records in a multifile system without using GDE?

    What does Scan and Rollup component do and give a scenario where you used them?

    Did you ever used user defined functions or packages? If yes, give a scenario.

    What is difference between Redefine Format and Reformat components?

    Sometimes you have to use dynamic length strings. Can you give me one circumstance where you need it?

    Why might you create a stored procedure with the “with recompile” option?

    How many parallelisms are in Ab Initio? Please give a definition of each.

    How to Schedule Graphs in Ab Initio, like workflow Schedule in Informatica? And where must we use Unix shell scripting in Ab Initio?

    How to Improve Performance of graphs in Ab Initio? Give some examples or tips.

     Ab Initio Questions and Answers:

    1 :: What does dependency analysis mean in Ab Initio?

    Dependency analysis answers questions regarding data lineage: where the data comes from, which applications produce and depend on this data, and so on.

    We can retrieve the maximum (surrogate key) from the existing data; then, by using scan or next_in_sequence/reformat, we can generate further sequence values for new records.

    2 :: When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?

    Explicit transactions are preferable. Implicit transactions are used for internal processing, whereas an explicit transaction is opened by the user, so all the statements of the unit of work can be committed or rolled back together.
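    The idea of an explicit transaction as one unit of work can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module, not the database the original answer had in mind; the table name `account` and the function `transfer` are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the BEGIN / COMMIT / ROLLBACK below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])


def transfer(conn, src, dst, amount):
    """Run two UPDATE statements as a single unit of work."""
    conn.execute("BEGIN")  # explicit transaction start
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("COMMIT")  # both updates become visible together
        return True
    except Exception:
        conn.execute("ROLLBACK")  # neither update survives
        return False
```

    If the second statement or the balance check fails, the rollback undoes the first statement as well, which is the behaviour an implicit (per-statement) transaction cannot give.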

    3 :: Describe the Grant-Revoke DDL facility and how it is implemented?

    Basically, this is part of the D.B.A.'s responsibilities. GRANT gives permissions, for example GRANT CREATE TABLE, CREATE VIEW and many more.

    REVOKE cancels the grant (permissions). So both the Grant and Revoke commands depend upon the D.B.A.

    4 :: What is the difference between rollup and scan?

    Using rollup we can't generate cumulative summary records; for that we use scan.
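    The difference can be illustrated outside Ab Initio. The following is a plain Python sketch, not Ab Initio DML; the functions `rollup` and `scan` are hypothetical stand-ins for the components and assume the input is already sorted by key.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# (key, amount) records, already sorted by key
records = [("A", 10), ("A", 5), ("B", 7), ("B", 1), ("B", 2)]


def rollup(rows):
    """One summary record per key group."""
    return [
        (key, sum(amount for _, amount in group))
        for key, group in groupby(rows, key=itemgetter(0))
    ]


def scan(rows):
    """One output record per input record, carrying the running total of its group."""
    out = []
    for key, group in groupby(rows, key=itemgetter(0)):
        amounts = [amount for _, amount in group]
        out.extend((key, total) for total in accumulate(amounts))
    return out
```

    Here `rollup(records)` collapses each group to a single total, while `scan(records)` keeps every record and attaches the cumulative sum so far.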

    5 :: Describe the elements you would review to ensure multiple scheduled batch jobs do not collide with each other?

    Review the dependencies between jobs, because every job depends upon another job. For example, if your first job completes successfully then the next job will execute; otherwise it will not run.

    6 :: How can I run the 2 GUI merge files?

    Do you mean merging GUI map files in WR? If so, merging GUI map files in the GUI map editor won't create a corresponding test script, and without a test script you can't run a file. So it is impossible to run a file by merging 2 GUI map files.

    7 :: Describe how you would ensure that database object definitions (Tables, Indices, Constraints, Triggers, Users, Logins, Connection Options, and Server Options etc.) are consistent and repeatable between multiple database instances (i.e.: a test and production copy of a database)?

    Take an entire database backup and restore it in a different instance.

    Take statistics of all valid and invalid objects and match them.

    Periodically refresh.

    8 :: How would you find out whether a SQL query is using the indices you expect?

    The explain plan can be reviewed to check the execution plan of the query. This shows whether the expected indexes are used or not.
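    As a small, hedged illustration of reviewing a plan programmatically: the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module (other databases have their own explain syntax and output). The table `orders`, the index `idx_orders_customer` and the helper names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")


def plan_details(sql, params=()):
    """Return the human-readable detail text of each step of the query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def uses_index(sql, index_name, params=()):
    """True if any plan step mentions the given index."""
    return any(index_name in detail for detail in plan_details(sql, params))
```

    Calling `plan_details(...)` on a query prints the steps the optimizer chose; `uses_index(...)` simply searches those steps for the index name you expected.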

    9 :: How to create repository in abinitio for stand alone system (LOCAL NT)?

    If you are installing Ab Initio on a stand-alone machine, it is not necessary to create the repository: the installer creates it automatically under the abinitio folder (where you are installing Ab Initio).

    10 :: When running a stored procedure definition script how would you guarantee the definition could be rolled back in the event of problems?

    Quite a few factors determine the approach, such as what type of version control is used, the size of the change, the impact of the change, whether it is a new procedure or a replacement for an existing one, and so on.

    If it is new, then just drop the wrong one.

    If it is a replacement, then consider how big the change is and what the possible impact will be. Depending on that, you can have the entire database backed up, or just create a script for your original procedure before messing it up, or just do an ed, change the file back to the original and reapply. You may also rename the old procedure as old and then work on the new one, and so on.

    A few issues to keep in mind are synonyms, dependencies, grants, any job calling the procedure at the time of change, and so on. In a nutshell, the scenarios vary and so do the solutions.

    11 ::

    alter table <table_name> move tablespace <tablespace_name>; this activity reclaims the fragmented space in the table.

    analyze table <table_name> compute statistics, to capture the updated statistics.

    2) Reorg could be done by taking a dump of the table, truncating the table and importing the dump back into the table.

    13 :: How can you force the optimizer to use a particular index?

    Use hints, /*+ <hint> */; these act as directives to the optimizer.

    14 :: What is a cursor? Within a cursor, how would you update fields on the row just fetched?

    The Oracle engine uses work areas for internal processing in order to execute an SQL statement; such a work area is called a cursor. There are two types of cursors: implicit and explicit. An implicit cursor is used for internal processing, and an explicit cursor is opened by the user for the data required. To update the row just fetched, declare the cursor with FOR UPDATE and issue an UPDATE statement with a WHERE CURRENT OF <cursor_name> clause.

    15 :: Why might you create a stored procedure with the with recompile option?

    Recompile is useful when the tables referenced by the stored proc undergo a lot of modification/deletion/addition of data. Due to the heavy modification activity the execution plan becomes outdated and hence the stored proc performance goes down. If we create the stored proc with the recompile option, the SQL Server won't cache a plan for this stored proc and it will be recompiled every time it is run.


    16 :: What is the purpose of having stored procedures in a database?

    The main purpose of stored procedures is to reduce network traffic; all the SQL statements execute on the server in a cursor, so execution is fast.

    17 :: What are Cartesian joins?

    A Cartesian join will get you a Cartesian product. A Cartesian join is when you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.
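    A Cartesian product is easy to see with two tiny in-memory "tables". This is a generic Python sketch (the lists `colors` and `sizes` are made-up data), not SQL or Ab Initio code.

```python
from itertools import product

colors = ["red", "green"]   # 2 rows
sizes = ["S", "M", "L"]     # 3 rows

# Every row of one table paired with every row of the other: 2 * 3 = 6 rows.
cartesian = list(product(colors, sizes))

# Joining a table to itself the same way: 2 * 2 = 4 rows.
self_join = list(product(colors, colors))
```

    The row count of the result is always the product of the input row counts, which is why an accidental Cartesian join on large tables is so expensive.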

    18 :: What is an outer join?

    An outer join is used when one wants to select all the records from a port, whether or not they have satisfied the join criteria.

    19 :: What are primary keys and foreign keys?

    In an RDBMS the relationship between two tables is represented as a primary key and foreign key relationship, where the primary key table is the parent table and the foreign key table is the child table. The criterion for both tables is that there should be a matching column.

    20 :: Have you used rollup component? Describe how?

    If the user wants to group the records on particular field values then rollup is the best way to do that. Rollup is a multi-stage transform function and it contains the following mandatory functions.

    XFR represents the transform functions, which will contain business rules.

    27 :: How does MAXCORE work?

    Maxcore is a value (it will be in Kb). Whenever a component is executed it will take up to that much memory, as specified, for execution.

    28 :: What is the syntax of m_dump command?

    The general syntax is "m_dump metadata data [action]".

    29 :: Can anyone give me an example of realtime start script in the graph?

    Here is a simple example to use a start script in a graph:

    In the start script, let's give:

    export DT=`date '+%m%d%y'`

    Now this variable DT will have today's date before the graph is run.

    Now somewhere in the graph transform we can use this variable as:

    out.process_dt :: $DT;

    which provides the value from the shell.

    30 :: What are differences between different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15)?

    What are differences between different versions of Co-Op?

    31 :: How to run the graph without GDE?

    22 :: What is AB_LOCAL expression, where do you use it in Ab Initio?

    ablocal_expr is a parameter of the Input Table component of Ab Initio. ABLOCAL() is replaced by the contents of ablocal_expr, which we can make use of in parallel unloads. There are two forms of the ABLOCAL() construct: one with no arguments and one with a single argument, a table name (the driving table).

    The ABLOCAL() construct is useful because some complex SQL statements contain grammar that is not recognized by the Ab Initio parser when unloading in parallel. You can use the ABLOCAL() construct in this case to prevent the Input Table component from parsing the SQL (it will get passed through to the database). It also specifies which table to use for the parallel clause.

    23 :: What is the latest version that is available in Ab-initio?

    The latest version of GDE is

    You can use $mpjret in the end script like:

    if [ "$mpjret" -eq 0 ]

    then

    echo "success"

    else

    mailx -s "[graphname] failed" mailid

    fi

    25 :: I am unable to connect server database (Oracle) from GDE

    37 :: What is skew and skew measurement?

    Skew is the measure of data flow to each partition.

    suppose i&p is comming from C files and sie is < gb

    < gbK "


    • http://en.wikipedia.org/wiki/Ab_Initio

    • http://www.abinitio.com

    • http://www.patents.com/Ab-Initio-Software-Corporation/Lexington/MA/301339/company/

    • http://www.bi-nerd.com/ab-initio-the-dark-horse-of-etl/

    • Patents: US6654907.pdf US7047232.pdf US7164422.pdf US7167850.pdf

    • http://www.linkedin.com/companies/ab-initio

    Ab Initio is a private company; its main offices have been in Lexington, Massachusetts (near Boston, USA) since 1994, but they have offices all over the world (as you can see on their web site). They have very good, talented, devoted people. I've heard that when you are calling their customer service there is a 75% chance that you will speak with a Ph.D. It may very well be true. The company was formed by former employees of the Thinking Machines Corporation. Some key people: Craig W. Stanfill, Richard A. Shapiro, Stephen A. Kukolich.

    Ab Initio also uses its own people as well as independent consulting firms to build proof of concept for a client and then to guide clients in using their tools.

    Unfortunately Ab Initio provides very little information about their solutions to the general public. So, not getting into details, most of AI functionality can be scripted using several commands which you can give from the prompt (with many options):

    • m_* commands (for example m_shutdown, m_mkfs, m_cp, etc.) are used for administering

    • mp ... (some options) - to define, establish and run jobs

    • air ... (some options) - to work with EME (basically a specialized version control system)

    The scripts can be easily integrated to work with external schedulers.

    Somewhere around 1997 Ab Initio introduced the Graphical Development Environment - a very powerful desktop software. You place components on the screen, connect them, define what they do and how. So your application is a graph. You can create components which consist of other components, which consist of other components, etc. - so effectively you can drill deeply into the diagram. I've seen this tool generating a powerful data processing application in less than 10 minutes. You can run the application right from the IDE or save it as a set of scripts (ksh for unix). The scripts will call misc. component libraries. The libraries are written in C++.

    Some of the key elements of the system:

    • "Co>Operating System"

    • "Component Library"

    • "Graphical Development Environment" (GDE)

    • "Enterprise Meta>Environment" (EME)

    • "Data Profiler"

    • "Conduct>It"

    Main power of Ab Initio - parallelism - is achieved via its "Co>Operating System", which provides the facilities for parallel execution (multiple CPUs and/or multiple boxes), platform independent data transport, check pointing and process monitoring. A lot of attention is devoted to monitoring resources (CPU, memory). Multi-file, multi-directory.

    Component Library - a set of software modules to perform sorting, data transforming, and high speed data loading and unloading tasks.

    Ab Initio tools incorporate best practices such as check-pointing, rerunnability, tagging everything with unique IDs, etc.

    Unfortunately Ab Initio doesn't advertise or publish any information. So there are just bits and pieces here and there. Here is an interesting blog:

    • http://www.geekinterview.com/Interview-Questions/Data-Warehouse/Abinitio

    1

    Question

    Answer

    Phases vs Checkpoints

    Phases - are used to break the graph into pieces. Temporary files created during a phase will be deleted after its completion. Phases are used to effectively separately manage resource-consuming (memory, CPU, disk) parts of the application.

    Checkpoints - created for recovery purposes. These are points where everything is written to disk. You can recover to the latest saved point - and rerun from it.

    You can have phase breaks with or without checkpoints.

    xfr

    A new sandbox will have many directories: mp, dml, xfr, db, ... . xfr is a directory where you put files with extension .xfr containing your own custom functions (and then use: include "somepath/xfr/yourfile.xfr"). Usually XFR stores mapping.

    three types of parallelism

    1) Data Parallelism - data (partitioning of data into parallel streams for parallel processing).

    2) Component Parallelism (execute simultaneously on different branches of the graph)

    3) Pipeline (sequential).

    MFS

    Multi-File System

    m_mkfs - create a multifile (m_mkfs ctrlfile mpfile1 ... mpfileN)

    m_ls - list all the multifiles

    m_rm - remove the multifile

    m_cp - copy a multifile

    m_mkdir - to add more directories to existing directory structure

    Memory requirements of a graph

    • Each partition of a component uses: ~ 8 MB + max-core (if any)

    • Add size of lookup files used in phase (if multiple components use the same lookup, only count it once)

    • Multiply by degree of parallelism. Add up all components in a phase; that is how much memory is used in that phase.

    • Select the largest-memory phase in the graph
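    The estimate above can be written down as a few lines of arithmetic. This is a rough sketch under stated assumptions: the 8 MB base is the approximate figure quoted above, the lookup sizes are added before multiplying by the degree of parallelism (one reading of the order of the steps), and all names and numbers are hypothetical.

```python
BASE_MB = 8  # approximate per-partition overhead of a component, as quoted above


def phase_memory_mb(max_cores_mb, lookup_sizes_mb, parallelism):
    """Estimate memory of one phase.

    max_cores_mb    -- max-core of each component in the phase (0 if none)
    lookup_sizes_mb -- {lookup name: size}; a dict so a shared lookup is counted once
    parallelism     -- degree of parallelism of the phase
    """
    per_partition = sum(BASE_MB + max_core for max_core in max_cores_mb)
    per_partition += sum(lookup_sizes_mb.values())
    return per_partition * parallelism


def graph_memory_mb(phases):
    """The graph needs as much memory as its hungriest phase."""
    return max(phase_memory_mb(*phase) for phase in phases)


# Hypothetical two-phase graph, 4-way parallel.
phases = [
    ([100, 0], {"customers": 50}, 4),  # a sort (100 MB max-core), a reformat, one lookup
    ([64], {}, 4),                     # a rollup (64 MB max-core)
]
```

    For this made-up graph the first phase needs (108 + 8 + 50) * 4 = 664 MB and the second 72 * 4 = 288 MB, so the estimate for the graph is 664 MB.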

    How to calculate a SUM

    SCAN

    ROLLUP

    SCAN WITH ROLLUP

    Scan followed by Dedup Sort and select the last

    dedup sort with null key

    If we don't use any key in the sort component while using the dedup sort, then the output depends on the keep parameter:

    • first - only the first record

    • last - only the last record

    • unique_only - there will be no records in the output file.
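    Why the three keep settings behave this way with a null key can be seen in a small simulation. This is a plain Python sketch, not the Ab Initio component; the function `dedup_sorted` and the spelling of the keep values are assumptions made for the example.

```python
from itertools import groupby


def dedup_sorted(records, key=None, keep="first"):
    """Simulate dedup on sorted input; key=None models a null key."""
    # With a null key every record compares equal, so the whole input is one group.
    key_fn = key if key is not None else (lambda rec: ())
    out = []
    for _, group in groupby(records, key=key_fn):
        group = list(group)
        if keep == "first":
            out.append(group[0])
        elif keep == "last":
            out.append(group[-1])
        elif keep == "unique-only" and len(group) == 1:
            out.append(group[0])  # only groups with exactly one record survive
    return out
```

    With more than one input record and no key, the single group is never "unique", which is why unique-only produces an empty output.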

    join on partitioned flow

    file1 (A,B,C), file2 (A,B,D). We partition both files by "A" and then join by "A,B". Is it OK? Or should we partition by "A,B"? Not clear.
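    One way to reason about the question: two records that agree on (A, B) necessarily agree on A, so partitioning by A already puts every matching pair in the same partition. The sketch below checks this on made-up data in plain Python; it says nothing about sort-order requirements of the real Join component, and `partition_of`, `file1` and `file2` are hypothetical.

```python
import zlib


def partition_of(value, n):
    """Deterministic hash partitioning on a single value."""
    return zlib.crc32(repr(value).encode()) % n


file1 = [("a1", "b1", "c1"), ("a1", "b2", "c2"), ("a2", "b1", "c3")]  # (A, B, C)
file2 = [("a1", "b1", "d1"), ("a2", "b1", "d2"), ("a2", "b9", "d3")]  # (A, B, D)


def serial_join(left, right):
    """Reference result: inner join on (A, B) without any partitioning."""
    return sorted((l[0], l[1], l[2], r[2]) for l in left for r in right if l[:2] == r[:2])


def partitioned_join(left, right, n):
    """Partition both inputs by A only, then join on (A, B) inside each partition."""
    out = []
    for p in range(n):
        lpart = [rec for rec in left if partition_of(rec[0], n) == p]
        rpart = [rec for rec in right if partition_of(rec[0], n) == p]
        out += [(l[0], l[1], l[2], r[2]) for l in lpart for r in rpart if l[:2] == r[:2]]
    return sorted(out)
```

    In this sketch the partitioned join returns exactly the same rows as the serial join, which suggests that partitioning by "A" alone does not lose matches; partitioning by "A,B" would mainly change how evenly the data is spread.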

    checkin, checkout

    You can do checkin/checkout using the wizard right from the GDE, using versions and tags.

    how to have different passwords for QA and production

    Parameterize the .dbc file, or use an environment variable.

    How to get records 50-75 out of 100

    • use scan and filter

    • m_dump <dml> <mfs file> -start 50 -end 75

    • use the next_in_sequence() function and the filter by expression component (next_in_sequence() >50 && next_in_sequence() <75)
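    The sequence-number approach can be sketched in plain Python. Here `itertools.count` plays the role of next_in_sequence(); the function name `records_in_range` is hypothetical, the bounds are treated as inclusive, and the counter is advanced exactly once per record.

```python
from itertools import count


def records_in_range(records, start, end):
    """Keep the records whose 1-based position lies in [start, end]."""
    seq = count(1)  # 1, 2, 3, ... one number per record
    return [rec for rec in records if start <= next(seq) <= end]


hundred = list(range(1, 101))  # stand-in for 100 input records
```

    Note that the sketch draws one sequence number per record and compares it against both bounds, so the numbering stays aligned with the record positions.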

    How to convert a serial file into MFS

    Create the MFS, then use a partition component.

    project parameters vs. sandbox parameters

    When you check out a project into your sandbox, you get project parameters. Once in your sandbox, you can refer to them as sandbox parameters.

    Bad-Straight-flow

    Error you get when connecting mismatching components (for example, connecting a serial flow directly to an mfs flow without using a partition component).

    merging graphs

    You can not merge two Ab Initio graphs. You can use the output of one graph as input for another. You can also copy/paste the contents between graphs. See also about using .plan

    partitioning, re-partitioning, departitioning

    • partitioning - dividing a single flow of records (serial file, mfs) into multiple flows.

    • departitioning - removing partitioning (gather and merge components)

    • re-partitioning - changing the number of partitions (e.g. from 2 to 4 flows)
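    The three operations can be mimicked on Python lists to make the terms concrete. This is a toy model, not Ab Initio code; round-robin is used as the partitioning scheme and all function names are made up.

```python
def partition_round_robin(records, n):
    """Partitioning: deal a single flow out into n flows."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def gather(parts):
    """Departitioning: collapse the flows back into one (order not preserved)."""
    return [rec for part in parts for rec in part]


def repartition(parts, n):
    """Re-partitioning: change the number of flows, e.g. from 2 to 4."""
    return partition_round_robin(gather(parts), n)
```

    A gather-style departition does not restore the original record order; that is the job of a merge on a sort key.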

    lookup file

    For large amounts of data use an MFS lookup file (instead of serial).

    indexing

    No indexes as such. But there is an "output indexing" using reformat and doing the necessary coding in the transform part.

    Environment project

    Environment project - a special public project that exists in every Ab Initio environment. It contains all the environment parameters required by the private or public projects which constitute the AI Standard Environment.

    Aggregate vs Rollup

    Aggregate - old component

    Rollup - newer, extended, recommended to use instead of Aggregate (built-in functions like sum, count, avg, min, max, product, ...).

    EME, GDE, Co-operating system

    • EME = Enterprise Metadata Environment. Functions (repository, version control, statistical analysis, dependency analysis). It is on the server side and holds all the projects (metadata of transformations, config info, source and target info: graph, dml, xfr, ksh, sql, etc.). This is where you checkin/checkout. The /Project dir of EME contains common directories for all application sandboxes connected to it. It also helps in dependency analysis of codes. Ab Initio has a series of air commands to manipulate repository objects.

    • GDE = Graphical Development Environment (on the client box)

    • Co-operating system = Ab Initio server, installed on top of the native (unix) OS on the server

    fencing

    Fencing means job controlling on a priority basis.

    In AI it actually refers to customized phase breaking. A well fenced graph means that, no matter what the source data volume is, the process will not cough in deadlocks. It actually limits the number of simultaneous processes.

    Fencing - changing a priority of a job

    Phasing - managing the resources to avoid deadlocks. For example, limiting the number of simultaneous processes (by breaking the graph into phases, only 1 of which can run at any given time).

    Continuous components

    Continuous components - produce useful output file while running continuously. For example: Continuous rollup, Continuous update, batch subscribe.

    deadlock

    Deadlock is when two or more processes are requesting the same resource. To avoid it, use phasing and resource pooling.

    environment

    • AB_HOME - where the co>operating system is installed

    • AB_AIR_ROOT - default location for the EME datastore

    • sandboxes standard environment

    • AI_SORT_MAX_CORE, AI_HOME, AI_SERIAL, AI_MFS, etc.

    • from unix prompt: env | grep AI

    wrapper script

    unix script to run graphs

    multistage component

    A multistage component is a component which transforms input records in 5 stages (1. input select, 2. temporary initialization, 3. processing, 4. output selection, 5. finalize). So it is a transform component which has packages. Examples: scan, normalize, denormalize sorted, rollup.

    Dynamic DML

    Dynamic DML is used if the input metadata can change. Example: at different times, different input files are received for processing which have different dml. In that case we can use a flag in the dml; the flag is first read in the input file received, and according to the flag its corresponding dml is used.
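    The flag idea can be sketched generically. The example below is plain Python rather than Ab Initio DML, and it assumes, purely for illustration, that the flag is the first line of the file and that records are pipe-delimited; the `FORMATS` table and the field names are made up.

```python
# Hypothetical record layouts, selected by a flag value.
FORMATS = {
    "A": ["id", "name"],
    "B": ["id", "name", "city"],
}


def parse_file(lines, delimiter="|"):
    """Read the flag first, then parse the remaining lines with the matching layout."""
    flag = lines[0].strip()
    fields = FORMATS[flag]  # raises KeyError for an unknown flag
    return [dict(zip(fields, line.strip().split(delimiter))) for line in lines[1:]]
```

    The same reader then handles files of either layout without being changed, which is the point of letting the data choose its own record format.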

    fan in, fan out

    • fan out - partition component (increase parallelism)

    • fan in - departition component (decrease parallelism)

    lock

    A user can lock the graph for editing so that others will see the message and can not edit the same graph.

    join vs lookup

    Lookup is good for speed for small files (will load the whole file in memory). For large files use join. You may need to increase the maxcore limit to handle big joins.


    multi update

    Multi update executes SQL statements - it treats each input record as a completely separate piece of work.

    scheduler

    • We can use Autosys, Control-M, or any other external scheduler.

    • We can take care of dependencies in many ways. For example, if scripts should run sequentially, we can arrange for this in Autosys, or we can create a wrapper script and put there several sequential commands (nohup command1.ksh && nohup command2.ksh, etc.). We can even create a special graph in Ab Initio to execute individual scripts as needed.
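    A wrapper that runs steps strictly one after another and stops at the first failure can be sketched like this. It is a generic Python illustration using the standard subprocess module, not an Ab Initio or Autosys feature; `run_sequentially` is a hypothetical helper and the commands are placeholders for real wrapper scripts.

```python
import subprocess
import sys


def run_sequentially(commands):
    """Run each command in order; stop at the first non-zero exit code.

    Returns None if every step succeeded, otherwise the index of the failing step.
    """
    for index, command in enumerate(commands):
        if subprocess.run(command).returncode != 0:
            return index
    return None


# Placeholder steps: each one is just a tiny Python process with a chosen exit code.
ok_step = [sys.executable, "-c", "pass"]
bad_step = [sys.executable, "-c", "import sys; sys.exit(3)"]
```

    Because a later step only starts when the previous one has exited with status 0, the output of one job is complete before the job that consumes it begins.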

    API and Utility modes in input table

    These are database interfaces (API - uses SQL; utility - bulk loads, whatever the vendor provides).

    lookup file

    • lookup file component. Functions: lookup, lookup_count, lookup_next, lookup_match, lookup_local.

    • Lookups are always used in combination with the reformat components.

    Calling stored proc in DB

    You can call a stored proc (for example from an input component). In fact you can even write SP in Ab Initio. Make it "with recompile" to assure good performance.

    Frequently used functions

    string_ltrim, string_lrtrim, string_substring, reinterpret_as, today(), now()

    data validation

    is_valid, is_null, is_blank, is_defined

    driving port

    When joining inputs (in0, in1, ...) one of the ports is used as "driving" (by default - in0). The driving input is usually the largest one, whereas the smallest can have the "Sorted-Input" parameter set to "Input need not be sorted", because it will be loaded completely in memory.

    Ab Initio vs Informatica for ETL

    Ab Initio benefits: parallelism built in, multifile system, handles huge amounts of data, easy to build and run. Generates scripts which can be easily modified as needed (if something couldn't be done in the ETL tool itself). The scripts can be easily scheduled using any external scheduler - and easily integrated with other systems.

    Ab Initio doesn't require a dedicated administrator.

    Ab Initio doesn't have built-in CDC capabilities (CDC = Change Data Capture).

    Ab Initio allows you to attach error / reject files to each transformation and capture and analyze the message and data separately (as opposed to Informatica, which has just one huge log). Ab Initio provides immediate metrics for each component.

    override key

    The override key option is used when we need to join fields which have different field names.

    control file

    The control file should be in the multifile directory (it contains the addresses of the serial files).

    max-core

    The max-core parameter (for example, sort 100 MBytes) specifies the amount of memory used by a component (like Sort or Rollup) - per partition - before spilling to disk. Usually you don't need to change it - just use the default value. Setting it too high may degrade the performance because of OS swapping and degrading of the performance of other components.

    Input Parameters

    graph > select parameters tab > click "create" - and create a parameter. Usage: $paramname. Edit > parameters. These parameters will be substituted during run time. You may need to declare your parameter scope as formal.

    Error Trapping

    Each component has reject, error, and log ports. Reject captures rejected records, Error captures the corresponding error, and Log captures the execution statistics of the component. You can control the reject status of each component by setting the reject threshold to either Never Abort, Abort on first reject, or setting ramp/limit. You can also use the force_error() function in a transform function.


    How to see resource usage

    In GDE go to options View > Tracking Details - you will see each component's CPU and memory usage etc.

    assign keys

    Easy and saves development time. Need to understand how to feed component parameters, and you can't control it easily.

    Join in DB vs join in Ab Initio

    • Scenario 1 (preferred): we run a query which joins tables in DB and gives us the result in just 1 DB component.
    • Scenario 2 (much slower): we use database components to extract all data - and join them in Ab Initio.

    Join with DB

    Not recommended if the number of records is big. It is better to retrieve the data out - and then join in Ab Initio.

    Data Skew

    Parameter showing how unevenly data is distributed between partitions.

    skew = (partition size - avg. partition size) * 100 / (size of the largest partition)
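    The skew formula, (partition size - average partition size) * 100 / (size of the largest partition), can be tried out with a small sketch. The function name skew_percent and the sample partition sizes are assumptions made for illustration; they are not part of Ab Initio.

```shell
# skew_percent SIZE AVG MAX
# Skew of one partition in percent:
#   (partition size - average partition size) * 100 / (largest partition size)
skew_percent() {
    awk -v size="$1" -v avg="$2" -v max="$3" \
        'BEGIN { printf "%.1f\n", (size - avg) * 100 / max }'
}

# Hypothetical 4-way multifile holding 100, 100, 100 and 300 records:
# average partition size = 150, largest partition = 300.
skew_percent 300 150 300   # the fullest partition
skew_percent 100 150 300   # one of the lighter partitions
```

    A positive value marks a partition that holds more than its share, a negative value one that holds less; values near zero mean the data is evenly spread.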

    dbc vs cfg

    .dbc - database configuration file (dbname, nodes, version, user/pwd) - resides in the db directory.

    .cfg - any type of config file, for example a remote connection config (name of remote server, user/pwd to connect to db, location of OS on remote machine, connection method). The .cfg file resides in the config dir.

    compilation errors

    depth not equal, data format error etc...

    depth error: we get this error when two components are connected together but their layouts do not match.

    types of partitions

    broadcast, pbyexpression, pbyroundrobin, pbykey, pwithloadbalance

    unused port

    When joining, used records go to the output port, unused records - to the unused port.

    tuning performance

    • Go parallel using partitioning. Round-robin partitioning gives good balance.
    • Use Multi-file system (MFS).
    • Use Ad Hoc MFS to read many serial files in parallel and use concat component.
    • Once data is partitioned - do not switch it to serial and back. Repartition instead.

    • Do not access large files via NFS - use FTP instead.
    • Use lookup local rather than lookup (especially for big lookups).
    • Use rollup and Filter as soon as possible to reduce the number of records. Ideally do it in the source (database?) before you get the data.
    • Remove unnecessary components. For example, instead of using filter by exp you can implement the same function in reformat/Join/Rollup. Another example - when joining data from files, use the union function instead of adding an additional component for removing duplicates.
    • Use gather instead of concatenate.
    • It is faster to do a sort after a partition than to do a sort before a partition.
    • Try to avoid using a join with the "db" component.
    • When getting data from a database - make sure your queries are fast (use indexes etc.). If possible do necessary selection / aggregation / sorting in the database before getting data into Ab Initio.
    • Tune Max_core for optimal performance (for sort it depends on the size of the input file).

    • Note - if an in-memory join cannot fit its non-driving inputs in the provided MAX-CORE then it will drop all the inputs to disk, and in-memory does not make sense.
    • Using phase breaks lets you allocate more memory in individual components - thus improving performance.
    • Use checkpoint after sort to land data on disk.
    • Use the Join and rollup in-memory feature.
    • When joining a very small dataset to a very large dataset, it is more efficient to broadcast the small dataset to MFS using the broadcast component, or use the small file as a lookup. But for a large dataset don't use broadcast as a partitioner.

    • Use Ab Initio layout instead of database default to achieve parallel loads.
    • Change the AB_REPORT parameter to increase the monitoring duration.
    • Use catalogs for reusability.
    • Components like join / rollup should have the option "Input must be sorted" if they are placed after a sort component.
    • Minimize the number of sort components. Minimize usage of the sorted join component and if possible replace it by in-memory join / hash join. Use only required fields in the sort, reformat, join components. Use "Sort within Groups" instead of just Sort when data was already presorted.
    • Use phasing / flow buffers in case of merge sorted joins.
    • Minimize the use of regular expression functions like re_index in the transfer functions.
    • Avoid repartitioning of data unnecessarily. When splitting records into more than two flows, use Reformat rather than the Broadcast component.
    • For joining records from flows, use the Concatenate component ONLY when there is a need to follow some specific order in joining records. If no order is required then it is preferable to use the Gather component.
    • Instead of putting many Reformat components consecutively, use the output indexes parameter in the first Reformat component and mention the condition there.
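    The claim that round-robin partitioning gives good balance can be checked with a small simulation outside Ab Initio. This is only a sketch: round_robin_counts is a hypothetical helper that deals input lines to N partitions in turn and reports how many lines each one received.

```shell
# round_robin_counts N
# Deal the input lines to N partitions in turn (line 1 -> partition 0,
# line 2 -> partition 1, ...) and print "partition count" per partition.
round_robin_counts() {
    awk -v n="$1" '{ count[(NR - 1) % n]++ }
        END { for (p = 0; p < n; p++) print p, count[p] + 0 }'
}

# Ten records dealt over 4 partitions.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | round_robin_counts 4
```

    Whatever the record contents, the partition sizes differ by at most one record, which is why round robin is a safe choice when no key-based grouping is needed downstream.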

    delta table

    • Delta table maintains the sequencer of each data table.
    • Master (or base) table - a table on top of which we create a view.

    scan vs rollup

    rollup - performs aggregate calculations on groups; scan - calculates cumulative totals.
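    The difference between the two is easiest to see on a tiny input. The sketch below imitates the behaviour with awk rather than with Ab Initio components; rollup_sum and scan_sum are hypothetical helper names, and the input is assumed to be "key value" lines already grouped by key.

```shell
# rollup_sum: one output line per key, carrying the total for that group.
rollup_sum() {
    awk 'NR > 1 && $1 != key { print key, total; total = 0 }
         { key = $1; total += $2 }
         END { if (NR > 0) print key, total }'
}

# scan_sum: one output line per input record, carrying the running
# (cumulative) total within the record's key group.
scan_sum() {
    awk 'NR == 1 || $1 != key { key = $1; total = 0 }
         { total += $2; print key, total }'
}

printf 'a 1\na 2\nb 5\nb 7\n' | rollup_sum
printf 'a 1\na 2\nb 5\nb 7\n' | scan_sum
```

    For the four sample records the rollup-style function emits two lines (one per key), while the scan-style function emits four lines, one per record, with the total growing inside each group.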

    packages

    Used in multistage components or transform components.

    Reformat vs "Redefine Format"

    • Reformat - deriving new data by adding / dropping fields
    • Redefine format - rename fields

    Conditional DML

    DML which is separated based on a condition.

    SORTWITHINGROUP

    • The prerequisite for using sortwithingroup is that the data is already sorted by the major key. sortwithingroup outputs the data once it has finished reading the major key group. It is like an implicit phase.

    passing a condition as a parameter

    Define a Formal Keyword Parameter of type string. For example, you call it FilterCondition and you want it to filter on COUNT > 0.

    Also, in your graph, in your "Filter by expression" component, enter the following condition: $FilterCondition

    Now on your command line or in a wrapper script give the following command (quote the condition so that the shell does not treat > as a redirection):

      YourGraphname.ksh -FilterCondition "COUNT > 0"
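    A condition such as COUNT > 0 has to reach the graph script as a single quoted argument; left unquoted, the shell would parse > as output redirection and write to a file named 0. A minimal sketch of the quoting, where count_args is a hypothetical stand-in for the deployed graph script and simply reports how many arguments it received:

```shell
# count_args: hypothetical stand-in for a deployed graph script.
# It prints the number of arguments and then each argument on its own line.
count_args() {
    echo "$#"
    for arg in "$@"; do
        echo "[$arg]"
    done
}

# Quoted: the flag and the whole condition arrive as exactly two arguments.
count_args -FilterCondition "COUNT > 0"
```

    The same quoting applies when the call sits inside a wrapper script that passes the condition through: use "$1" (with quotes) rather than a bare $1.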

    Passing file name as a parameter

    #!/bin/ksh
    # Running the set up script on the environment
    typeset PROJ_DIR=$(cd "$(dirname "$0")/.." && pwd)
    . "$PROJ_DIR/ab_project_setup.ksh" "$PROJ_DIR"
    # Passing script parameters 1 and 2 on to the graphs
    if [ $# -eq 2 ]; then
      INPUT_FILE_PARAMETER_1=$1
      INPUT_FILE_PARAMETER_2=$2
      # This graph is using the input file
      cd "$AI_RUN"
      ./my_graph1.ksh "$INPUT_FILE_PARAMETER_1"
      # This graph also is using the input file
      ./my_graph2.ksh "$INPUT_FILE_PARAMETER_2"
      exit
    else
      echo "Insufficient parameters"
      exit 1
    fi

    ----------------------------------------

    #!/bin/ksh
    # Running the set up script on the environment
    typeset PROJ_DIR=$(cd "$(dirname "$0")/.." && pwd)
    . "$PROJ_DIR/ab_project_setup.ksh" "$PROJ_DIR"
    # Exporting script parameter 1 to INPUT_FILE_NAME
    export INPUT_FILE_NAME=$1
    # This graph is using the input file
    cd "$AI_RUN"
    ./my_graph1.ksh
    # This graph also is using the input file
    ./my_graph2.ksh
    exit

    How to remove header and trailer lines?

    Use conditional dml where you can separate detail from header and trailer. For validations use reformat with count: 3 (out0: header, out1: detail, out2: trailer).

    How to create a multi file system on Windows

    • first method: in GDE go to RUN > Execute Command - and run
      m_mkfs c:control c:dp1 c:dp2 c:dp3 c:dp4
    • second method: double-click on the file component, and in the ports tab double-click on partitions - there you can enter the number of partitions.

    Vector

    A vector is simply an array. It is an ordered set of elements of the same type (the type can be any type, including a vector or a record).

    Dependency Analysis

    Dependency analysis will answer the questions regarding data lineage, that is: where does the data come from, what applications produce and depend on this data, etc.


    Surrogate key

    There are many ways to create a surrogate key. For example, you can use the next_in_sequence() function in your transform. Or you can use the "Assign key values" component. Or you can write a stored procedure - and call it.

    Note: if you use partitions, then do something like this:

    (next_in_sequence() - 1) * no_of_partition() + this_partition()
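    The partitioned formula (sequence number - 1) * number of partitions + partition index interleaves the per-partition sequences so that no two partitions can produce the same key. A minimal sketch in shell arithmetic; surrogate_key is a hypothetical helper, and it assumes the sequence starts at 1 and the partition index is zero-based.

```shell
# surrogate_key SEQ NPARTS PART
#   SEQ    : per-partition sequence value (1, 2, 3, ...)
#   NPARTS : number of partitions
#   PART   : index of this partition, assumed zero-based
surrogate_key() {
    echo $(( ($1 - 1) * $2 + $3 ))
}

# First three keys generated in each of 3 partitions, merged and sorted.
for part in 0 1 2; do
    for seq in 1 2 3; do
        surrogate_key "$seq" 3 "$part"
    done
done | sort -n | tr '\n' ' '
echo
```

    With 3 partitions, partition 0 produces 0, 3, 6, partition 1 produces 1, 4, 7 and partition 2 produces 2, 5, 8, so the merged keys are distinct and gap-free as long as every partition sees the same number of records.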

    .abinitiorc

    This is a config file for ab initio - in the user's home directory and in $AB_HOME/Config. It sets the abinitio home path, configuration variables (AB_WORK_DIR, AB_DATA_DIR, etc.), login info (id, encrypted password), login methods for hosts for execution (like EME host, etc.), etc.

    .profile

    Your ksh init file (environment, aliases, path variables, history file settings, command prompt settings, etc.)

    data mapping

    data modelling

    How to execute the graph

    From GDE - whole graph or by phases. From checkpoint. Also using ksh scripts.

    Write Multiple files

    A component which allows you to write simultaneously into multiple local files.

    Testing

    Run the graph - see the results. Use components from the Validate category.

    Sandbox vs EME

    Sandbox is your private area where you develop and test. Only one project and one version can be in the sandbox at any time. The EME Datastore contains all versions of the code that have been checked into it (source control).

    Layout

    Where the data-files are and where the components are running. For example, for data - serial or partitioned (multi-file). The layout is defined by the location of the file (or a control file for the multifile). In the graph the layout can propagate automatically (for multifile you have to provide details).

    Latest versions

    April 2009: GDE ver. 1.15.8, Co-operating System ver. 2.14.

    Graph parameters

    Menu edit > parameters - allows you to specify private parameters for the graph. They can be of 2 types - local and formal.

    Plan>It

    You can define pre- and post-processes, triggers. Also you can specify methods to run on success or on failure of the graphs.

    Frequently used components

    • input file / output file
    • input table / output table
    • lookup / lookup_local
    • reformat
    • gather / concatenate
    • join
    • runsql
    • join with db
    • compression components
    • filter by expression
    • sort (single or multiple keys)
    • rollup
    • trash
    • partition by expression / partition by key

    running on hosts

    The co>operating system is layered on top of the native OS (unix). When running from GDE, GDE generates a script (according to "run" settings). The Co>op system will execute the scripts on different machines (using specified host settings and connection methods, like rexec, telnet, rsh, rlogin) - and then return error or success codes back.

    conventional loading vs direct loading

    This is basically an Oracle question - regarding the SQLLDR (SQL Loader) utility.

    Conventional load - using insert statements. All triggers will fire, all constraints will be checked, all indexes will be updated.

    Direct load - data is written directly block by block. Can load into a specific partition. Some constraints are checked, indexes may be disabled - need to specify native options to skip index maintenance.

    semi-join

    abinitio online help gives 3 examples of joins: inner join, outer join and semi join.

    • for inner join the 'record_required' parameter is true for all "in" ports.
    • for outer join it is false for all the "in" ports.
    • for semi join it is true for both ports (like InnerJoin), but the dedup option is set only on one side.