8/20/2019 Ab-Initio Interview Ques
What is the relation between EME, GDE and the Co-operating System?
EME stands for Enterprise Meta Environment, GDE for Graphical Development Environment, and the Co-operating System can be described as the Ab Initio server. The relation between the Co-operating System, EME and GDE is as follows.
The Co-operating System is the Ab Initio server. It is installed on a particular operating system platform, which is called the native OS. The EME is a repository, like the repository in Informatica; it holds the metadata, transformations, DB config files, and source and target information. The GDE is the end-user environment where we develop graphs (the equivalent of mappings in Informatica).
The designer uses the GDE to design graphs and saves them to the EME or to a sandbox. The GDE and the sandbox are on the user side, whereas the EME is on the server side.
What is the use of aggregation when we have rollup? As we know, the Rollup component in Ab Initio is used to summarize groups of data records. Then where will we use aggregation?
Aggregate and Rollup can both summarize data, but Rollup is much more convenient to use, and it makes it easier to understand how a particular summarization is performed. Rollup also offers additional functionality, such as input and output filtering of records.
Aggregate and Rollup perform the same action, but Rollup can expose intermediate results held in main memory, whereas Aggregate does not support intermediate results.
What kinds of layouts does Ab Initio support?
Ab Initio supports serial and parallel layouts, and a graph can use both at the same time. A parallel layout depends on the degree of data parallelism: if the multifile system is 4-way parallel, a component in a graph can run 4-way parallel when its layout is defined to match that degree of parallelism.
How can you run a graph infinitely?
To run a graph infinitely, the end script of the graph should call the graph's .ksh file. For example, if the graph is named abc.mp, its end script should contain a call to abc.ksh.
The graph will then keep restarting itself and run indefinitely.
How do you add default rules in transformer?
Double-click the transform parameter on the Parameters tab of the component properties; this opens the Transform Editor. In the Transform Editor, click the Edit menu and select Add Default Rules from the dropdown. It shows two options: 1) Match Names 2) Wildcard.
Do you know what a local lookup is?
If your lookup file is a multifile that is partitioned/sorted on a particular key, then the local lookup function can be used in preference to the plain lookup function.
The lookup is then local to a particular partition, depending on the key.
A lookup file consists of data records that can be held in main memory. This lets the transform function retrieve records much faster than retrieving them from disk, and it allows the transform component to process data records from multiple files quickly.
What is the difference between look-up file and look-up, with a relevant example?
Generally, a lookup file represents one or more serial files (flat files). The amount of data is small enough to be held in memory, which allows transform functions to retrieve records much more quickly than they could from disk.
A lookup is a component of an Ab Initio graph where we can store data and retrieve it by using a key parameter.
A lookup file is the physical file where the data for the lookup is stored.
How many components in your most complicated graph?
It depends on the type of components you use. Usually, avoid using overly complicated transform functions in a graph.
Explain what is lookup?
A lookup is basically a specific dataset that is keyed. It can be used to map values according to the data present in a particular file (serial or multifile). The dataset can be static or dynamic (for example, when the lookup file is generated in a previous phase and used as a lookup file in the current phase). Sometimes a hash join can be replaced by a Reformat with a lookup, if one of the inputs to the join contains a small number of records with a short record length.
Ab Initio has built-in functions to retrieve values from the lookup using the key.
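The idea of replacing a join with a keyed lookup can be sketched outside Ab Initio. The following is a minimal Python illustration, not Ab Initio DML: a small keyed dataset is loaded into memory once and each record on the main flow is enriched from it. The function names `build_lookup` and `enrich` and the sample field names are hypothetical.

```python
def build_lookup(rows, key):
    """Load a small keyed dataset into memory (the 'lookup file')."""
    return {row[key]: row for row in rows}


def enrich(records, lookup, key, field, default=None):
    """Reformat-style pass: copy `field` from the lookup into every record.

    Records whose key is missing from the lookup get `default`.
    """
    out = []
    for rec in records:
        match = lookup.get(rec[key])
        out.append({**rec, field: match[field] if match else default})
    return out


# Small reference dataset (few records, short record length).
countries = [
    {"code": "IN", "name": "India"},
    {"code": "US", "name": "United States"},
]
# Main flow of records to be enriched.
orders = [{"id": 1, "code": "US"}, {"id": 2, "code": "XX"}]

result = enrich(orders, build_lookup(countries, "code"), "code", "name")
```

Because the reference data sits in a dictionary, each retrieval is a constant-time in-memory access, which is the same reason a lookup file is preferred when one side of a join is small.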
What is a ramp limit?
The limit parameter contains an integer that represents a number of reject events.
The ramp parameter contains a real number that represents a rate of reject events relative to the number of records processed.
Number of bad records allowed = limit + (number of records * ramp).
Ramp is basically a rate expressed as a fraction (from 0 to 1).
Together, the two provide the threshold for bad records.
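As a worked illustration of the formula above, here is a small Python sketch. The function names are hypothetical, and the assumption that processing aborts only when the reject count strictly exceeds the threshold is mine, not something the source states.

```python
def reject_threshold(limit: int, ramp: float, records_processed: int) -> float:
    """Bad records tolerated so far: limit + records_processed * ramp."""
    return limit + records_processed * ramp


def should_abort(rejects: int, limit: int, ramp: float, records_processed: int) -> bool:
    """Assumed rule: abort once rejects go above the tolerated threshold."""
    return rejects > reject_threshold(limit, ramp, records_processed)


# With limit = 10 and ramp = 0.25, after 1000 records the threshold is
# 10 + 1000 * 0.25 = 260 bad records.
threshold = reject_threshold(10, 0.25, 1000)
```

With ramp set to 0 the threshold is a fixed count (limit only); with limit set to 0 it is purely proportional to the number of records processed.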
Have you worked with packages?
Multistage transform components use packages by default. However, a user can create his own set of functions in a transform function and include it in other transform functions.
Have you used rollup component? Describe how.
If the user wants to group records on particular field values, then Rollup is the best way to do that. Rollup is a multi-stage transform function and it contains the following mandatory functions.
1. initialise
2. rollup
3. finalise
You also need to declare a temporary variable if you want to get counts for a particular group.
For each group, it first calls the initialise function once, then calls the rollup function for each record in the group, and finally calls the finalise function once after the last rollup call.
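The initialise / rollup / finalise call sequence can be mimicked in Python. This is only a sketch of the control flow, not Ab Initio's actual implementation; the driver `run_rollup`, the record fields and the shape of the temporary variable are all assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def initialise():
    """Called once per group: create the temporary variable."""
    return {"count": 0, "total": 0}


def rollup(temp, record):
    """Called once per record in the group: update the temporary variable."""
    temp["count"] += 1
    temp["total"] += record["amount"]
    return temp


def finalise(temp, key):
    """Called once per group, after the last rollup call: build the output."""
    return {"key": key, "count": temp["count"], "total": temp["total"]}


def run_rollup(records, key_field="customer"):
    """Hypothetical driver that groups records by key and applies the three stages."""
    ordered = sorted(records, key=itemgetter(key_field))
    results = []
    for key, group in groupby(ordered, key=itemgetter(key_field)):
        temp = initialise()
        for record in group:
            temp = rollup(temp, record)
        results.append(finalise(temp, key))
    return results


records = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]
summary = run_rollup(records)
```

The `count` entry in the temporary variable plays the role of the counter mentioned above: it exists only while a group is being processed and is copied to the output in the finalise step.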
How do you add default rules in transformer?
Add Default Rules opens the Add Default Rules dialog. Select one of the following:
Match Names: generates a set of rules that copies input fields to output fields with the same name.
Use Wildcard (.*) Rule: generates one rule that copies input fields to output fields with the same name.
1) If it is not already displayed, display the Transform Editor Grid.
2) Click the Business Rules tab if it is not already displayed.
3) Select Edit > Add Default Rules.
In the case of Reformat, if the destination field names are the same as, or a subset of, the source field names, there is no need to write anything in the reformat xfr, provided you do not need any real transformation other than reducing the set of fields or splitting the flow into several flows.
What is the difference between partitioning with key and round robin?
Partition by Key (hash partition): a partitioning technique used to partition data when the keys are diverse. If one key value is present in large volume, there can be large data skew. Even so, this method is used more often for parallel data processing.
Round-robin partition: another partitioning technique, which distributes the data uniformly across the destination data partitions. The skew is zero when the number of records is divisible by the number of partitions. A real-life example is how a pack of 52 cards is distributed among 4 players in a round-robin manner.
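The two partitioning schemes can be compared with a short Python sketch. It illustrates the general technique only; the function names are hypothetical and the CRC32 hash is an arbitrary choice, not the hash Ab Initio uses.

```python
import zlib


def partition_by_key(records, key, n):
    """Hash partitioning: records with the same key always land together."""
    parts = [[] for _ in range(n)]
    for rec in records:
        h = zlib.crc32(str(rec[key]).encode("utf-8"))
        parts[h % n].append(rec)
    return parts


def partition_round_robin(records, n):
    """Round-robin partitioning: deal records out one at a time, like cards."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


# 52 cards dealt to 4 players: every hand gets exactly 13, so skew is zero.
hands = partition_round_robin(list(range(52)), 4)

# One dominant key value: hash partitioning sends all of it to one partition.
skewed = [{"k": "x"}] * 10 + [{"k": "y"}]
by_key = partition_by_key(skewed, "k", 4)
```

Round robin balances the load but scatters equal keys across partitions, so it does not suit components that need all records for a key in one place; partitioning by key gives that guarantee at the cost of possible skew.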
How do you improve the performance of a graph?
There are many ways the performance of a graph can be improved.
1) Use a limited number of components in a particular phase
2) Use optimum max-core values for sort and join components
3) Minimise the number of sort components
4) Minimise sorted join components and, if possible, replace them with in-memory join/hash join
5) Use only the required fields in the sort, reformat and join components
6) Use phasing/flow buffers in case of merge and sorted joins
7) If the two inputs are huge then use sorted join, otherwise use hash join with the proper driving port
8) For large datasets, don't use broadcast as partitioner
9) Minimise the use of regular expression functions like re_index in the transform functions
10) Avoid repartitioning data unnecessarily
Try to keep the graph running in MFS for as long as possible. For this, the input files should be partitioned and, if possible, the output file should also be partitioned.
How do you truncate a table?
From Ab Initio, use the Run SQL component with the DDL "truncate table".
Alternatively, use the Truncate Table component in Ab Initio.
Have you ever encountered an error called "depth not equal"?
When two components are linked together and their layouts do not match, this problem can occur during compilation of the graph. A solution is to use a partitioning component in between, wherever the layout changes.
What is the function you would use to transfer a string into a decimal?
In this case no specific function is required if the size of the string and the decimal is the same. A decimal cast with the size in the transform function will suffice. For example, if the source field is defined as string(8) and the destination as decimal(8) (say the field name is field1):
out.field1 :: (decimal(8)) in.field1;
If the destination field is smaller than the input, the string_substring function can be used as follows. Say the destination field is decimal(5):
out.field1 :: (decimal(5)) string_lrtrim(string_substring(in.field1, 1, 5)); /* string_lrtrim trims leading and trailing spaces */
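For readers more familiar with general-purpose languages, the same substring, trim and cast sequence can be sketched in Python. This is an analogy, not Ab Initio DML; the helper name `string_to_decimal` is hypothetical, and Python's `Decimal` stands in for the DML decimal type.

```python
from decimal import Decimal


def string_to_decimal(value: str, width: int) -> Decimal:
    """Keep the first `width` characters, trim spaces, then convert."""
    head = value[:width]      # counterpart of taking a substring from position 1
    trimmed = head.strip()    # counterpart of trimming leading and trailing spaces
    return Decimal(trimmed)


# The first five characters of " 12345678" are " 1234"; trimming gives "1234".
converted = string_to_decimal(" 12345678", 5)
```

Note that truncating to the destination width silently drops the remaining digits, so this is only appropriate when the extra characters are known to be padding or otherwise irrelevant.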
What are primary keys and foreign keys?
In an RDBMS, the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The requirement for both tables is that there should be a matching column.
What is the difference between clustered and non-clustered indices? ...and why do you use a clustered index?
What is an outer join?
An outer join is used when one wants to select all the records from a port, whether or not they have satisfied the join criteria.
What are Cartesian joins?
A Cartesian join joins two tables without a join key. The key should be {}.
What is the purpose of having stored procedures in a database?
The main purpose of a stored procedure is to reduce network traffic. Because all the SQL statements execute together on the database server, execution speed is also high.
Why might you create a stored procedure with the ‘with recompile’ option?
Recompile is useful when the tables referenced by the stored procedure undergo a lot of modification, deletion or addition of data. Because of the heavy modification activity, the execution plan becomes outdated and the stored procedure's performance degrades. If we create the stored procedure with the recompile option, SQL Server won't cache a plan for it, and it will be recompiled every time it is run.
What is a cursor? Within a cursor, how would you update fields on the row just fetched?
The Oracle engine uses work areas for internal processing in order to execute SQL statements; such a work area is called a cursor. There are two types of cursors: implicit and explicit. An implicit cursor is used for internal processing, and an explicit cursor is opened by the user to retrieve the data required. To update the row just fetched, declare the cursor with FOR UPDATE and issue UPDATE ... WHERE CURRENT OF the cursor.
How would you find out whether a SQL query is using the indices you expect?
The explain plan can be reviewed to check the execution plan of the query. It shows whether the expected indexes are used or not.
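The same check can be tried on a laptop with SQLite, used here purely as a stand-in for Oracle's explain plan. SQLite's `EXPLAIN QUERY PLAN` statement returns rows whose last column describes each step. The table and index names below are made up for the example, and the helper `plan_details` is hypothetical.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable step descriptions of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE INDEX idx_table1_col1 ON table1 (col1)")

details = plan_details(conn, "SELECT * FROM table1 WHERE col1 = ?", (1,))
uses_index = any("idx_table1_col1" in step for step in details)
```

If the expected index name does not appear in the plan, the query is doing a full scan or using a different index, which is the signal to revisit the predicate or the index definition.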
How can you force the optimizer to use a particular index?
Use hints (/*+ ... */); these act as directives to the optimizer.
select /*+ index(a index_name) full(b) */ *
from table1 a, table2 b
where b.col1 = a.col1 and b.col2 = 'sid' and b.col3 = 1;
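When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?
Explicit transactions are preferable. An implicit transaction is handled internally by the database, whereas an explicit transaction is opened and closed by the user, which guarantees that all the statements are committed or rolled back together as a single unit of work.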
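A small experiment shows what an explicit transaction around several DML statements buys. The sketch below uses Python's built-in sqlite3 module as a stand-in database; the `accounts` table and the `transfer` function are invented for the example. Using the connection as a context manager commits when the block succeeds and rolls back when it raises.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Run two UPDATEs as one unit of work; undo both if either fails."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()

# The debit would make the balance negative, so the earlier credit is rolled back too.
failed = transfer(conn, "a", "b", 500)
balances_after_failure = dict(conn.execute("SELECT name, balance FROM accounts"))
```

Had each UPDATE been committed on its own, the failed debit would have left the credit in place and the data inconsistent.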
Describe the elements you would review to ensure multiple scheduled "batch" jobs do not "collide" with each other.
Review the dependencies between the jobs, because every job depends on another job. For example, if the first job completes successfully then the next job executes; otherwise it does not run.
Describe the process steps you would perform when defragmenting a data table. This table contains mission critical data.
There are several ways to do this:
1) We can move the table within the same tablespace or to another tablespace, and rebuild all the indexes on the table.
alter table table_name move: this activity reclaims the fragmented space in the table.
analyze table table_name compute statistics: this captures the updated statistics.
2) A reorg can be done by taking a dump of the table, truncating the table and importing the dump back into the table.
Explain the difference between the "truncate" and "delete" commands.
TRUNCATE is a DDL command whereas DELETE is a DML command. A rollback cannot be performed after a TRUNCATE statement, whereas it can be performed after a DELETE statement. A WHERE clause cannot be used with TRUNCATE, whereas it can be used with DELETE.
What is the difference between a DB config and a CFG file?
A .dbc file has the information required for Ab Initio to connect to the database to extract or load tables or views, while a .cfg file is the table configuration file created by db_config when using components like Load DB Table.
Describe the "Grant/Revoke" DDL facility and how it is implemented.
Basically, this is part of the DBA's responsibilities. GRANT gives permissions, for example GRANT CREATE TABLE, CREATE VIEW and many more.
REVOKE cancels the granted permissions. So both the GRANT and REVOKE commands depend on the DBA.
What is a destructor?
What is XML-RPC?
What is new about Web services?
What is a Web service?
What kind of services does an operating system provide?
What is logic?
What is an algorithm?
What is a constant?
What is a variable?
What is an assignment statement used for?
What are the four basic types of data?
What is a conditional loop best suited for?
What is an incremented loop best suited for?
What are relational operators used for?
What relational operators do you know? (C)
What does grep() stand for? (Unix interview question)
What does RPG stand for?
What does Lisp stand for?
What does HTML stand for?
What does Fortran stand for?
What does DOS stand for?
What does CGI stand for?
What does CORBA stand for?
What does Cobol stand for?
What does Case stand for?
What does BASIC stand for?
What does ASCII stand for?
What does Algol stand for?
What does SQL stand for?
What is the latest version that is available in Ab Initio?
How to take the input data from an Excel sheet?
How will you test a dbc file from command prompt?
Which one is faster for processing, fixed length DMLs or delimited DMLs, and why?
What are the continuous components in Ab Initio?
What is meant by fencing in Ab Initio?
What is the relation between EME, GDE and Co-operating system?
What is the use of aggregation when we have rollup? As we know, rollup component in Ab Initio is used to summarize group of data record; then where will we use aggregation?
Describe the process steps you would perform when defragmenting a data table. This table contains mission critical data.
Explain the difference between the "truncate" and "delete" commands.
When running a stored procedure definition script, how would you guarantee the definition could be "rolled back" in the event of problems?
Describe the "Grant/Revoke" DDL facility and how it is implemented.
Describe how you would ensure that database object definitions (Tables, Indices, Constraints, Triggers, Users, Logins, Connection Options, and Server Options etc.) are consistent and repeatable between multiple database instances (i.e., a test and production copy of a database).
What is the difference between a DB config and a CFG file?
What about DML changes dynamically?
What is backward compatibility in Ab Initio?
What kinds of layouts does Ab Initio support?
How do you add default rules in transformer?
Have you used rollup component? Describe how.
What are primary keys and foreign keys?
What is an outer join?
What are Cartesian joins?
What is the purpose of having stored procedures in a database?
What is a cursor? Within a cursor, how would you update fields on the row just fetched?
How would you find out whether a SQL query is using the indices you expect?
How can you force the optimizer to use a particular index?
When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?
Describe the elements you would review to ensure multiple scheduled "batch" jobs do not "collide" with each other.
What is semi-join?
How to get DML using Utilities in UNIX?
What is local and formal parameter?
What is BROADCASTING and REPLICATE?
Explain what is lookup?
Have you worked with packages?
How to create repository in Ab Initio for stand-alone system (LOCAL NT)?
What is the difference between dbc and cfg file?
What does dependency analysis mean in Ab Initio?
What do you have to give the value for the Record Required parameter for a natural join?
When do you use Partition by Expression?
What is Adhoc File System? Give me a scenario where you used it.
What are the different commands that you used when writing wrappers?
What do the hidden files in a sandbox represent and what does startksh represent?
How can we test Ab Initio manually and with automation?
What is the difference between sandbox and EME? Can we perform checkin and checkout through sandbox? Can anybody explain checkin and checkout?
What does layout mean in terms of Ab Initio?
What are different things that you have to consider when loading data into a table?
How to create Surrogate Key using Ab Initio?
Can anyone give me an example of realtime start script in the graph?
What are differences between different GDE versions (>>2>>>2>>@2>>Aand >>B)? What are differences between different versions of Co-op?
Do you know what a local lookup is?
How many components in your most complicated graph?
How to handle if DML changes dynamically in Ab Initio?
Explain what is lookup?
Have you worked with packages?
How to run the graph without GDE?
What are the different versions and releases of Ab Initio (GDE and Co-op version)?
What is the difference between DML Expression and
Explain the differences between api and utility mode?
Please let me know whether we have Ab Initio GDE version >> and what is the latest GDE version and Co-op version?
What are the Graph parameters?
How to find the number of arguments defined in graph?
What is the difference between rollup and scan?
How to work with parameterized graphs?
Please give us insight on Enterprise Meta Environment, and some possible questions on that.
What are delta table and master table?
What error would you get when you use Partition by Round Robin and Join?
Do you know what a local lookup is?
How many components in your most complicated graph?
How to handle if DML changes dynamically in Ab Initio?
How do you count the number of records in a flat file?
How do you connect EME to Ab Initio Server?
Have you ever encountered an error called "depth not equal"? (This occurs when you extensively create graphs; it is a trick question.)
What is the difference between a DB config and a CFG file?
Do you know what a local lookup is?
What is the difference between look-up file and look-up, with a relevant example?
Have you worked with packages?
In which scenarios would you use Partition by Key and also Partition by Round Robin, and what are the differences between the two?
What are the different dimension tables that you used and some columns in the fact table?
What is the difference between a Scan component and a Rollup component?
How do we handle if DML is changing dynamically?
What is m_dump?
What is the syntax of m_dump command?
Have you used rollup component? Describe how.
How do you improve the performance of a graph?
How many components are there in your most complicated graph?
What is the function you would use to transfer a string into a decimal?
For data parallelism, we can use partition components. For component parallelism, we can use replicate component. Like this, which component(s) can we use for pipeline parallelism?
What is AB_LOCAL expression and where do you use it in Ab Initio?
What is meant by Co-Operating system and why is it special for Ab Initio?
How to retrieve data from database to source; in that case which component is used for this?
How can you run a graph infinitely?
What is the syntax of m_dump command?
How do we run sequences of jobs, like output of A JOB is input to B? How do we co-ordinate the jobs?
How do you truncate a table?
What is a ramp limit?
What is the difference between dbc and cfg? When do you use these two?
What are the compilation errors you came across while executing your graphs?
What is depth_error?
Difference between conventional loading and direct loading? When is it used in real time?
During the execution of graph, let us say you lost the network connection; would you have to start the process all over again or does it start from where it stopped?
What are the different types of partitions and scenarios?
What does dependency analysis mean in Ab Initio?
What does unused port in join component do?
Define Multi file system. Can you create multifile system on the same server? Also, if you have a table that has Name, Address, Status, Position attributes, can Name and Address be on one partition and Status and Position in the other partition?
What is a sandbox? Did the co-operating system version @E have sandbox; if not, how would you store the respective files?
How did you do version control? Which tool did you use?
How do you troubleshoot performance issues in graph?
What are the usual errors that you encounter during ETL process apart from compilation process?
Were you involved in production support? What were the different kinds of problems that you encountered?
How do you count the number of records in a multifile system without using GDE?
What does Scan and Rollup component do? Give a scenario where you used them.
Did you ever use user defined functions or packages? If yes, give a scenario.
What is difference between Redefine Format and Reformat components?
Sometimes you have to use dynamic length strings. Can you give me one circumstance where you need it?
Why might you create a stored procedure with the "with recompile" option?
How many parallelisms are in Ab Initio? Please give a definition of each.
How to schedule graphs in Ab Initio, like workflow schedule in Informatica? And where must we use Unix shell scripting in Ab Initio?
How to improve performance of graphs in Ab Initio? Give some examples or tips.
Ab Initio Questions and Answers:
1 :: What does dependency analysis mean in Ab Initio?
Dependency analysis answers questions about data lineage: where the data comes from, which applications produce and depend on this data, and so on.
We can retrieve the maximum (surrogate key) from the existing data, then by using scan or next_in_sequence/reformat we can generate further sequence values for new records.
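2 :: When using multiple DML statements to perform a single unit of work, is it preferable to use implicit or explicit transactions, and why?
Implicit transactions are used for internal processing, while explicit transactions are opened by the user for the data required. For a single unit of work that spans several DML statements, an explicit transaction is preferable because all the statements commit or roll back together.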
3 :: Describe the Grant/Revoke DDL facility and how it is implemented?
Basically, this is a part of the D.B.A.'s responsibilities. GRANT means permissions, for example GRANT CREATE TABLE, CREATE VIEW AND MANY MORE.
REVOKE means cancel the grant (permissions). So, both the Grant and Revoke commands depend upon the D.B.A.
4 :: What is the difference between rollup and scan?
By using rollup we can't generate cumulative summary records; for that we use scan.
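The difference can be illustrated outside Ab Initio. The following is only a plain Python sketch of the two behaviours, not Ab Initio code; the function names `rollup_sum` and `scan_sum` and the sample records are made up for illustration, and the input is assumed to be already sorted by key.

```python
from itertools import accumulate, groupby
from operator import itemgetter


def rollup_sum(records):
    """Rollup-like: one summary record per key group (input sorted by key)."""
    return [(key, sum(amount for _, amount in group))
            for key, group in groupby(records, key=itemgetter(0))]


def scan_sum(records):
    """Scan-like: one output per input record, carrying the running total
    within its key group (input sorted by key)."""
    out = []
    for key, group in groupby(records, key=itemgetter(0)):
        amounts = [amount for _, amount in group]
        out.extend((key, total) for total in accumulate(amounts))
    return out


# Hypothetical (key, amount) records, sorted by key.
records = [("a", 10), ("a", 5), ("b", 7), ("b", 1), ("b", 2)]
print(rollup_sum(records))
print(scan_sum(records))
```

Rollup collapses each group to one record, while scan keeps every record and attaches the cumulative value, which is why cumulative summaries need scan.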
5 :: Describe the elements you would review to ensure multiple scheduled batch jobs do not collide with each other?
Review the dependencies between the jobs, because every job may depend upon another job: for example, if the first job's result is successful then the next job will execute, otherwise that job does not run.
6 :: How can I run the 2 GUI merge files?
Do you mean merging GUI map files in WR? If so, merging GUI map files in the GUI map editor won't create a corresponding test script. Without a test script you can't run a file. So it is impossible to run a file by merging 2 GUI map files.
7 :: Describe how you would ensure that database object definitions (Tables, Indices, Constraints, Triggers, Users, Logins, Connection Options, and Server Options etc.) are consistent and repeatable between multiple database instances (i.e.: a test and production copy of a database)?
Take an entire database backup and restore it in a different instance.
Take statistics of all valid and invalid objects and match them.
Periodically refresh.
8 :: How would you find out whether a SQL query is using the indices you expect?
The explain plan can be reviewed to check the execution plan of the query. This shows whether the expected indexes are used or not.
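A small runnable sketch of the same idea, assuming SQLite in place of Oracle: SQLite's `EXPLAIN QUERY PLAN` is swapped in here only because it ships with Python. The table `emp`, the index `idx_emp_dept` and the helper `plan_for` are hypothetical names for illustration.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the human-readable plan steps for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # last column holds the plan detail text


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_emp_dept ON emp (dept)")

plan = plan_for(conn, "SELECT name FROM emp WHERE dept = ?", ("sales",))
uses_index = any("idx_emp_dept" in step for step in plan)
print(plan, uses_index)
```

The check is simply whether the expected index name appears in the plan; on Oracle the equivalent would be reading the plan output for the index access step.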
9 :: How to create repository in abinitio for stand alone system (LOCAL NT)?
If you are trying to install Ab Initio on a stand-alone machine, then it is not necessary to create the repository. While installing, it creates one automatically for you under the abinitio folder (where you are installing Ab Initio).
10 :: When running a stored procedure definition script, how would you guarantee the definition could be rolled back in the event of problems?
There are quite a few factors that determine the approach, such as what type of version control is used, what is the size of the change, what is the impact of the change, is it a new procedure or replacing an existing one, and so on.
If it is new, then just drop the wrong one.
If it is a replacement, then how big is the change and what will be the possible impact? Depending upon that, you can have the entire database backed up, or just create a script for your original procedure before messing it up, or you just do an edit and change the file back to the original and reapply. You may rename the old procedure as old and then work on the new one, and so on.
A few issues to keep in mind are synonyms, dependencies, grants, any job calling the procedure at the time of change, and so on. In a nutshell, the scenario can vary and the solution can vary with it.
11 ::
alter table <table_name> move tablespace <tablespace_name>: this activity reclaims the fragmented space in the table.
analyze table table_name compute statistics: to capture the updated statistics.
2) Reorg could be done by taking a dump of the table, truncating the table, and importing the dump back into the table.
13 :: How can you force the optimizer to use a particular index?
Use hints /*+ <hint> */; these act as directives to the optimizer.
14 :: What is a cursor? Within a cursor, how would you update fields on the row just fetched?
The Oracle engine uses work areas for internal processing in order to execute a SQL statement; such a work area is called a cursor. There are two types of cursors: implicit cursors and explicit cursors. An implicit cursor is used for internal processing, and an explicit cursor is opened by the user for the data required.
15 :: Why might you create a stored procedure with the with recompile option?
Recompile is useful when the tables referenced by the stored proc undergo a lot of modification/deletion/addition of data. Due to the heavy modification activity, the execution plan becomes outdated and hence the stored proc performance goes down. If we create the stored proc with the recompile option, SQL Server won't cache a plan for this stored proc and it will be recompiled every time it is run.
16 :: What is the purpose of having stored procedures in a database?
The main purpose of a stored procedure is to reduce network traffic; all the SQL statements execute in a cursor on the server, so speed is high.
17 :: What are Cartesian joins?
A Cartesian join will get you a Cartesian product. A Cartesian join is when you join every row of one table to every row of another table. You can also get one by joining every row of a table to every row of itself.
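As a quick illustration in plain Python (the two small "tables" are made-up sample data), a Cartesian join pairs every row on one side with every row on the other, so the result has rows(left) x rows(right) records:

```python
from itertools import product

# Hypothetical sample tables.
departments = ["sales", "ops"]
employees = ["ann", "bob", "cid"]

# Every department row joined to every employee row.
cartesian = list(product(departments, employees))

# Joining a table to itself also gives a Cartesian product.
self_join = list(product(employees, employees))

print(len(cartesian), len(self_join))
```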
18 :: What is an outer join?
An outer join is used when one wants to select all the records from a port, whether or not they have satisfied the join criteria.
19 :: What are primary keys and foreign keys?
In an RDBMS the relationship between two tables is represented as a primary key and foreign key relationship. The primary key table is the parent table and the foreign key table is the child table. The criterion for both tables is that there should be a matching column.
20 :: Have you used rollup component? Describe how?
If the user wants to group the records on particular field values then rollup is the best way to do that. Rollup is a multi-stage transform function and it contains the following mandatory functions.
XFR represents the transform functions, which will contain business rules.
27 :: How does MAXCORE work?
Maxcore is a value (it will be in Kb). Whenever a component is executed it will take as much memory as we specified for execution.
28 :: What is the syntax of m_dump command?
The general syntax is "m_dump metadata data [action]"
29 :: Can anyone give me an example of realtime start script in the graph?
Here is a simple example to use a start script in a graph:
In the start script let's give:
export DT=`date '+%m%d%y'`
Now this variable DT will have today's date before the graph is run.
Now somewhere in the graph transform we can use this variable as;
out.process_dt :: $DT;
which provides the value from the shell.
30 :: What are differences between different GDE versions (1.10, 1.11, 1.12, 1.13 and 1.15)?
What are differences between different versions of Co-op?
31 :: How to run the graph without GDE
22 :: What is AB_LOCAL expression, where do you use it in ab-initio?
ablocal_expr is a parameter of the Input Table (itable) component of Ab Initio. ABLOCAL() is replaced by the contents of ablocal_expr, which we can make use of in parallel unloads. There are two forms of the AB_LOCAL() construct, one with no arguments and one with a single argument as a table name (driving table).
The AB_LOCAL() construct is used when some complex SQL statements contain grammar that is not recognized by the Ab Initio parser when unloading in parallel. You can use the ABLOCAL() construct in this case to prevent the Input Table component from parsing the SQL (it will get passed through to the database). It also specifies which table to use for the parallel clause.
23 :: What is the latest version that is available in Ab-initio?
The latest version of GDE is ...
You can use $mpjret in the end script like
if [ 0 -eq $mpjret ]
then
echo "success"
else
mailx -s "[graphname] failed" mailid
fi
25 :: I am unable to connect server database (oracle) from GDE
37 :: What is skew and skew measurement?
Skew is the measure of data flow to each partition.
Suppose the input is coming from 4 files and the size is 1 gb
1 gb = (
• http://en.wikipedia.org/wiki/Ab_Initio
• http://www.abinitio.com
• http://www.patents.com/Ab-Initio-Software-Corporation/Lexington/MA/301339/company/
• http://www.bi-nerd.com/ab-initio-the-dark-horse-of-etl/
• Patents: US6654907.pdf US7047232.pdf US7164422.pdf US7167850.pdf
• http://www.linkedin.com/companies/ab-initio
Ab Initio is a private company; its main offices are in Lexington, Massachusetts (near Boston, USA - since 1994), but they have offices all over the world (as you can see on their web site). They have very good, talented, devoted people. I've heard that when you are calling their customer service - there is a 75% chance that you will speak with a Ph.D. It may very well be true. The company was formed by former employees of the Thinking Machines Corporation. Some key people: Craig W. Stanfill, Richard A. Shapiro, Stephen A. Kukolich.
Ab Initio also uses its own people as well as independent consulting firms to build proof of concept for a client and then to guide clients in using their tools.
Unfortunately Ab Initio provides very little information about their solutions to the general public. So, not getting into details, most of AI functionality can be scripted using several commands which you can give from the prompt (with many options):
• m_* commands (for example m_shutdown, m_mkfs, m_cp, etc.) are used for administering
• mp ... (some options) - to define, establish and run jobs
• air ... (some options) - to work with EME (basically a specialized version control system)
The scripts can be easily integrated to work with external schedulers.
Somewhere ~1997 Ab Initio introduced the Graphical Development Environment - a very powerful desktop software. You place components on the screen, connect them, define what they do and how. So your application is a graph. You can create components which consist of other components, which consist of other components, etc. - so effectively you can drill deeply into the diagram. I've seen this tool generating a powerful data processing application in less than 10 minutes. You can run the application right from the IDE or save it as a set of scripts (ksh for unix). The scripts will call misc. component libraries. The libraries are written in C++.
Some of the key elements of the system:
• "Co>Operating System"
• "Component Library"
• "Graphical Development Environment" (GDE)
• "Enterprise Meta>Environment" (EME)
• "Data Profiler"
• "Conduct>It"
Main power of Ab Initio - parallelism - is achieved via its "Co>Operating System", which provides the facilities for parallel execution (multiple CPUs and/or multiple boxes), platform independent data transport, check pointing, and process monitoring. A lot of attention is devoted to monitoring resources (CPU, memory). Multi-file, multi-directory.
Component Library - a set of software modules to perform sorting, data transforming, and high speed data loading and unloading tasks.
Ab Initio tools incorporate best practices, such as check-pointing, rerunnability, tagging everything with unique Id-s, etc.
Unfortunately Ab Initio doesn't advertise or publish any information. So there are just bits and pieces here and there. Here is an interesting blog:
• http://www.geekinterview.com/Interview-Questions/Data-Warehouse/Abinitio
Question / Answer
Phases vs Checkpoints:
Phases - are used to break the graph into pieces. Temporary files created during a phase will be deleted after its completion. Phases are used to effectively separately manage resource-consuming (memory, CPU, disk) parts of the application.
Checkpoints - created for recovery purposes. These are points where everything is written to disk. You can recover to the latest saved point - and rerun from it.
You can have phase breaks with or without checkpoints.
xfr:
A new sandbox will have many directories: mp, dml, xfr, db, ... . xfr is a directory where you put files with extension .xfr containing your own custom functions (and then use: include "somepath/xfr/yourfile.xfr").
Usually XFR stores mapping.
three types of parallelism:
1) Data Parallelism (partitioning of data into parallel streams for parallel processing).
2) Component Parallelism (execute simultaneously on different branches of the graph)
3) Pipeline (sequential).
MFS:
Multi-File System
m_mkfs - create a multifile (m_mkfs ctrlfile mpfile1 ... mpfileN)
m_ls - list all the multifiles
m_rm - remove the multifile
m_cp - copy a multifile
m_mkdir - to add more directories to existing directory structure
Memory requirements of a graph:
• Each partition of a component uses: ~ 8 MB + max-core (if any)
• Add size of lookup files used in phase (if multiple components use the same lookup, only count it once)
• Multiply by degree of parallelism. Add up all components in a phase; that is how much memory is used in that phase.
• Select the largest-memory phase in the graph
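The rule of thumb above can be turned into a back-of-the-envelope calculator. This is only a sketch in plain Python: it applies the bullets in the order they are written (lookup sizes are added before multiplying by the degree of parallelism, which is an assumption about the intended reading), and the function names and the sample phases are hypothetical.

```python
MB = 1024 * 1024
BASE_PER_PARTITION = 8 * MB  # "~ 8 MB" per component partition, from the rule above


def phase_memory(max_cores, lookup_sizes, parallelism):
    """Estimate memory for one phase.

    max_cores    -- max-core of each component in the phase (0 if none)
    lookup_sizes -- size of each distinct lookup file used in the phase
    parallelism  -- degree of parallelism of the layout
    """
    per_partition = sum(BASE_PER_PARTITION + mc for mc in max_cores)
    per_partition += sum(lookup_sizes)  # each lookup counted once
    return per_partition * parallelism


def graph_memory(phases):
    """The graph needs as much memory as its hungriest phase."""
    return max(phase_memory(**phase) for phase in phases)


# Hypothetical two-phase graph running 4-way parallel.
phases = [
    {"max_cores": [100 * MB, 0], "lookup_sizes": [50 * MB], "parallelism": 4},
    {"max_cores": [0], "lookup_sizes": [], "parallelism": 4},
]
estimate = graph_memory(phases)
print(estimate // MB, "MB")
```

Because phases run one after another, only the largest phase matters for the estimate, which is the reason the last bullet takes the maximum rather than the sum.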
How to calculate a SUM:
SCAN
ROLLUP
SCAN WITH ROLLUP
Scan followed by Dedup sort and select the last
dedup sort with null key:
If we don't use any key in the sort component while using the dedup sort, then the output depends on the keep parameter.
• first - only the first record
• last - only last record
• unique_only - there will be no records in the output file.
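The behaviour of the keep parameter can be mimicked in a few lines of plain Python. This is a sketch of the logic only, not the Ab Initio component; `dedup_sorted` and `null_key` are hypothetical names. With a null key every record falls into the same group, which is why unique_only produces nothing when there is more than one input record.

```python
from itertools import groupby


def dedup_sorted(records, key_fn, keep):
    """Keep one record per key group, or only groups of size one."""
    out = []
    for _, group in groupby(records, key=key_fn):
        group = list(group)
        if keep == "first":
            out.append(group[0])
        elif keep == "last":
            out.append(group[-1])
        elif keep == "unique_only":
            if len(group) == 1:  # only records whose key occurs exactly once
                out.append(group[0])
        else:
            raise ValueError("unknown keep value: " + keep)
    return out


def null_key(record):
    """No key at all: every record compares equal, so there is one big group."""
    return ()


records = ["r1", "r2", "r3"]
for keep in ("first", "last", "unique_only"):
    print(keep, dedup_sorted(records, null_key, keep))
```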
join on partitioned flow:
file1 (A, B, C), file2 (A, B, D). We partition both files by "A", and then join by "A, B". Is it OK? Or should we partition by "A, B"? Not clear.
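One way to reason about the question is to simulate it. The plain Python sketch below (hypothetical helper names and sample data, not Ab Initio code) partitions both inputs by A only, joins each pair of partitions on (A, B), and compares the result with a join over the unpartitioned data. Records with equal (A, B) necessarily have equal A, so they always land in the same partition.

```python
def partition_by(records, key_fn, n):
    """Hash-partition records into n lists using key_fn."""
    parts = [[] for _ in range(n)]
    for record in records:
        parts[hash(key_fn(record)) % n].append(record)
    return parts


def join(left, right, key_fn):
    """Inner join of two record lists on key_fn."""
    index = {}
    for record in right:
        index.setdefault(key_fn(record), []).append(record)
    return [(l, r) for l in left for r in index.get(key_fn(l), [])]


def key_a(record):
    return record[0]


def key_ab(record):
    return record[:2]


# Hypothetical data: file1 rows are (A, B, C), file2 rows are (A, B, D).
file1 = [(1, "x", "c1"), (1, "y", "c2"), (2, "x", "c3"), (3, "z", "c4")]
file2 = [(1, "x", "d1"), (2, "x", "d2"), (2, "y", "d3"), (3, "z", "d4")]

n = 3
parts1 = partition_by(file1, key_a, n)
parts2 = partition_by(file2, key_a, n)

partitioned = [pair for p1, p2 in zip(parts1, parts2) for pair in join(p1, p2, key_ab)]
unpartitioned = join(file1, file2, key_ab)
print(sorted(partitioned) == sorted(unpartitioned))
```

In this simulation partitioning by "A" alone loses no matches; partitioning by "A, B" would matter only for how evenly the data spreads across partitions.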
checkin, checkout:
You can do checkin/checkout using the wizard right from the GDE using versions and tags
how to have different passwords for QA and production:
parameterize the .dbc file - or use an environment variable.
How to get records 50-75 out of 100:
• use scan and filter
• m_dump <dml> <mfs file> -start 50 -end 75
• use next_in_sequence() function and filter by expression component (next_in_sequence() >50 && next_in_sequence() <75)
How to convert a serial file into MFS:
create MFS, then use partition component
project parameters vs. sandbox parameters:
When you check out a project into your sandbox - you get project parameters. Once in your sandbox - you can refer to them as sandbox parameters.
Bad-Straight-flow:
error you get when connecting mismatching components (for example, connecting serial flow directly to mfs flow without using a partition component)
merging graphs:
You can not merge two ab initio graphs. You can use the output of one graph as input for another. You can also copy/paste the contents between graphs. See also about using .plan
partitioning, re-partitioning, departitioning:
• partitioning - dividing a single flow of records (serial file, mfs) into multiple flows.
• departitioning - removing partitioning (gather and merge component)
• re-partitioning - change the number of partitions (eg. from 2 to 4 flows)
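The three operations can be pictured with plain Python lists standing in for flows. This is only an illustrative sketch: round-robin is assumed as the partitioning method, a simple concatenation stands in for gather (a real gather does not guarantee record order), and all function names are hypothetical.

```python
def partition_round_robin(records, n):
    """Partitioning: split one flow into n flows, dealing records out in turn."""
    parts = [[] for _ in range(n)]
    for i, record in enumerate(records):
        parts[i % n].append(record)
    return parts


def gather(parts):
    """Departitioning: collect all partitions back into a single flow."""
    return [record for part in parts for record in part]


def repartition(parts, n):
    """Re-partitioning: change the number of partitions (e.g. from 2 to 4)."""
    return partition_round_robin(gather(parts), n)


records = list(range(8))
two_way = partition_round_robin(records, 2)
four_way = repartition(two_way, 4)
print(two_way)
print(four_way)
```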
lookup file:
for large amounts of data use MFS lookup file (instead of serial)
indexing:
No indexes as such. But there is an "output indexing" using reformat and doing necessary coding in the transform part.
Environment project:
Environment project - special public project that exists in every Ab Initio environment. It contains all the environment parameters required by the private or public projects which constitute AI Standard Environment.
Aggregate vs Rollup:
Aggregate - old component
Rollup - newer, extended, recommended to use instead of Aggregate (built-in functions like sum, count, avg, min, max, product, ...)
EME, GDE, Co-operating system:
• EME = Enterprise Metadata Environment. Functions (repository, version control, statistical analysis, dependency analysis). It is on the server side and holds all the projects (metadata of transformations, config info, source and target info: graph, dml, xfr, ksh, sql, etc.). This is where you checkin/checkout. The /Project dir of EME contains common directories for all application sandboxes connected to it. It also helps in dependency analysis of codes. Ab Initio has a series of air commands to manipulate repository objects.
• GDE = Graphical Development Environment (on the client box)
• Co-operating system = Ab Initio server installed on top of the native (unix) os on the server
fencing:
fencing means job controlling on priority basis.
In AI it actually refers to customized phase breaking. A well fenced graph means no matter what the source data volume is, the process will not cough in dead locks. It actually limits the number of simultaneous processes.
Fencing - changing a priority of a job
Phasing - managing the resources to avoid deadlocks. For example, limiting the number of simultaneous processes (by breaking the graph into phases, only 1 of which can run at any given time)
Continuous components:
Continuous components - produce useful output file while running continuously. For example, Continuous rollup, Continuous update, batch subscribe
deadlock:
Deadlock is when two or more processes are requesting the same resource. To avoid it, use phasing and resource pooling.
environment:
• AB_HOME - where co>operating system is installed
• AB_AIR_ROOT - default location for EME datastore
• sandboxes standard environment
• AI_SORT_MAX_CORE, AI_HOME, AI_SERIAL, AI_MFS, etc.
• from unix prompt: env | grep AI
wrapper script:
unix script to run graphs
multistage component:
A multistage component is a component which transforms input records in 5 stages (1. input select, 2. temporary initialization, 3. processing, 4. output selection, 5. finalize). So it is a transform component which has packages. Examples: rollup, scan, normalize, and denormalize sorted.
Dynamic DML:
Dynamic DML is used if the input metadata can change. Example: at different times, different input files are received for processing which have different dml. In that case we can use a flag in the dml; the flag is first read in the input file received, and according to the flag its corresponding dml is used.
fan in, fan out:
• fan out - partition component (increase parallelism)
• fan in - departition component (decrease parallelism)
lock:
a user can lock the graph for editing so that others will see the message and can not edit the same graph.
join vs lookup:
Lookup is good for speed for small files (will load whole file in memory). For large files use join. You may need to increase the maxcore limit to handle big joins.
multi update:
multi update executes SQL statements - it treats each input record as a completely separate piece of work.
scheduler:
• We can use Autosys, Control-M, or any other external scheduler.
• We can take care of dependencies in many ways. For example, if scripts should run sequentially, we can arrange for this in Autosys, or we can create a wrapper script and put there several sequential commands (nohup command1.ksh & nohup command2.ksh & etc). We can even create a special graph in Ab Initio to execute individual scripts as needed.
API and Utility modes in input table:
These are database interfaces (api - uses SQL, utility - bulk loads, whatever vendor provides)
lookup file:
• lookup file component. Functions: lookup, lookup_count, lookup_next, lookup_match, lookup_local.
• Lookups are always used in combination with the reformat components.
Calling stored proc in DB:
You can call a stored proc (for example from input component). In fact you can even write SP in Ab Initio. Make it "with recompile" to assure good performance.
Frequently used functions:
string_ltrim, string_lrtrim, string_substring, reinterpret_as, today(), now()
data validation:
is_valid, is_null, is_blank, is_defined
driving port:
When joining inputs (in0, in1, ...) one of the ports is used as "driving" (by default - in0). Driving input is usually the largest one. Whereas the smallest can have the "Sorted-Input" parameter set to "Input need not be sorted" because it will be loaded completely in memory.
Ab Initio vs Informatica for ETL:
Ab Initio benefits: parallelism built in, multifile system, handles huge amounts of data, easy to build and run. Generates scripts which can be easily modified as needed (if something couldn't be done in the ETL tool itself). The scripts can be easily scheduled using any external scheduler - and easily integrated with other systems.
Ab Initio doesn't require a dedicated administrator.
Ab Initio doesn't have built-in CDC capabilities (CDC = Change Data Capture).
Ab Initio allows to attach error / reject files to each transformation and capture and analyze the message and data separately (as opposed to Informatica which has just one huge log). Ab Initio provides immediate metrics for each component.
override key:
override key option is used when we need to join fields which have different field names.
control file:
control file should be in the multifile directory (contains the addresses of the serial files)
max-core:
max-core parameter (for example, sort 100 MBytes) specifies the amount of memory used by a component (like Sort or Rollup) - per partition - before spilling to disk. Usually you don't need to change it - just use the default value. Setting it too high may degrade the performance because of OS swapping and degrading of the performance of other components.
Input Parameters:
graph > select parameters tab > click "create" - and create a parameter. Usage: $paramname. Edit > parameters. These parameters will be substituted during run time. You may need to declare your parameter scope as formal.
Error Trapping:
Each component has reject, error, and log ports. Reject captures rejected records, Error captures the corresponding error, and log captures the execution statistics of the component. You can control the reject status of each component by setting reject threshold to either Never Abort, Abort on first reject, or setting ramp/limit. You can also use the force_error() function in a transform function.
How to see resource usage:
In GDE go to options View > Tracking Details - you will see each component's CPU and memory usage, etc.
assign keys component:
Easy and saves development time. Need to understand how to feed parameters, and you can't control it easily.
Join in DB vs join in Ab Initio:
• Scenario 1 (preferred): we run a query which joins tables in DB and gives us the result in just 1 DB component.
• Scenario 2 (much slower): we use 2 database components, extract all data - and join them in Ab Initio.
Join with DB:
not recommended if the number of records is big. It is better to retrieve the data out - and then join in Ab Initio.
Data Skew:
Parameter showing how data is unevenly distributed between partitions.
skew = (partition size - avg.part.size) * 100 / (size of the largest partition)
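The skew formula above is easy to check with a few lines of plain Python. The function name `skew_percent` and the sample partition sizes are made up for illustration.

```python
def skew_percent(partition_sizes):
    """Skew of each partition, per the formula:
    (partition size - average partition size) * 100 / (size of largest partition)
    """
    if not partition_sizes or max(partition_sizes) == 0:
        raise ValueError("need at least one non-empty partition")
    average = sum(partition_sizes) / len(partition_sizes)
    largest = max(partition_sizes)
    return [(size - average) * 100 / largest for size in partition_sizes]


# Hypothetical partition sizes in MB for a 4-way multifile.
sizes = [100, 200, 300, 400]
skews = skew_percent(sizes)
print(skews)
```

A perfectly balanced multifile gives 0 for every partition; large positive or negative values point at a poor partitioning key.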
dbc vs cfg:
.dbc - database configuration file (dbname, nodes, version, user/pwd) - resides in the db directory
.cfg - any type of config file. For example, remote connection config (name of remote server, user/pwd to connect to db, location of OS on remote machine, connection method). The .cfg file resides in the config dir.
compilation errors:
depth not equal, data format error, etc...
depth error: we get this error when two components are connected together but their layouts don't match
types of partitions:
broadcast, pbyexpression, pbyroundrobin, pbykey, pwithloadbalance
unused port:
when joining, used records go to the output port, unused records - to the unused port
t#ningperformance • o parallel #sing partitionning. >o#ndrobin partitionning gives
good balance.
• 7se M#lti$file system )MKS*.
• 7se Ad Goc MKS to read many serial files in parallel and #se
8/20/2019 Ab-Initio Interview Ques
33/39
concat component.
• ,nce data is partitionned $ do not switch it to serial and bac&.
>epartition instead.
• =o not acceess large filess via KS $ #se KT- instead
• #se loop local rather than loop )especially for big loops*.
• 7se roll#p and Kilter as soon as possible to red#ce n#mber of
records. Ideally do it in the so#rce )database !* before yo# get
the data.
• >emove #nnecessary components. Kor example instead of
#sing filter by exp yo# can implement the same f#nction inreformat3oin3>oll#p. Another example $ when @oining data from
files #se #nion f#nction instead of adding an additionalcomponent for removing d#plicates.
• #se gather instead of concatenate.
• it is faster to do a sort after a partitino than to do a sort before
a partition.
• try to avoid #sing a @oin with the "db" component.
• when getting data from database $ ma&e s#re yo#r F#eries are
fast )#se indexes etc.*. If possible do necessary selection 3
aggregation 3 sorting in the database before getting data intoAb Initio.
• t#ne Max4core for ,ptimal performance )for sort depends on
the sie of the inp#t file*.
• ote $ If in$memory @oin cannot fit its non$driving inp#ts in the
provided MAJ$C,>' then it will drop all the inp#ts to dis& and
in$memory does not ma&e sence.
• 7sing phase brea&s let yo# allocate more memory in individ#al
components $ th#s improving performance.
• 7se chec&point after sort to land data on dis&
• 7se oin and roll#p in$memory feat#re
• hen @oining very small dataset to a very large dataset it is
more efficient to broadcast the small dataset to MKS #sing
8/20/2019 Ab-Initio Interview Ques
34/39
broadcast component or #se the small file as loop. +#t for
large dataset don%t #se broadcast as a partitioner.
• 7se Ab Initio layo#t instead of database defa#lt to achieve
parallel loads
• Change A+4>'-,>T parameter to increased monitoring d#ration
• 7se catalogs for re#sability
• Components li&e @oin3 roll#p sho#ld have the option "Inp#t m#st
be sorted"if they are placed after a sort component.
• Minimize the number of Sort components. Minimize usage of the sorted
Join component and, if possible, replace it with an in-memory
join / hash join. Use only the required fields in the Sort, Reformat and Join components. Use "Sort within Groups" instead of just Sort when
data was already presorted.
• Use phasing / flow buffers in case of merge sorted joins.
• Minimize the use of regular expression functions like re_index in
the transfer functions.
• Avoid repartitioning data unnecessarily. When splitting
records into more than two flows, use Reformat rather than
the Broadcast component.
• For joining records from flows, use the Concatenate component
ONLY when there is a need to follow some specific order in joining records. If no order is required, it is preferable to
use the Gather component.
• Instead of putting many Reformat components consecutively,
use the output indexes parameter in the first Reformat component and mention the condition there.
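The tip about using a small file as a lookup can be illustrated outside Ab Initio. What follows is only a minimal shell/awk sketch of the general idea (hold the small side in memory, stream the large side once); the sample data and the function name lookup_join are hypothetical, and this is not Ab Initio DML.

```shell
#!/bin/sh
# Hypothetical sample data: a small reference set and a larger detail stream.
small='1,gold
2,silver'
large='1,alice
2,bob
3,carol
1,dave'

# lookup_join SMALL LARGE
# Loads SMALL (key,value) fully into an in-memory table, then streams LARGE
# (key,name) once and appends the looked-up value to each matching record.
lookup_join() {
    printf '%s\n%s\n%s\n' "$1" "--" "$2" | awk -F',' '
        $0 == "--" { streaming = 1; next }               # separator between inputs
        !streaming { table[$1] = $2; next }              # small side: build table
        ($1 in table) { print $1 "," $2 "," table[$1] }  # large side: probe table
    '
}

lookup_join "$small" "$large"
```

The large input is never sorted or held in memory, which is the reason the lookup approach is attractive when one side is tiny.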
delta table
• A delta table maintains the sequencer of each data table.
• Master (or base) table - a table on top of which we create a view.
scan vs rollup
rollup - performs aggregate calculations on groups; scan - calculates cumulative totals.
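The difference between the two can be shown with a small shell/awk sketch. It only mimics the behaviour on a hypothetical key,amount input that is already sorted by key; the function names rollup_sum and scan_sum are made up for illustration and are not Ab Initio components.

```shell
#!/bin/sh
# Hypothetical input, already sorted by key: key,amount
records='a,10
a,5
b,7
b,3
b,1'

# rollup_sum: one output record per key group (the group total).
rollup_sum() {
    awk -F',' '
        NR > 1 && $1 != key { print key "," total; total = 0 }
        { key = $1; total += $2 }
        END { if (NR > 0) print key "," total }
    '
}

# scan_sum: one output record per input record (running total within the group).
scan_sum() {
    awk -F',' '
        $1 != key { key = $1; total = 0 }
        { total += $2; print $1 "," $2 "," total }
    '
}

printf '%s\n' "$records" | rollup_sum
printf '%s\n' "$records" | scan_sum
```

The rollup-style function emits two records (one per group), while the scan-style function emits five (one per input record, each carrying the cumulative total so far).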
packages
used in multistage components or transform components
Reformat vs "Redefine Format"
• Reformat - deriving new data by adding/dropping fields
• Redefine Format - rename fields
Conditional DML
DML which is separated based on a condition
SORT WITHIN GROUP
• The prerequisite for using Sort within Groups is that the data is
already sorted by the major key. Sort within Groups outputs the
data once it has finished reading a major key group. It is like an implicit phase.
passing a condition as a parameter
Define a Formal Keyword Parameter of type string. For example, you call it FilterCondition and you want it to filter on COUNT > 0.
Then in your graph, in your "Filter by Expression" component, enter the following condition: $FilterCondition
Now on your command line or in a wrapper script give the following command:
YourGraphname.ksh -FilterCondition "COUNT > 0"
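One detail matters when passing such a condition: the shell treats an unquoted > as output redirection, so the expression has to be quoted to reach the script as a single argument. The sketch below is a hypothetical stand-in for the keyword-parameter handling of a deployed graph script (the function parse_filter_condition is an assumption, not part of Ab Initio); it only shows how the quoted value arrives.

```shell
#!/bin/sh
# parse_filter_condition ARGS...
# Hypothetical helper: prints the value that follows -FilterCondition.
parse_filter_condition() {
    while [ $# -gt 0 ]; do
        case "$1" in
            -FilterCondition)
                if [ $# -lt 2 ]; then
                    echo "missing value for -FilterCondition" >&2
                    return 1
                fi
                printf '%s\n' "$2"
                return 0
                ;;
        esac
        shift
    done
    echo "no -FilterCondition given" >&2
    return 1
}

# Quoted: the whole expression arrives as one argument.
parse_filter_condition -FilterCondition "COUNT > 0"
```

Without the quotes, the shell would pass only COUNT to the script and redirect its output to a file named 0.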
Passing file name as a parameter
#!/bin/ksh
# Running the set up script on environment
typeset PROJ_DIR=$(cd $(dirname $0)/..; pwd)
. $PROJ_DIR/ab_project_setup.ksh $PROJ_DIR
# Storing the script parameters in INPUT_FILE_PARAMETER_1 and INPUT_FILE_PARAMETER_2
if [ $# -eq 2 ]; then
    INPUT_FILE_PARAMETER_1=$1
    INPUT_FILE_PARAMETER_2=$2
    # This graph is using the input file
    cd $AI_RUN
    ./my_graph1.ksh $INPUT_FILE_PARAMETER_1
    # This graph also is using the input file
    ./my_graph2.ksh $INPUT_FILE_PARAMETER_2
    exit
else
    echo Insufficient parameters
    exit 1
fi
----------------------------------------
#!/bin/ksh
# Running the set up script on environment
typeset PROJ_DIR=$(cd $(dirname $0)/..; pwd)
. $PROJ_DIR/ab_project_setup.ksh $PROJ_DIR
# Exporting the script parameter 1 to INPUT_FILE_NAME
export INPUT_FILE_NAME=$1
# This graph is using the input file
cd $AI_RUN
./my_graph1.ksh
# This graph also is using the input file
./my_graph2.ksh
exit
How to remove header and trailer lines?
Use conditional DML, where you can separate detail from header and
trailer. For validations use Reformat with count: 3 (out0: header,
out1: detail, out2: trailer).
How to create a multifile system on Windows
• First method: in GDE go to Run > Execute Command and run
m_mkfs c:control c:dp1 c:dp2 c:dp3 c:dp4
• Second method: double-click on the file component and in the Ports
tab double-click on partitions; there you can enter the number of partitions.
Vector
A vector is simply an array. It is an ordered set of elements of the
same type (the type can be any type, including a vector or a record).
Dependency Analysis
Dependency analysis answers questions about data lineage, that is: where does the data come from, what applications produce and
depend on this data, etc.
Question / Answer
Surrogate key
There are many ways to create a surrogate key. For example, you can
use the next_in_sequence() function in your transform. Or you can use the "Assign key values" component. Or you can write a stored procedure and
call it.
Note: if you use partitions then do something like this:
(next_in_sequence()-1)*no_of_partition()+this_partition()
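To see why this expression yields keys that do not collide across partitions, here is a minimal shell sketch of the arithmetic only. It assumes the per-partition sequence starts at 1 and the partition index is zero-based; the function name surrogate_key and the three-partition setup are hypothetical.

```shell
#!/bin/sh
# surrogate_key SEQ NPARTS PART
# SEQ    : per-partition sequence value, assumed to start at 1
# NPARTS : total number of partitions
# PART   : index of the current partition, assumed zero-based
surrogate_key() {
    echo $(( ($1 - 1) * $2 + $3 ))
}

# Three partitions, first three records in each partition.
for part in 0 1 2; do
    for seq in 1 2 3; do
        printf 'partition %d, seq %d -> key %d\n' \
            "$part" "$seq" "$(surrogate_key "$seq" 3 "$part")"
    done
done
```

Each partition steps through the key space in strides equal to the number of partitions, offset by its own index, so partition 0 produces 0, 3, 6, partition 1 produces 1, 4, 7, and partition 2 produces 2, 5, 8.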
.abinitiorc
This is a config file for Ab Initio, found in the user's home directory and in $AB_HOME/Config. It sets the Ab Initio home path, configuration variables
(AB_WORK_DIR, AB_DATA_DIR etc.), login info (id, encrypted password), login methods for hosts for execution (like the EME host etc.), etc.
.profile
Your ksh init file (environment, aliases, path variables, history file settings, command prompt settings etc.)
data
mapping
data modelling
How to execute the graph
From GDE: the whole graph or by phases. From a checkpoint. Also using ksh
scripts.
Write Multiple Files
A component which allows writing simultaneously into multiple local files.
Testing
Run the graph and see the results. Use components from the Validate category.
Sandbox vs EME
The sandbox is your private area where you develop and test. Only one project and one version can be in the sandbox at any time. The EME Datastore contains all versions of the code that have been checked into it (source control).
Layout
Where the data files are and where the components are running. For
example, for data: serial or partitioned (multifile). The layout is defined by the location of the file (or a control file for the multifile). In the graph the
layout can propagate automatically (for a multifile you have to provide details).
Latest versions
April 2009: GDE ver. 1.15.6, Co-operating system ver. 2.14.
Graph parameters
Menu Edit > Parameters allows you to specify private parameters for the
graph. They can be of two types: local and formal.
Plan>It
You can define pre- and post-process triggers. You can also specify methods to run on success or on failure of the graphs.
Frequently used components
• input file / output file
• input table / output table
• lookup / lookup_local
• reformat
• gather / concatenate
• join
• runsql
• join with db
• compression components
• filter by expression
• sort (single or multiple keys)
• rollup
• trash
• partition by expression / partition by key
running on hosts
The Co>Operating System is layered on top of the native OS (Unix). When running
from the GDE, the GDE generates a script (according to the "run" settings). The Co>Op
system executes the script on different machines (using the specified host settings and connection methods such as rexec, telnet, rsh, rlogin) and then
returns error or success codes back.
conventional loading vs direct loading
This is basically an Oracle question regarding the SQLLDR (SQL Loader) utility.
Conventional load - uses insert statements. All triggers will fire, all
constraints will be checked, all indexes will be updated.
Direct load - data is written directly, block by block. Can load into a specific
partition. Some constraints are checked; indexes may be disabled, and you need to
specify native options to skip index maintenance.
semi-join
The Ab Initio online help gives 3 examples of joins: inner join, outer join and
semi join.
• for an inner join the 'record_required' parameter is true for all "in" ports.
• for an outer join it is false for all the "in" ports.
• for a semi join it is true for both ports (like an inner join) but the dedup
option is set only on one side.
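The effect of deduplicating one side can be sketched in shell/awk. This is not the Ab Initio Join component; it is a hypothetical illustration, under the assumption that "dedup" means keeping only the first record per key on that input. The sample data and the function name join_on_key are made up.

```shell
#!/bin/sh
# Hypothetical inputs: in0 = orders (customer_id,order),
# in1 = customers with a duplicated key (customer_id,segment).
in0='1,order-a
2,order-b
3,order-c'
in1='1,retail
1,retail-dup
2,wholesale'

# join_on_key DEDUP IN0 IN1
# DEDUP=0: inner join, every matching pair is emitted.
# DEDUP=1: semi-join style, only the first in1 record per key is kept.
join_on_key() {
    printf '%s\n--\n%s\n' "$3" "$2" | awk -F',' -v dedup="$1" '
        $0 == "--" { probing = 1; next }       # separator: in1 first, then in0
        !probing {
            if (dedup && ($1 in n)) next       # drop later duplicates of the key
            n[$1]++
            val[$1, n[$1]] = $2
            next
        }
        { for (i = 1; i <= n[$1]; i++) print $1 "," $2 "," val[$1, i] }
    '
}

echo "inner join:"
join_on_key 0 "$in0" "$in1"
echo "semi-join style:"
join_on_key 1 "$in0" "$in1"
```

In both cases the unmatched order for customer 3 is dropped, because a match is required on both inputs; only the dedup setting changes how many times order-a appears.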