Datastage Info

Complex Flat File Stages

The Complex Flat File stage lets you convert data extracted from complex flat files that are generated on an IBM mainframe. A complex flat file has a hierarchical structure in its arrangement of columns: it is physically flat (that is, it has no pointers or other complicated infrastructure) but logically represents parent-child relationships. You can use multiple record types to achieve this hierarchical structure.

RECOGNIZING A HIERARCHICAL STRUCTURE

For example, use records with various structures for different types of information, such as an 'E' record for employee static information and an 'S' record for employee monthly payroll information, or for repeating groups of information (twelve months of revenue). You can also combine these record groupings, and in the case of repeating data, you can flatten nested OCCURS groups.

MANAGING REPEATING GROUPS AND INTERNAL STRUCTURES

You can easily load, manage, and use repeating groups and internal record structures such as GROUP fields and OCCURS. You can ignore GROUP data columns that are displayed as raw data and have no logical use for most applications. The metadata can be flattened into a normalized set of columns at load time, so that no arrays exist at run time.

SELECTING SUBSETS OF COLUMNS

You can select a subset of columns from a large COBOL File Description (CFD). This filtering results in performance gains, since the stage no longer parses and processes hundreds of columns when you only need a few. Complex flat files can also include legacy data types.

OUTPUT LINKS

The Complex Flat File stage supports multiple outputs. An output link specifies the data you are extracting, which is a stream of rows to be read. When using the Complex Flat File stage to process a large number of columns (for example, more than 300), use only one output link in your job. This dramatically improves the performance of the GUI when loading, saving, or building these columns, because having more than one output link causes a save or load sequence each time you change tabs. The Complex Flat File stage does not support reference lookup capability or input links.

Posted by lokeb4u at 11:23 AM 1 comments


Configuring the dsenv file

The dsenv file contains a series of shell arguments that are used when the engine starts. Interactive users, other programs, and scripts can use the dsenv file. For some ODBC connections, plug-ins, and connectors, and for interactions with external applications such as IBM WebSphere MQ, you must add environment variables to enable interactive use of the ODBC drivers when connecting to an ODBC data source. This lets IBM InfoSphere DataStage inherit the correct environment for ODBC connections.

BEFORE YOU BEGIN

You must be logged in as an InfoSphere DataStage administrator, with the operating system credentials on the server for the InfoSphere DataStage administrator. Back up the dsenv file before you edit it. For a connection that uses a wire protocol driver, you do not have to modify the dsenv file.

PROCEDURE

1. Edit the dsenv file. The file is located in $DSHOME/DSEngine. $DSHOME identifies the InfoSphere DataStage installation directory. The default directory is /opt/IBM/InformationServer/Server/DSEngine.

2. Specify the following information in the dsenv file:

   - Environment variables for the database client software
   - Database home location
   - Database library directory

Table 1. Names of the library path environment variable, by operating system

   Operating system          Library path environment variable
   IBM AIX                   LIBPATH
   HP-UX                     SHLIB_PATH
   HP-UX on Intel Itanium    LD_LIBRARY_PATH
   Linux                     LD_LIBRARY_PATH
   Solaris                   LD_LIBRARY_PATH

3. The following examples show typical entries for commonly used databases. The entries vary slightly depending on your operating system. See the data source documentation for more information.

Sybase 11:

LANG=
export LANG
SYBASE=/export/home/sybase/sybase
export SYBASE
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$SYBASE/lib:/usr/lib:/lib
export LD_LIBRARY_PATH

Informix XPS 9.3:


DB2:

DB2DIR=/opt/IBM/db2/V9.5
export DB2DIR
DB2INSTANCE=db2inst1
export DB2INSTANCE
INSTHOME=/export/home/db2inst1
export INSTHOME
PATH=$PATH:$INSTHOME/sqllib/bin:$INSTHOME/sqllib/adm:$INSTHOME/sqllib/misc
export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$INSTHOME/sqllib/lib
export LD_LIBRARY_PATH
THREADS_FLAG=native
export THREADS_FLAG

Save your changes. Then stop and restart the IBM InfoSphere Information Server engine. See Stopping and starting the server engine.
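After the procedure above, an interactive shell can pick up the same settings by sourcing dsenv. A minimal sketch, assuming the default install directory (adjust DSHOME if your install path differs):

```shell
# Source the engine environment so this shell inherits the ODBC/library settings.
# The default install directory is assumed; override DSHOME if yours differs.
DSHOME=${DSHOME:-/opt/IBM/InformationServer/Server/DSEngine}
if [ -f "$DSHOME/dsenv" ]; then
    . "$DSHOME/dsenv"
    echo "$LD_LIBRARY_PATH"    # confirm the database client library paths are present
else
    echo "dsenv not found under $DSHOME" >&2
fi
```

Sourcing (`. ./dsenv`) rather than executing is what makes the exported variables visible to the current shell.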

Posted by lokeb4u at 11:09 AM 0 comments

How to Run a Job from the Command Line?

Below is the syntax for running DataStage jobs from the command line.

Command syntax:

dsjob [-file <file> <server> | [-server <server>] [-user <user>] [-password <password>]] <primary command> [<arguments>]

Valid primary command options are:

-run -stop -lprojects -ljobs -linvocations -lstages -llinks -projectinfo -jobinfo -stageinfo -linkinfo -lparams -paraminfo -log -logsum -logdetail -lognewest -report -jobid -import

Status code = -9999 DSJE_DSJOB_ERROR

dsjob -run [-mode <mode>] [-param <name>=<value>] [-warn <n>] [-rows <n>] [-wait] [-opmetadata <TRUE | FALSE>] [-disableprjhandler] [-disablejobhandler] [-jobstatus] [-userstatus] [-local] [-useid] <project> <job>

Status code = -9999 DSJE_DSJOB_ERROR ************************************

It has now been quite some time since we moved to parallel jobs with IBM DataStage 8.0.2. The dsjob command has helped me write some scheduling scripts. One more point I would like to add: source dsenv first, and then run the dsjob command:

. $DSHOME/dsenv

$DSHOME/bin/dsjob -run <project> <job>

If the job takes parameters, you can pass them with additional -param arguments.
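The scheduling-script pattern mentioned above can be sketched as a small wrapper that checks the dsjob exit status. This is a sketch under assumptions: the project name `dstage_dev`, job name `daily_load`, and the `RunDate` parameter are placeholders, not names from this blog:

```shell
#!/bin/sh
# Sketch of a scheduling wrapper around dsjob (project/job names are placeholders).
. "$DSHOME/dsenv"                      # inherit the engine environment first
"$DSHOME/bin/dsjob" -run -jobstatus \
    -param "RunDate=20110129" \
    dstage_dev daily_load              # placeholder project and job names
status=$?
# With -jobstatus, dsjob waits for the job and its exit code reflects the
# finishing status: 1 = run OK, 2 = run with warnings; treat the rest as failure.
case "$status" in
  1|2) echo "daily_load finished with status $status" ;;
  *)   echo "daily_load failed (status $status)" >&2; exit 1 ;;
esac
```

Checking the `-jobstatus` exit code is what lets a cron or scheduler entry distinguish a clean run from an abort.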

*************** Using the pmcmd command (the Informatica equivalent), we can run the mapping from the command prompt *******************

-run [-mode <mode>] [-param <name>=<value>] [-warn <n>] [-rows <n>] [-wait] [-stop] [-jobstatus] [-userstatus] <project> <job>

Posted by lokeb4u at 11:04 AM 0 comments

SATURDAY, JANUARY 29, 2011

DataStage Job run from Unix Command Line

I am running a DataStage job from the Unix command line with job-level parameters and the job aborts. Can someone tell me if there is a syntax problem in the dsjob command below?

dsjob -run \
-param "TargetFileDirectory=/datahub/dev/wc/target"\
-param "SCRIPTS_DIR=/datahub/dev/wc/scripts"\
-param "Filename=GYLD090504"\
DEV_WC jbWCPatternIdentifyReplace_bkupON_12May2009

I am able to execute the same job successfully by hardcoding the parameter values in the job and running:

dsjob -run DEV_WC jbWCPatternIdentifyReplace_bkupON_12May2009

---------------------------------------------------------------------

Putting a space in front of the \ also did not work, but the following syntax (all on one line) worked:

dsjob -run -param "TargetFileDirectory=/datahub/dev/wc/target" -param "SCRIPTS_DIR=/datahub/dev/wc/scripts" -param "Filename=GYLD090504" -jobstatus DEV_WC jbWCPatternIdentifyReplace_bkupON_12May2009

Posted by lokeb4u at 4:57 AM 1 comments

How can I identify the duplicate rows in a seq or comma delimited file?

Q: The source has four fields (agent id, agent name, etc.). Our requirement is that the ID must not be repeated. How can I identify the duplicate rows, set a flag, and send the rejects to a specified reject file? The source system's data is given to us directly, which is why we are getting these duplicates; if a primary key had already been set up, it would have been easy. Thanks in advance.

Ans: Sort the sequential file on the key AGENT_ID and set the option "Create Key Change Column" to TRUE in the Sort stage. The duplicate records will be populated with the value 0 (zero) in the KeyChange field, while the first record in each key group gets 1. Now reject the records that have the value 0.

Posted by lokeb4u at 4:33 AM 0 comments
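Outside of DataStage, the same duplicate check can be done on a comma-delimited file with standard Unix tools. A minimal sketch, assuming the agent id is the first field; the file names and sample rows are invented for illustration:

```shell
# Build a tiny sample file (agent ids and names are made up for illustration)
printf '101,Alice\n102,Bob\n101,Avery\n' > /tmp/agents.csv

# First occurrence of each agent id passes through; repeats go to the reject file
awk -F, 'seen[$1]++ { print > "/tmp/rejects.csv"; next } { print }' \
    /tmp/agents.csv > /tmp/accepted.csv

cat /tmp/rejects.csv    # -> 101,Avery
```

This mirrors the Sort stage's key-change logic: the first row per key is treated as new, and every subsequent row with the same key is a duplicate to reject.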

DataStage FAQs

Types of Lookups: In a Lookup, the first link is the primary link; the other links are called lookup links. When a lookup link comes from a stage other than a database stage, all data from the lookup link is read into memory. Then, for each row from the primary link, the lookup is performed.

If the source of the lookup is a database, there can be two types of lookup: Normal and Sparse.

Normal lookup: all the data from the database is read into memory, and then the lookup is performed.

Sparse lookup: for each incoming row from the primary link, the SQL is fired against the database at run time.

1. What is a view?
A view is a virtual or logical copy of the original table, consisting of the schema part only.

2. Difference between a materialized view and a view?
A view is a set of SQL statements that join one or more tables and show the data; views do not hold the data themselves but point to it. A materialized view, a concept mainly used in data warehousing, contains the data itself, because it is easier and faster to access the data that way. The main purpose of a materialized view is to do calculations and display data from multiple tables using joins.

3. How to remove duplicates in DataStage?
Other than the Remove Duplicates stage, we can also use the Aggregator stage to count the number of records that exist for the key columns. If more