ETL Life Cycle
The typical real-life ETL cycle consists of the following execution steps:
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with business rules; also, in case of failure, helps to diagnose/repair)
8. Publish (to target tables)
9. Archive
10. Clean up
Four-layered approach for ETL architecture design
Functional layer: Core functional ETL processing (extract, transform, and load).
Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.
Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
Utility layer: Common components supporting all other layers.
Use file-based ETL processing where possible
Storage costs relatively little.
Intermediate files serve multiple purposes:
Used for testing and debugging
Used for restart and recover processing
Used to calculate control statistics
Helps to reduce dependencies - enables modular programming.
Allows flexibility for job-execution and -scheduling
Better performance if coded properly, and can take advantage of
parallel processing capabilities when the need arises.
Use data-driven methods and minimize custom ETL coding
Parameter-driven jobs, functions, and job-control
Code definitions and mapping in database
Consideration for data-driven tables to support more complex code-mappings and business-rule application (a sketch of such a code-mapping table follows below).
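As an illustration of keeping code definitions and mappings in database tables, here is a minimal SQL sketch; the table and column names (ETL_CODE_MAP, SRC_SYSTEM, and so on) are assumptions for illustration, not part of any product:

    -- Data-driven code-mapping table: new codes become row inserts, not code changes
    CREATE TABLE ETL_CODE_MAP (
        SRC_SYSTEM VARCHAR2(20),  -- source system identifier
        SRC_CODE   VARCHAR2(20),  -- code as it appears in the source
        TGT_CODE   VARCHAR2(20),  -- standardized warehouse code
        EFF_DATE   DATE           -- effective date of the mapping
    );

    -- The load resolves codes by joining to the mapping table
    SELECT s.ORDER_ID, m.TGT_CODE AS STATUS_CODE
    FROM SRC_ORDERS s
    JOIN ETL_CODE_MAP m
      ON m.SRC_SYSTEM = 'ORDERS'
     AND m.SRC_CODE = s.STATUS;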
Qualities of a good ETL architecture design:
Performance
Recoverable (run_id, ...)
Operable (completion-codes for phases, re-running from checkpoints, etc.)
Auditable (in two dimensions: business requirements and technical troubleshooting)
What is Informatica
Informatica Power Center is a powerful ETL tool from Informatica Corporation.
Informatica Corporation products are:
Informatica Power Center
Informatica on demand
Informatica B2B Data Exchange
Informatica Data Quality
Informatica Data Explorer
Informatica Power Center is a single, unified enterprise data
integration platform for accessing, discovering, and integrating
data from virtually any business system, in any format, and
delivering that data throughout the enterprise at any speed.
Informatica Power Center Editions: Because every data integration project is different and includes many variables, such as data volumes, latency requirements, IT infrastructure, and methodologies, Informatica offers three Power Center Editions and a suite of Power Center Options to meet your project's and organization's specific needs:
Standard Edition
Real Time Edition
Advanced Edition
Informatica Power Center Standard Edition: Power Center Standard
Edition is a single, unified enterprise data integration platform
for discovering, accessing, and integrating data from virtually any
business system, in any format, and delivering that data throughout
the enterprise to improve operational efficiency.
Key features include:
A high-performance data integration server
A global metadata infrastructure
Visual tools for development and centralized administration
Productivity tools to facilitate collaboration among architects,
analysts, and developers.
Informatica Power Center Real Time Edition: Packaged for
simplicity and flexibility, Power Center Real Time Edition extends
Power Center Standard Edition with additional capabilities for
integrating and provisioning transactional or operational data in
real-time. Power Center Real Time Edition provides the ideal
platform for developing sophisticated data services and delivering
timely information as a service, to support all business needs. It
provides the perfect real-time data integration complement to
service-oriented architectures, application integration approaches,
such as enterprise application integration (EAI), enterprise
service buses (ESB), and business process management (BPM).
Key features include:
Change data capture for relational data sources
Integration with messaging systems
Built-in support for Web services
Dynamic partitioning with data smart parallelism
Process orchestration and human workflow capabilities
Informatica Power Center Advanced Edition: Power Center Advanced Edition addresses requirements for organizations that are standardizing data integration at an enterprise level, across a number of projects and departments. It combines all the
capabilities of Power Center Standard Edition and features
additional capabilities that are ideal for data governance and
Integration Competency Centers.
Key features include:
Dynamic partitioning with data smart parallelism
Powerful metadata analysis capabilities
Web-based data profiling and reporting capabilities
Power Center includes the following components:
Power Center domain
Power Center repository
Power Center Client
Web Services Hub
SAP BW Service
Power Center Repository Reports
POWERCENTER CLIENT
The Power Center Client consists of the following applications that we use to manage the repository, design mappings and mapplets, and create sessions to load the data:
1. Designer
2. Data Stencil
3. Repository Manager
4. Workflow Manager
5. Workflow Monitor
1. Designer: Use the Designer to create mappings that contain
transformation instructions for the Integration Service.
The Designer has the following tools that you use to analyze sources, design target schemas, and build source-to-target mappings:
Source Analyzer: Import or create source definitions.
Target Designer: Import or create target definitions.
Transformation Developer: Develop transformations to use in mappings. You can also develop user-defined functions to use in expressions.
Mapplet Designer: Create sets of transformations to use in multiple mappings.
Mapping Designer: Create mappings that the Integration Service uses to extract, transform, and load data.
2. Data Stencil
Use the Data Stencil to create mapping templates that can be used to generate multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create mapping templates. It is not usually used by a developer.
3. Repository Manager
Use the Repository Manager to administer
repositories. You can navigate through multiple folders and
repositories, and complete the following tasks:
Manage users and groups: Create, edit, and delete repository users and user groups. We can assign and revoke repository privileges and folder permissions.
Perform folder functions: Create, edit, copy, and delete folders. Work we perform in the Designer and Workflow Manager is stored in folders. If we want to share metadata, we can configure a folder to be shared.
View metadata: Analyze sources, targets, mappings, and shortcut dependencies, search by keyword, and view the properties of repository objects. We create repository objects using the Designer and Workflow Manager client tools.
We can view the following objects in the Navigator window of the Repository Manager:
Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.
Target definitions: Definitions of database objects or files that
contain the target data.
Mappings: A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Integration Service uses to transform and move data.
Reusable transformations: Transformations that we use in multiple mappings.
Mapplets: A set of transformations that we use in multiple mappings.
Sessions and workflows: Sessions and workflows store information about how and when the Integration Service moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.
4. Workflow Manager: Use the Workflow Manager to create,
schedule, and run workflows. A workflow is a set of instructions
that describes how and when to run tasks related to extracting,
transforming, and loading data.
The Workflow Manager has the following tools to help us develop a workflow:
Task Developer: Create tasks we want to accomplish in the workflow.
Worklet Designer: Create a worklet in the Worklet Designer. A
worklet is an object that groups a set of tasks. A worklet is
similar to a workflow, but without scheduling information. We can
nest worklets inside a workflow.
Workflow Designer: Create a workflow by connecting tasks with
links in the Workflow Designer. You can also create tasks in the
Workflow Designer as you develop the workflow.
When we create a workflow in the Workflow Designer, we add tasks
to the workflow. The Workflow Manager includes tasks, such as the
Session task, the Command task, and the Email task so you can
design a workflow. The Session task is based on a mapping we build
in the Designer.
We then connect tasks with links to specify the order of
execution for the tasks we created. Use conditional links and
workflow variables to create branches in the workflow.
5. Workflow Monitor
Use the Workflow Monitor to monitor scheduled
and running workflows for each Integration Service. We can view
details about a workflow or task in Gantt chart view or Task view.
We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least
once. The Workflow Monitor continuously receives information from
the Integration Service and Repository Service. It also fetches information from the repository to display historic information.
Services Behind the Scenes
INTEGRATION SERVICE PROCESS
The Integration
Service starts an Integration Service process to run and monitor
workflows. The Integration Service process accepts requests from
the Power Center Client and from pmcmd. It performs the following tasks:
Manages workflow scheduling.
Locks and reads the workflow.
Reads the parameter file.
Creates the workflow log.
Runs workflow tasks and evaluates the conditional links
Starts the DTM process or processes to run the session.
Writes historical run information to the repository.
Sends post-session email in the event of a DTM failure.
LOAD BALANCER
The Load Balancer is a component of the Integration
Service that dispatches tasks to achieve optimal performance and
scalability. When we run a workflow, the Load Balancer dispatches
the Session, Command, and predefined Event-Wait tasks within the workflow.
The Load Balancer dispatches tasks in the order it receives
them. When the Load Balancer needs to dispatch more Session and
Command tasks than the Integration Service can run, it places the
tasks it cannot run in a queue. When nodes become available, the
Load Balancer dispatches tasks from the queue in the order
determined by the workflow service level.
DTM PROCESS
When the workflow reaches a session, the Integration Service process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:
Retrieves and validates session information from the repository.
Performs pushdown optimization when the session is configured
for pushdown optimization.
Adds partitions to the session when the session is configured
for dynamic partitioning.
Expands the service process variables, session parameters, and
mapping variables and parameters.
Creates the session log.
Validates source and target code pages.
Verifies connection object permissions.
Runs pre-session shell commands, stored procedures, and SQL.
Sends a request to start worker DTM processes on other nodes
when the session is configured to run on a grid.
Creates and runs mapping, reader, writer, and transformation
threads to extract, transform, and load data.
Runs post-session stored procedures, SQL, and shell commands.
Sends post-session email.
PROCESSING THREADS
The DTM allocates process memory for the
session and divides it into buffers. This is also known as buffer
memory. The default memory allocation is 12,000,000 bytes.
The DTM uses multiple threads to process data in a session. The
main DTM thread is called the master thread.
The master thread can create the following types of threads:
Mapping Threads: One mapping thread for each session.
Pre- and Post-Session Threads: One thread created.
Reader Threads: One thread for each partition.
Transformation Threads: One thread for each partition.
Writer Threads: One thread for each partition.
CODE PAGES AND DATA MOVEMENT
A code page contains the encoding to specify characters in a set of one or more languages. An encoding is the assignment of a number to a character in the character set.
The Integration Service can move data in either ASCII or Unicode
data movement mode. These modes determine how the Integration
Service handles character data.
We choose the data movement mode in the Integration Service
configuration settings. If we want to move multibyte data, choose
Unicode data movement mode.
ASCII Data Movement Mode: In ASCII mode, the Integration Service
recognizes 7-bit ASCII and EBCDIC characters and stores each
character in a single byte.
Unicode Data Movement Mode: Use Unicode data movement mode when sources or targets use 8-bit or multibyte character sets and
contain character data.
Try Your Hands on the Admin Console
Repository Manager Tasks:
Add domain connection information
Add and connect to a repository
Work with Power Center domain and repository connections
Search for repository objects or keywords
View object dependencies
Compare repository objects
Truncate session and workflow log entries
View user connections
Exchange metadata with other business intelligence tools
Add a repository to the Navigator, and then configure the domain
connection information when we connect to the repository.
1. Adding a Repository to the Navigator:
1. In any of the Power Center Client tools, click Repository > Add.
2. Enter the name of the repository and a valid repository user name.
3. Click OK.
Before we can connect to the repository for the first time, we
must configure the Connection information for the domain that the
repository belongs to.
2. Configuring a Domain Connection:
1. In a Power Center Client tool, select the Repositories node in the Navigator.
2. Click Repository > Configure Domains to open the Configure
Domains dialog box.
3. Click the Add button. The Add Domain dialog box appears.
4. Enter the domain name, gateway host name, and gateway port number.
5. Click OK to add the domain connection.
3. Connecting to a Repository:
1. Launch a Power Center Client tool.
2. Select the repository in the Navigator and click Repository
> Connect, or double-click the repository.
3. Enter a valid repository user name and password.
4. Click Connect.
Click the More button to add, change, or view domain connection information.
4. Viewing Object Dependencies
Before we change or delete
repository objects, we can view dependencies to see the impact on
other objects. For example, before you remove a session, we can
find out which workflows use the session. We can view dependencies
for repository objects in the Repository Manager, Workflow Manager,
and Designer tools.
Steps:
1. Connect to the repository.
2. Select the object in the Navigator.
3. Click Analyze and select the dependency we want to view.
5. Validating Multiple Objects
We can validate multiple objects in
the repository without fetching them into the workspace. We can
save and optionally check in objects that change from invalid to
valid status as a result of the validation. We can validate
sessions, mappings, mapplets, workflows, and worklets.
Steps:
1. Select the objects you want to validate.
2. Click Analyze and select Validate.
3. Select validation options from the Validate Objects dialog box.
4. Click Validate.
5. Click a link to view the objects in the results group.
6. Comparing Repository Objects
We can compare two repository
objects of the same type to identify differences between the
objects. For example, we can compare two sessions to check for
differences. When we compare two objects, the Repository Manager
displays their attributes.
Steps:
1. In the Repository Manager, connect to the repository.
2. In the Navigator, select the object you want to compare.
3. Click Edit > Compare Objects.
4. Click Compare in the dialog box displayed.
7. Truncating Workflow and Session Log Entries
When we configure a session or
workflow to archive session logs or workflow logs, the Integration
Service saves those logs in local directories. The repository also
creates an entry for each saved workflow log and session log. If we
move or delete a session log or workflow log from the workflow log
directory or session log directory, we can remove the entries from the repository.
Steps:
1. In the Repository Manager, select the workflow in the
Navigator window or in the Main window.
2. Choose Edit > Truncate Log. The Truncate Workflow Log
dialog box appears.
3. Choose to delete all workflow and session log entries or to
delete all workflow and session log entries with an end time before
a particular date.
4. If you want to delete all entries older than a certain date,
enter the date and time.
5. Click OK.
8. Managing User Connections and Locks
In the Repository Manager,
we can view and manage the following items:
Repository object locks: The repository locks repository objects
and folders by user. The repository creates different types of
locks depending on the task. The Repository Service locks and
unlocks all objects in the repository.
User connections: Use the Repository Manager to monitor user connections to the repository. We can end connections when necessary.
Types of locks created:
1. In-use lock: Placed on objects we want to view.
2. Write-intent lock: Placed on objects we want to modify.
3. Execute lock: Locks objects we want to run, such as workflows and sessions.
Steps:
1. Launch the Repository Manager and connect to the repository.
2. Click Edit > Show User Connections or Show Locks.
3. The locks or user connections will be displayed in a window.
4. We can do the rest as per our need.
9. Managing Users and Groups
1. In the Repository Manager, connect
to a repository.
2. Click Security > Manage Users and Privileges.
3. Click the Groups tab to create groups, or
4. Click the Users tab to create users.
5. Click the Privileges tab to give permissions to groups and users.
6. Select the options available to add, edit, and remove users and groups.
There are two default repository user groups:
Administrators: This group initially contains two users that are
created by default.
The default users are Administrator and the database user that
created the repository. We cannot delete these users from the
repository or remove them from the Administrators group.
Public: The Repository Manager does not create any default users
in the Public group.
10. Working with Folders
We can create, edit, or delete folders as per our need.
1. In the Repository Manager, connect to a repository.
2. Click Folder > Create.
Enter the following information:
3. Click OK.
Difference Between 7.1 and 8.6
1. Target from transformation: In Informatica 8.x we can create a target from a transformation by dragging the transformation into the Target Designer.
2. Pushdown optimization: Increases performance by pushing transformation logic to the database; the Integration Service analyzes the transformations and issues SQL statements to sources and targets, and itself processes only the transformation logic that it cannot push to the database (a sketch follows after this list).
3. New functions in the expression editor: New functions have been introduced in Informatica 8.x, such as REG_EXTRACT and REG_MATCH (a brief example also follows after this list).
4. Repository query: Available in both versioned and non-versioned repositories; previously it was available only for versioned repositories.
5. UDF (user-defined function): Similar to a macro in Excel.
6. FTP: We can have partitioned FTP targets and indirect FTP file
source (with file list).
7. Propagating port descriptions: In Informatica 8 we can edit a
port description and propagate the description to other
transformations in the mapping.
8. Environment SQL enhancements: Environment SQL can still be used to execute an SQL statement at the start of each connection to the database. We can use SQL commands that depend upon a transaction being open during the entire read or write process; for example, an ALTER SESSION command that modifies how the session handles characters.
9. Concurrently write to multiple files in a session with partitioned file targets.
10. Flat file enhancements:
Reduced conversion of data types
Delimited file performance has improved
Flat files can now have integer and double data types
Data can be appended to existing flat files
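For intuition on pushdown optimization (item 2 above), here is a sketch only, not actual output from the tool: suppose a mapping applies UPPER(ENAME) in an Expression transformation. With source-side pushdown enabled, the Integration Service might fold that logic into the SQL it issues to the source database, along these lines (table and column names are assumptions):

    SELECT EMPNO, UPPER(ENAME) AS ENAME
    FROM EMP

And a small sketch of the new regular-expression functions mentioned in item 3, written as Informatica expressions; the port names and patterns are assumptions:

    -- TRUE when PHONE matches the pattern NNN-NNNN
    REG_MATCH(PHONE, '[0-9]{3}-[0-9]{4}')

    -- Pull out the second captured group (the last name)
    REG_EXTRACT(FULL_NAME, '(\w+) (\w+)', 2)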
Informatica Power Center 8 has the following features, which make it more powerful and easier to use and manage than previous versions:
Supports service-oriented architecture
Access to structured, unstructured, and semi-structured data
Support for grid computing
Metadata exchange enhancements
Team-based development
Global Web-based Admin console
23 New functions
User defined functions
Custom transformation enhancements
Flat file enhancements
New Data Federation option
Unit Testing
Unit testing can be broadly classified into 2 categories: quantitative testing and qualitative testing.
Quantitative Testing
Validate your source and target:
a) Ensure that your connectors are configured properly.
b) If you are using a flat file, make sure you have enough read/write
permission on the file share.
c) You need to document all the connector information.
Analyze the Load Time:
a) Execute the session and review the session log.
b) Check the read and write counters. How long does it take to perform the load?
c) Use the session and workflow logs to capture the load statistics.
d) You need to document all the load timing information.
Analyze the success rows and rejections:
a) Have customized SQL queries to check the source/targets; here we will perform the record count verification (see the sample queries after this list).
b) Analyze the rejections and build a process to handle those
rejections. This requires a clear business requirement from the
business on how to handle the data rejections. Do we need to reload, or reject and inform, etc.? Discussions are required and an appropriate process must be developed.
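A minimal sketch of such record count verification queries, assuming a source table SRC_ORDERS and a target table TGT_ORDERS (substitute your own table names and any load-date filters):

    -- Counts should reconcile, allowing for intentionally rejected rows
    SELECT COUNT(*) AS src_cnt FROM SRC_ORDERS;
    SELECT COUNT(*) AS tgt_cnt FROM TGT_ORDERS;

    -- Or as a single difference query (Oracle syntax; drop FROM DUAL elsewhere)
    SELECT (SELECT COUNT(*) FROM SRC_ORDERS)
         - (SELECT COUNT(*) FROM TGT_ORDERS) AS cnt_diff
    FROM DUAL;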
Performance Improvement:
a) Network performance
b) Session performance
c) Database performance
d) Analyze and, if required, tune the Informatica and DB settings.
Qualitative Testing
Analyze and validate your transformation business rules. This is more of functional testing.
e) You need to review field by field from source to target and
ensure that the required transformation logic is applied.
f) If you are making changes to existing mappings, make use of the data lineage feature available with Informatica Power Center. This will help you to find the consequences of altering or deleting a port from an existing mapping.
g) Ensure that appropriate dimension lookups have been used and your development is in sync with your business requirements.
Integration Testing
After unit testing is complete, it should form the basis of starting integration testing. Integration testing should test out initial and incremental loading of the data.
Integration testing will involve the following:
1. Sequence of ETL
jobs in batch.
2. Initial loading of records into the data warehouse.
3. Incremental loading of records at a later date to verify the newly inserted or updated data.
4. Testing the rejected records that don't fulfill transformation rules.
5. Error log generation.
Integration Testing would cover End-to-End Testing for DWH. The
coverage of the tests would include the below:
Count Validation
Record count verification: DWH backend/reporting queries against source and target as an initial check.
Control totals: To ensure accuracy in data entry and processing, control totals can be compared by the system with manually entered or otherwise calculated control totals, using data fields such as quantities, line items, documents, or dollars, or simple record counts.
Hash totals: This is a technique for improving data accuracy,
whereby totals are obtained on identifier fields (i.e., fields for
which it would logically be meaningless to construct a total), such
as account number, social security number, part number, or employee
number. These totals have no significance other than for internal
system control purposes.
Limit checks: The program tests specified data fields against defined high or low value limits (e.g., quantities or dollars) for acceptability before further processing (a SQL sketch of these checks follows below).
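A minimal SQL sketch of a control total and a limit check, assuming a target table TGT_ORDERS with AMOUNT and QTY columns (all names and limits are assumptions):

    -- Control total: compare against the manually entered or calculated total
    SELECT SUM(AMOUNT) AS amount_total, COUNT(*) AS record_count
    FROM TGT_ORDERS;

    -- Limit check: flag rows outside the accepted range before further processing
    SELECT *
    FROM TGT_ORDERS
    WHERE QTY < 0
       OR AMOUNT > 1000000;  -- assumed upper limit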
Dimensional Analysis
Data integrity between the various source tables and relationships.
Statistical Analysis
Validation for various calculations.
When you validate the calculations, you don't need to load all the rows into the target and validate them.
Instead, you use the Enable Test Load feature available in Informatica Power Center. Two session properties control it:
Enable Test Load: You can configure the Integration Service to perform a test load.
With a test load, the Integration Service reads and transforms
data without writing to targets. The Integration Service generates
all session files, and performs all pre- and post-session
Functions, as if running the full session.
The Integration Service writes data to relational targets, but
rolls back the data when the session completes. For all other
target types, such as flat file and SAP BW, the Integration Service
does not write data to the targets.
Enter the number of source rows you want to test in the Number
of Rows to Test field. You cannot perform a test load on sessions
using XML sources. You can perform a test load for relational
targets when you configure a session for normal Mode. If you
configure the session for bulk mode, the session fails.
Number of Rows to Test: Enter the number of source rows you want the Integration Service to test load. The Integration Service reads the number of rows you configure for the test load.
Data Quality Validation
Check for missing data, negatives and
consistency. Field-by-Field data verification can be done to check
the consistency of source and target data.
Overflow checks: This is a limit check based on the capacity of
a data field or data file area to accept data. This programming
technique can be used to detect the truncation of a financial or
quantity data field value after computation (e.g., addition,
multiplication, and division). Usually, the first digit is the one that is lost.
Format checks: These are used to determine that data are entered
in the proper mode, as numeric or alphabetical characters, within
designated fields of information. The proper mode in each case
depends on the data field definition.
Sign test: This is a test for a numeric data field containing a
designation of an algebraic sign, + or - , which can be used to
denote, for example, debits or credits for financial data fields.
Size test: This test can be used to test the full size of the data field. For example, a social security number in the United States should have nine digits. A sketch of such checks as Informatica expressions follows below.
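These field-level checks can also be written as Informatica expression logic; a minimal sketch, assuming ports named SSN and AMOUNT (hypothetical names):

    -- Format/size test: SSN must be exactly nine digits
    IIF(REG_MATCH(SSN, '[0-9]{9}'), 'OK', 'REJECT')

    -- Sign test: flag negative amounts
    IIF(AMOUNT < 0, 'NEGATIVE', 'OK')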
Granularity
Validate at the lowest granular level possible.
Other validations
Audit trails, transaction logs, error logs, and so on.
Note: Based on your project and business needs you might have
additional testing requirements.
User Acceptance Test
In this phase you will involve the user to
test the end results and ensure that business is satisfied with the
quality of the data.
Any changes to the business requirement will follow the change management process, and eventually those changes have to follow the same testing cycle.
Optimize Development, Testing, and Training Systems
Dramatically accelerate development and test cycles and reduce storage costs by creating fully functional, smaller targeted data subsets for development, testing, and training systems, while maintaining full referential integrity.
Quickly build and update nonproduction systems with a small
subset of production data and replicate current subsets of
nonproduction copies faster.
Simplify test data management and shrink the footprint of nonproduction systems to significantly reduce IT infrastructure and maintenance costs.
Reduce application and upgrade deployment risks by properly
testing configuration updates with up-to-date, realistic data
before introducing them into production .
Easily customize provisioning rules to meet each organization's
changing business requirements.
Lower training costs by standardizing on one approach and one solution.
Train employees effectively using reliable, production-like data
in training systems.
Support Corporate Divestitures and Reorganizations
Untangle complex operational systems and separate data along business lines to quickly build the divested organization's system.
Accelerate the provisioning of new systems by using only data
that's relevant to the divested organization.
Decrease the cost and time of data divestiture with no
reimplementation costs.
Reduce the Total Cost of Storage Ownership
Dramatically increase an IT team's productivity by reusing a comprehensive list of data objects for data selection and updating processes across multiple projects, instead of coding by hand, which is expensive, resource intensive, and time consuming.
Accelerate application delivery by decreasing R&D cycle time
and streamlining test data management.
Improve the reliability of application delivery by ensuring IT
teams have ready access to updated quality production data.
Lower administration costs by centrally managing data growth
solutions across all packaged and custom applications.
Substantially accelerate time to value for subsets of packaged applications.
Decrease maintenance costs by eliminating custom code.
Informatica Power Center Testing
Debugger: A very useful tool for
debugging a valid mapping to gain troubleshooting information about
data and error conditions. Refer Informatica documentation to know
more about debugger tool.
Test Load Options: relational targets.
Running the Integration Service in Safe Mode
Test a development environment. Run the Integration Service in safe mode to test a development environment before migrating to production.
Troubleshoot the Integration Service. Configure the Integration
Service to fail over in safe mode and troubleshoot errors when you
migrate or test a production environment configured for high
availability. After the Integration Service fails over in safe
mode, you can correct the error that caused the Integration Service
to fail over.
Syntax Testing: Test your customized queries using your source qualifier before executing the session.
Performance Testing: use the following methods to identify performance bottlenecks:
Run test sessions. You can configure a test session to read from
a flat file source or to write to a flat file target to identify
source and target bottlenecks.
Analyze performance details. Analyze performance details, such as performance counters, to determine where session performance decreases.
Analyze thread statistics. Analyze thread statistics to
determine the optimal number of partition points.
Monitor system performance. You can use system monitoring tools
to view the percentage of CPU use, I/O waits, and paging to
identify system bottlenecks. You can also use the Workflow Monitor
to view system resource usage. Use Power Center conditional filter
in the Source Qualifier to improve performance.
Share metadata. You can share metadata with a third party. For
example, you want to send a mapping to someone else for testing or
analysis, but you do not want to disclose repository connection
information for security reasons. You can export the mapping to an
XML file and edit the repository connection information before
sending the XML file. The third party can import the mapping from
the XML file and analyze the metadata.
Debugger
You can debug a valid mapping to gain troubleshooting
information about data and error conditions. To debug a mapping,
you configure and run the Debugger from within the Mapping
Designer. The Debugger uses a session to run the mapping on the
Integration Service. When you run the Debugger, it pauses at
breakpoints and you can view and edit transformation output data.
You might want to run the Debugger in the following situations:
Before you run a session. After you save a mapping, you can run
some initial tests with a debug session before you create and
configure a session in the Workflow Manager.
After you run a session. If a session fails or if you receive
unexpected results in the target, you can run the Debugger against
the session. You might also want to run the Debugger against a
session if you want to debug the mapping using the configured session properties.
Debugger Session Types: You can select three different debugger
session types when you configure the Debugger. The Debugger runs a
workflow for each session type. You can choose from the following
Debugger session types when you configure the Debugger:
Use an existing non-reusable session. The Debugger uses existing
source, target, and session configuration properties. When you run
the Debugger, the Integration Service runs the non-reusable session
and the existing workflow. The Debugger does not suspend on error.
Use an existing reusable session. The Debugger uses existing
source, target, and session configuration properties. When you run
the Debugger, the Integration Service runs a debug instance of the
reusable session and creates and runs a debug workflow for the session.
Create a debug session instance. You can configure source,
target, and session configuration properties through the Debugger
Wizard. When you run the Debugger, the Integration Service runs a
debug instance of the debug workflow and creates and runs a debug
workflow for the session.
Debug Process
To debug a mapping, complete the following steps:
1. Create breakpoints. Create breakpoints in a mapping where you want the Integration Service to evaluate data and error conditions.
2. Configure the Debugger. Use the Debugger Wizard to configure
the Debugger for the mapping. Select the session type the
Integration Service uses when it runs the Debugger. When you create
a debug session, you configure a subset of session properties
within the Debugger Wizard, such as source and target location. You
can also choose to load or discard target data.
3. Run the Debugger. Run the Debugger from within the Mapping
Designer. When you run the Debugger, the Designer connects to the
Integration Service. The Integration Service initializes the
Debugger and runs the debugging session and workflow. The
Integration Service reads the breakpoints and pauses the Debugger when the breakpoints evaluate to true.
4. Monitor the Debugger. While you run the Debugger, you can
monitor the target data, transformation and mapplet output data,
the debug log, and the session log. When you run the Debugger, the
Designer displays the following windows:
Debug log. View messages from the Debugger.
Target window. View target data.
Instance window. View transformation data.
5. Modify data and breakpoints. When the Debugger pauses, you can
modify data and see the effect on transformations, mapplets, and
targets as the data moves through the pipeline. You can also modify breakpoint information.
The Designer saves mapping breakpoint and Debugger information
in the workspace files. You can copy breakpoint information and the
Debugger configuration to another mapping. If you want to run the
Debugger from another Power Center Client machine, you can copy the
breakpoint information and the Debugger configuration to the other
Power Center Client machine.
Running the Debugger: When you complete the Debugger Wizard, the
Integration Service starts the session and initializes the
Debugger. After initialization, the Debugger moves in and out of
running and paused states based on breakpoints and commands that
you issue from the Mapping Designer. The Debugger can be in one of
the following states:
Initializing. The Designer connects to the Integration Service.
Running. The Integration Service processes the data.
Paused. The Integration Service encounters a break and pauses the Debugger.
Note: To enable multiple users to debug the same mapping at the
same time, each user must configure different port numbers in the
Tools > Options > Debug tab.
The Debugger does not use the high availability functionality.
Monitoring the Debugger: When you run the Debugger, you can
monitor the following information:
Session status. Monitor the status of the session.
Data movement. Monitor data as it moves through transformations.
Breakpoints. Monitor data that meets breakpoint conditions.
Target data. Monitor target data on a row-by-row basis.
The Mapping Designer displays windows and debug indicators that
help you monitor the session:
Debug indicators. Debug indicators on transformations help you
follow breakpoints and data flow.
Instance window. When the Debugger pauses, you can view
transformation data and row information in the Instance window.
Target window. View target data for each target in the mapping.
Output window. The Integration Service writes messages to the
following tabs in the Output window:
Debugger tab. The debug log displays in the Debugger tab.
Session Log tab. The session log displays in the Session Log tab.
Notifications tab. Displays messages from the Repository Service.
While you monitor the Debugger, you might want to change the
transformation output data to see the effect on subsequent
transformations or targets in the data flow. You might also want to
edit or add more breakpoint information to monitor the session more closely.
Restrictions
You cannot change data for the following output ports:
Normalizer transformation. Generated Keys and Generated Column ID ports.
Rank transformation. RANKINDEX port.
Router transformation. All output ports.
Sequence Generator transformation. CURRVAL and NEXTVAL ports.
Lookup transformation. NewLookupRow port for a Lookup transformation configured to use a dynamic cache.
Custom transformation. Ports in output groups other than the
current output group.
Java transformation. Ports in output groups other than the
current output group.
Additionally, you cannot change data associated with the following:
Mapplets that are not selected for debugging
Input or input/output ports
Output ports when the Debugger pauses on an error breakpoint
Constraint-Based Loading: In the Workflow Manager, you can
specify constraint-based loading for a session. When you select
this option, the Integration Service orders the target load on a
row-by-row basis. For every row generated by an active source, the
Integration Service loads the corresponding transformed row first
to the primary key table, then to any foreign key tables.
Constraint-based loading depends on the following requirements:
Active source. Related target tables must have the same active source.
Key relationships. Target tables must have key relationships.
Target connection groups. Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.
Active Source: When target tables receive rows from different
active sources, the Integration Service reverts to normal loading
for those tables, but loads all other targets in the session using
constraint-based loading when possible. For example, a mapping
contains three distinct pipelines. The first two contain a source,
source qualifier, and target. Since these two targets receive data
from different active sources, the Integration Service reverts to
normal loading for both targets. The third pipeline contains a
source, Normalizer, and two targets. Since these two targets share
a single active source (the Normalizer), the Integration Service
performs constraint-based loading: loading the primary key table
first, then the foreign key table.
Key Relationships: When target tables have no key relationships, the Integration Service does not perform constraint-based loading.
Similarly, when target tables have circular key relationships,
the Integration Service reverts to a normal load. For example, you
have one target containing a primary key and a foreign key related
to the primary key in a second target. The second target also
contains a foreign key that references the primary key in the first
target. The Integration Service cannot enforce constraint-based
loading for these tables. It reverts to a normal load.
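For illustration, a circular key relationship of the kind just described might be created in SQL like this (table and column names are assumptions); neither table can be loaded first, so the service reverts to a normal load:

    CREATE TABLE T_A (A_ID NUMBER PRIMARY KEY, B_ID NUMBER);
    CREATE TABLE T_B (B_ID NUMBER PRIMARY KEY, A_ID NUMBER);
    -- Each table now references the other: no valid constraint-based load order exists
    ALTER TABLE T_A ADD CONSTRAINT FK_A_B FOREIGN KEY (B_ID) REFERENCES T_B (B_ID);
    ALTER TABLE T_B ADD CONSTRAINT FK_B_A FOREIGN KEY (A_ID) REFERENCES T_A (A_ID);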
Target Connection Groups: The Integration Service enforces
constraint-based loading for targets in the same target connection
group. If you want to specify constraint-based loading for multiple
targets that receive data from the same active source, you must
verify the tables are in the same target connection group. If the
tables with the primary key-foreign key relationship are in
different target connection groups, the Integration Service cannot
enforce constraint-based loading when you run the workflow. To
verify that all targets are in the same target connection group,
complete the following tasks:
Verify all targets are in the same target load order group and
receive data from the same active source.
Use the default partition properties and do not add partitions
or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in
the session properties.
Treat Rows as Insert: Use constraint-based loading when the session option Treat Source Rows As is set to Insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.
When the mapping contains Update Strategy transformations and
you need to load data to a primary key table first, split the
mapping using one of the following options:
Load the primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to load the primary key table.
Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load
ordering of the mapping. Target load ordering defines the order the
Integration Service reads the sources in each target load order
group in the mapping. A target load order group is a collection of
source qualifiers, transformations, and targets linked together in
a mapping. Constraint-based loading establishes the order in which
the Integration Service loads individual targets within a set of
targets receiving data from a single source qualifier.
Example
The following mapping is configured to perform constraint-based loading.
In the first pipeline, target T_1 has a primary key; T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the targets in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no
foreign key dependencies and contains a primary key referenced by
T_2 and T_3. The Integration Service then loads T_2 and T_3, but
since T_2 and T_3 have no dependencies, they are not loaded in any
particular order. The Integration Service loads T_4 last, because
it has a foreign key that references a primary key in T_3.After
loading the first set of targets, the Integration Service begins
reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, then since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order: T_5, then T_6.
T_1, T_2, T_3, and T_4 are in one target connection group if you
use the same database connection for each target, and you use the
default partition properties. T_5 and T_6 are in another target
connection group together if you use the same database connection
for each target and you use the default partition properties. The
Integration Service includes T_5 and T_6 in a different target
connection group because they are in a different target load order
group from the first four targets.
Enabling Constraint-Based Loading: When you enable
constraint-based loading, the Integration Service orders the target
load on a row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose
Insert for the Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select
Constraint Based Load Ordering.
3. Click OK.
Target Load Order
When you use a mapplet in a mapping, the
Mapping Designer lets you set the target load plan for sources
within the mapplet.
Setting the Target Load Order
You can configure the target load
order for a mapping containing any type of target definition. In
the Designer, you can set the order in which the Integration
Service sends rows to targets in different target load order groups
in a mapping. A target load order group is the collection of source
qualifiers, transformations, and targets linked together in a
mapping. You can set the target load order if you want to maintain
referential integrity when inserting, deleting, or updating tables
that have the primary key and foreign key constraints.
The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially.
To specify the order in which the Integration Service sends data
to targets, create one source qualifier for each target within a
mapping. To set the target load order, you then determine in which
order the Integration Service reads each source in the mapping.
The following figure shows two target load order groups in one mapping.
In this mapping, the first target load order group includes
ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order group
includes all other objects in the mapping, including the
TOTAL_ORDERS target. The Integration Service processes the first target load order group, and then the second target load order group. When it processes the second target load order group, it reads data from both sources at the same time.
To set the target load order:
1. Create a mapping that contains
multiple target load order groups.
2. Click Mappings > Target Load Plan.
3. The Target Load Plan dialog box lists all Source Qualifier
transformations in the mapping and the targets that receive data
from each source qualifier.
4. Select a source qualifier from the list.
5. Click the Up and Down buttons to move the source qualifier
within the load order.
6. Repeat steps 4 and 5 for other source qualifiers you want to reorder.
7. Click OK.
MAPPING PARAMETERS & VARIABLES
Mapping parameters and
variables represent values in mappings and mapplets.
When we use a mapping parameter or variable in a mapping, first
we declare the mapping parameter or variable for use in each
mapplet or mapping. Then, we define a value for the mapping
parameter or variable before we run the session.
MAPPING PARAMETERS
A mapping parameter represents a constant value that we can define before running a session.
A mapping parameter retains the same value throughout the entire session.
Example: When we want to extract records of a particular month
during the ETL process, we will create a mapping parameter of date/time data type and use it in the query to compare it with the timestamp field in the SQL override.
After we create a parameter, it appears in the Expression Editor.
We can then use the parameter in any expression in the mapplet or mapping.
We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations (a sketch of a parameterized source qualifier override follows below).
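A minimal sketch of a mapping parameter inside a Source Qualifier SQL override; the parameter name $$Month, the table, and the quoting are assumptions (the Integration Service expands $$Month textually at run time from the parameter file):

    SELECT EMPNO, ENAME, SAL, COMM, DEPTNO
    FROM EMP
    WHERE TO_CHAR(HIREDATE, 'MM') = '$$Month'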
MAPPING VARIABLES
Unlike mapping parameters, mapping variables
are values that can change between sessions.
The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session run.
We can override a saved value with the parameter file.
We can also clear all saved values for the session in the Workflow Manager.
We might use a mapping variable to perform an incremental read
of the source. For example, we have a source table containing time
stamped transactions and we want to evaluate the transactions on a
daily basis. Instead of manually entering a session override to
filter source data each time we run the session, we can create a
mapping variable, $$IncludeDateTime. In the source qualifier,
create a filter to read only rows whose transaction date equals
$$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable
value to increment one day each time the session runs. If we set
the initial value of $$IncludeDateTime to 8/1/2004, the first time
the Integration Service runs the session, it reads only rows dated
8/1/2004. During the session, the Integration Service sets
$$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository
at the end of the session. The next time it runs the session, it
reads only rows from August 2, 2004.
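Putting the pieces of this incremental-read pattern together, a minimal sketch; the filter follows the example above, and the increment expression is one assumed way to implement it in an Expression transformation:

    -- Source Qualifier filter: read only the current day's transactions
    TIMESTAMP = $$IncludeDateTime

    -- Expression transformation: advance the variable by one day per run
    SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))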
Used in the following transformations: Expression, Filter, Router, and Update Strategy.
Initial and Default Value: When we declare a mapping parameter or
variable in a mapping or a mapplet, we can enter an initial value.
When the Integration Service needs an initial value, and we did not
declare an initial value for the parameter or variable, the
Integration Service uses a default value based on the data type of
the parameter or variable.
Data type -> Default value
String -> Empty string
Number -> 0
Date/time -> 1/1/1
Variable Values: Start value and current value of a mapping variable.
Start Value: The start value is the value of the variable at the
start of the session. The Integration Service looks for the start
value in the following order:
1. Value in parameter file
2. Value saved in the repository
3. Initial value
4. Default value
Current Value: The current value is the value of the variable as
the session progresses. When a session starts, the current value of
a variable is the same as the start value. The final current value
for a variable is saved to the repository at the end of a
successful session. When a session fails to complete, the
Integration Service does not update the value of the variable in the repository.
Note: If a variable function is not used to calculate the current
value of a mapping variable, the start value of the variable is
saved to the repository.
Variable Data Type and Aggregation Type
When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
Count: Only integer and small integer data types are valid.
Max: All transformation data types except the binary data type are valid.
Min: All transformation data types except the binary data type are valid.
Variable Functions
Variable functions determine how the
Integration Service calculates the current value of a mapping
variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregation type of the variable, it saves a final value to the repository. Example expressions follow below.
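For reference, the four variable functions as they might appear in Expression transformation ports; the port and variable names are assumptions:

    SETMAXVARIABLE($$var_max, SAL)         -- track the highest SAL seen
    SETMINVARIABLE($$var_min, SAL)         -- track the lowest SAL seen
    SETCOUNTVARIABLE($$var_count)          -- net insert/delete count
    SETVARIABLE($$var_set, SESSSTARTTIME)  -- set to an explicit value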
Creating Mapping Parameters and Variables
1. Open the folder where we want to create the parameter or variable.
2. In the Mapping Designer, click Mappings > Parameters and
Variables. -or- In the Mapplet Designer, click Mapplet >
Parameters and Variables.
3. Click the add button.
4. Enter name. Do not remove $$ from name.
5. Select Type and Data type. Select Aggregation type for variables.
6. Give the initial value. Click OK.
Example: Use of Mapping Parameters and Variables
EMP will be the source table.
Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
TOTAL_SAL = SAL + COMM + $$Bonus ($$Bonus is a mapping parameter that changes every month)
SET_VAR: We will add one month to the HIREDATE of every employee.
Create shortcuts as necessary.
Creating Mapping:
1. Open the folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give it a name.
4. Drag EMP and target table.
5. Transformation -> Create -> Select Expression from the list -> Create -> Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to the Expression transformation.
7. Create parameter $$Bonus and give initial value as 200.
8. Create variable $$var_max of MAX aggregation type and give an initial value.
9. Create variable $$var_min of MIN aggregation type and give an initial value.
10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible only when the data type is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.
12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.
13. Open the expression editor for out_TOTAL_SAL. Do the same as we did earlier for SAL + COMM. To add $$Bonus to it, select the variable tab and select the parameter from mapping parameters: SAL + COMM + $$Bonus.
14. Open the expression editor for out_MAX_VAR.
15. Select the variable function SETMAXVARIABLE from the left-side pane. Select $$var_max from the variable tab and SAL from the ports tab: SETMAXVARIABLE($$var_max, SAL).
16. Validate the expression.
17. Open the expression editor for out_MIN_VAR and write the expression SETMINVARIABLE($$var_min, SAL). Validate the expression.
18. Open the expression editor for out_COUNT_VAR and write the expression SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open the expression editor for out_SET_VAR and write the expression SETVARIABLE($$var_set, ADD_TO_DATE(HIREDATE, 'MM', 1)). Validate the expression.
20. Click OK.
21. Link all ports from the Expression transformation to the target. Validate the mapping and save it.
PARAMETER FILE
A parameter file is a list of parameters and
associated values for a workflow, worklet, or session.
Parameter files provide flexibility to change these variables
each time we run a workflow or session.
We can create multiple parameter files and change the file we
use for a session or workflow. We can create a parameter file using
a text editor such as WordPad or Notepad.
Enter the parameter file name and directory in the workflow or session properties.
A parameter file contains the following types of parameters and variables:
Workflow variable: References values and records information in a workflow.
Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.
Session parameter: Defines a value that can change from session to session, such as a database connection or file name.
Mapping parameter and mapping variable: As described in the previous section.
USING A PARAMETER FILE
Parameter files contain several sections
preceded by a heading. The heading identifies the Integration
Service, Integration Service process, workflow, worklet, or session
to which we want to assign parameters or variables.
Make session and workflow.
Give connection information for source and target table.
Run workflow and see result.
Sample parameter file for our example: In the parameter file, folder and session names are case sensitive.
Create a text file in Notepad with the name Para_File.txt.
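A plausible layout for Para_File.txt, assuming a folder named MyFolder, a workflow wf_mp_mv_example, and a session s_m_mp_mv_example (replace these with your own names); the heading identifies the folder, workflow, and session the values apply to:

    [MyFolder.WF:wf_mp_mv_example.ST:s_m_mp_mv_example]
    $$Bonus=200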
CONFIGURING PARAMETER FILE
We can specify the parameter file name
and directory in the workflow or session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt
5. Click OK.
MAPPLETS
A mapplet is a reusable object that we create in the Mapplet Designer.
It contains a set of transformations and lets us reuse that
transformation logic in multiple mappings.
Created in Mapplet Designer in Designer Tool.
Example: We need to use the same set of 5 transformations in, say, 10 mappings. Instead of building those 5 transformations in each of the 10 mappings, we create a mapplet of these 5 transformations and use it in all 10 mappings. Another example: to create a surrogate key in the target, we create a mapplet using a stored procedure that creates the primary key for the target table. We give the target table name and key column name as input to the mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:
Include source definitions: Use multiple source definitions and
source qualifiers to provide source data for a mapping.
Accept data from sources in a mapping
Include multiple transformations: A mapplet can contain as many transformations as we need.
Pass data to multiple transformations: We can create a mapplet to feed data to multiple transformations. Each Output transformation in a mapplet represents one output group in the mapplet.
Contain unused ports: We do not have to connect all mapplet
input and output ports in a mapping.
Mapplet Input: Mapplet input can originate from a source definition and/or from an Input transformation in the mapplet. We can create multiple pipelines in a mapplet.
We use the Mapplet Input transformation to give input to the mapplet.
Use of the Mapplet Input transformation is optional.
Mapplet Output: The output of a mapplet is not connected to any target table.
We must use the Mapplet Output transformation to store mapplet output.
A mapplet must contain at least one Output transformation with at least one connected port in the mapplet.
Example 1: We will join the EMP and DEPT tables, then calculate total salary and give the output to the Mapplet Out transformation.
EMP and DEPT will be source tables.
Output will be given to transformation Mapplet_Out.
Steps:
1. Open the folder where we want to create the mapplet.
2. Click Tools -> Mapplet Designer.
3. Click Mapplets -> Create -> Give name. Ex: mplt_example1
4. Drag the EMP and DEPT tables.
5. Use the Joiner transformation as described earlier to join EMP and DEPT.
6. Transformation -> Create -> Select Expression from the list -> Create -> Done.
7. Pass all ports from joiner to expression and then calculate
total salary as described in expression transformation.
8. Now Transformation -> Create -> Select Mapplet Out from the list -> Create -> Give name and then Done.
9. Pass all ports from expression to Mapplet output.
10. Mapplet -> Validate
11. Repository -> Save
Use of mapplet in mapping: We can use a mapplet in a mapping by just dragging the mapplet from the mapplet folder in the left pane, the same way we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input and Output transformations. These are referred to as the mapplet input and mapplet output ports.
Make sure to give correct connection information in session.
Making a mapping: We will use mplt_example1, and then create a Filter transformation to filter records on Total Salary.
mplt_example1 will be the source.
Create a target table with the same structure as the Mapplet_Out transformation.
Creating Mapping:
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping -> Create -> Give name.
4. Drag mplt_example1 and the target table.
5. Transformation -> Create -> Select Filter from the list -> Create -> Done.
6. Drag all ports from mplt_example1 to the filter and give the filter condition.
7. Connect all ports from filter to target. We can add more
transformations after filter if needed.
8. Validate mapping and Save it.
Make session and workflow.
Give connection information for mapplet source tables.
Give connection information for target table.
Run workflow and see result.
PARTITIONING
A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.
When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.
PARTITIONING ATTRIBUTES
1. Partition points: By default, the IS sets partition points at various transformations in the pipeline. Partition points mark thread boundaries and divide the pipeline into stages. A stage is a section of a pipeline between any two partition points.
2. Number of partitions: We can define up to 64 partitions at any partition point in a pipeline. When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
Increasing the number of partitions or partition points increases the number of threads.
The number of partitions we create equals the number of
connections to the source or target. For one partition, one
database connection will be used.
3. Partition types: The Integration Service creates a default partition type at each partition point.
If we have the Partitioning option, we can change the partition
type. This option is purchased separately.
The partition type controls how the Integration Service
distributes data among partitions at partition points.
PARTITIONING TYPES
1. Round-Robin Partition Type: In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
Each partition processes approximately the same number of rows.
Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.
2. Pass-Through Partition Type: In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
All rows in a single partition stay in that partition after crossing a pass-through partition point.
Use pass-through partitioning when we want to increase data throughput, but do not want to increase the number of partitions.
3. Database Partitioning Partition Type: Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets.
Use any number of pipeline partitions and any number of database partitions.
We can improve performance when the number of pipeline partitions equals the number of database partitions.
Database Partitioning with One Source: When we use database partitioning with a source qualifier with one source, the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions among the session partitions equally.
For example, when a session has three partitions and the database has five partitions, the 1st and 2nd session partitions each receive data from two database partitions, consuming four of the DB partitions; the 3rd session partition receives data from the remaining DB partition.
Partitioning a Source Qualifier with Multiple Source Tables: The Integration Service creates SQL queries for database partitions based on the number of partitions in the database table with the most partitions.
If the session has three partitions and the database table has two partitions, one of the session partitions receives no data.
4. Hash Auto-Keys Partition Type: The Integration Service uses all grouped or sorted ports as a compound partition key.
Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
5. Hash User-Keys Partition Type: The Integration Service uses a hash function to group rows of data among partitions.
We define the number of ports to generate the partition key.
We choose the ports that define the partition key.
6. Key Range Partition Type: We specify one or more ports to form a compound partition key.
The Integration Service passes data to each partition depending on the ranges we specify for each port.
Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define the range for each partition.
WORKING WITH LINKS
Use links to connect each workflow task.
We can specify conditions with links to create branches in the workflow.
The Workflow Manager does not allow us to use links to create loops in the workflow. Each link in the workflow can run only once.
(Figures omitted: an example of a valid workflow and an example of a loop.)
Specifying Link Conditions: Once we create links between tasks,
we can specify conditions for each link to determine the order of
execution in the workflow.
If we do not specify conditions for each link, the Integration
Service runs the next task in the workflow by default.
Use predefined or user-defined workflow variables in the link condition.
Steps:
1. In the Workflow Designer workspace, double-click the link for which you want to specify a condition.
2. The Expression Editor appears.
3. In the Expression Editor, enter the link condition. The
Expression Editor provides predefined workflow variables,
user-defined workflow variables, variable functions, and Boolean
and arithmetic operators.
4. Validate the expression using the Validate button.
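For instance, a link condition that lets the next task run only when the preceding session succeeds and a user-defined count variable is positive might look like this (the session and variable names are hypothetical):

    $s_m_my_session.Status = SUCCEEDED AND $$Record_Count > 0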
Using the Expression Editor: The Workflow Manager provides an Expression Editor for any expressions in the workflow. We can enter expressions using the Expression Editor for the following: link conditions, Decision task conditions, and Assignment task expressions.
SCHEDULERS
We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow. The Integration Service runs a scheduled workflow as configured.
By default, the workflow runs on demand. We can change the
schedule settings by editing the scheduler. If we change schedule
settings, the Integration Service reschedules the workflow
according to the new settings.
A scheduler is a repository object that contains a set of schedule settings.
A scheduler can be non-reusable or reusable.
The Workflow Manager marks a workflow invalid if we delete the
scheduler associated with the workflow.
If we choose a different Integration Service for the workflow or
restart the Integration Service, it reschedules all workflows.
If we delete a folder, the Integration Service removes workflows
from the schedule.
The Integration Service does not run the workflow if:
The prior workflow run fails.
We remove the workflow from the schedule.
The Integration Service is running in safe mode.
Creating a Reusable Scheduler For each folder, the Workflow
Manager lets us create reusable schedulers so we can reuse the same
set of scheduling settings for workflows in the folder.
Use a reusable scheduler so we do not need to configure the same
set of scheduling settings in each workflow.
When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.
Steps:
1. Open the folder where we want to create the scheduler.
2. In the Workflow Designer, click Workflows > Schedulers.
3. Click Add to add a new scheduler.
4. In the General tab, enter a name for the scheduler.
5. Configure the scheduler settings in the Scheduler tab.
6. Click Apply and OK.
Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule.
There are 3 run options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization
1. Run on Demand: The Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously: The Integration Service runs the workflow as soon as the service initializes. It then starts the next run of the workflow as soon as it finishes the first run.
3. Run on Server initialization: The Integration Service runs the workflow as soon as the service is initialized. It then starts the next run of the workflow according to the settings in Schedule Options.
Schedule options for Run on Server initialization:
Run Once: Run the workflow just once.
Run Every: Run the workflow at regular intervals, as configured.
Customized Repeat: The Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.
Start options for Run on Server initialization: Start Date and Start Time.
End options for Run on Server initialization:
End On: The IS stops scheduling the workflow on the selected date.
End After: The IS stops scheduling the workflow after the set number of workflow runs.
Forever: The IS schedules the workflow as long as the workflow does not fail.
Creating a Non-Reusable Scheduler
1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit.
3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow. Note: If we do not have a reusable scheduler in the folder, create one before we choose Reusable.
4. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.
5. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.
6. Click OK.
Points to Ponder:
To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unschedule Workflow.
To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.
WORKING WITH TASKS Part 1
The Workflow Manager contains many types of tasks to help you build workflows and worklets. We can create reusable tasks in the Task Developer.
Types of tasks (Task Type / Tool where task can be created / Reusable or not):
Session / Task Developer, Workflow Designer / Yes
Email / Task Developer, Workflow Designer / Yes
Command / Task Developer, Workflow Designer / Yes
Event-Raise, Event-Wait, Timer, Decision, Control, Assignment / Workflow Designer / No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the Session tasks sequentially or concurrently, depending on our requirements.
The Power Center Server creates several files and in-memory caches depending on the transformations and options used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a workflow.
It is usually created by the Administrator, and we just drag and use it in our workflow.
Steps:
1. In the Task Developer or Workflow Designer, choose Tasks -> Create.
2. Select an Email task and enter a name for the task. Click Create.
3. Click Done.
4. Double-click the Email task in the workspace. The Edit Tasks
dialog box appears.
5. Click the Properties tab.
6. Enter the fully qualified email address of the mail recipient
in the Email User Name field.
7. Enter the subject of the email in the Email Subject field.
Or, you can leave this field blank.
8. Click the Open button in the Email Text field to open the Email Editor.
9. Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email.
2. Drag any session task to workspace.
3. Edit Session task and go to Components tab.
4. See On Success Email Option there and configure it.
5. In Type select reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> Ok.
8. Validate workflow and Repository -> Save
We can also drag the email task and use as per need.
We can set the option to send email on success or failure in
components tab of a session task.
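As a sketch, the text of a post-session On Success Email can use PowerCenter's built-in email variables; the tags below are the commonly cited ones, so verify them against the documentation for your version:

    Session %s completed successfully.
    Rows loaded: %l, rows rejected: %r.
    Started: %b, finished: %c.
    %g

Here %s is the session name, %l the total rows loaded, %r the total rows rejected, %b and %c the session start and completion times, and %g attaches the session log.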
COMMAND TASK
The Command task allows us to specify one or more shell commands in UNIX, or DOS commands in Windows, to run during the workflow.
For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.
Ways of using command task:
1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a Session task.
This is done in the COMPONENTS tab of a session. We can run it as Pre-Session Command, Post-Session Success Command, or Post-Session Failure Command. Select the Value and Type options as we did for the Email task.
Example: To copy a file sample.txt from the D drive to the E drive.
Command (Windows): COPY D:\sample.txt E:\
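On UNIX, an equivalent shell command would be the following (paths are hypothetical):

    cp /data/in/sample.txt /data/archive/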
Steps for creating command task:
1. In the Task Developer or Workflow Designer, choose Tasks -> Create.
2. Select Command Task for the task type.
3. Enter a name for the Command task. Click Create. Then click Done.
4. Double-click the Command task. Go to commands tab.
5. In the Commands tab, click the Add button to add a command.
6. In the Name field, enter a name for the new command.
7. In the Command field, click the Edit button to open the Command Editor.
8. Enter only one command in the Command Editor.
9. Click OK to close the Command Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.
Steps to create the workflow using command task:
1. Create a task using the above steps to copy a file in Task Developer.
2. Open Workflow Designer. Workflow -> Create -> Give name and click OK.
3. Start is displayed. Drag a session, say s_m_filter_example, and the Command task to the workspace.
4. Link Start to the Session task and the Session to the Command task.
5. Double-click the link between the Session and the Command task and give the condition in the editor as $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
6. Workflow -> Validate
7. Repository -> Save
WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task execution.
Types of Events:
Pre-defined event: A pre-defined event is a file-watch event. This event waits for a specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the workflow. We create events and then raise them as per need.
Steps for creating User Defined Event:
1. Open any workflow where we want to create an event.
2. Click Workflow -> Edit -> Events tab.
3. Click the Add button to add events and give the names as per need.
4. Click Apply -> OK. Validate the workflow and Save it.
Types of Event Tasks:
EVENT RAISE: The Event-Raise task represents a user-defined event. We use this task to raise a user-defined event.
EVENT WAIT: The Event-Wait task waits for a file-watch event or user-defined event to occur before executing the next session in the workflow.
Example 1: Use an Event Wait task to make sure that session s_filter_example runs only when the abc.txt file is present in D:\FILES.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click OK.
2. Task -> Create -> Select Event Wait. Give name. Click
create and done.
3. Link Start to Event Wait task.
4. Drag s_filter_example to workspace and link it to event wait
5. Right-click on the Event Wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and file name to watch. Example: D:\FILES\abc.txt
7. Workflow validate and Repository Save.
Example 2: Raise a user-defined event when session s_m_filter_example succeeds. Capture this event in an Event Wait task and run session S_M_TOTAL_SAL_EXAMPLE.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click OK.
2. Workflow -> Edit -> Events tab and add event EVENT1.
3. Drag s_m_filter_example and link it to START task.
4. Click Tasks -> Create -> Select EVENT RAISE from the list. Give name ER_Example. Click Create and then Done.
5. Link ER_Example to s_m_filter_example.
6. Right click ER_Example -> EDIT -> Properties Tab ->
Open Value for User Defined Event and Select EVENT1 from the list
displayed. Apply -> OK.
7. Click link between ER_Example and s_m_filter_example and give
the condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
8. Click Tasks -> Create -> Select EVENT WAIT from list.
Give name EW_WAIT. Click Create and then done.
9. Link EW_WAIT to START task.
10. Right click EW_WAIT -> EDIT-> EVENTS tab.
11. Select User Defined there. Select EVENT1 by clicking the Browse Events button.
12. Apply -> OK.
13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
14. Workflow -> Validate
15. Repository -> Save.
16. Run workflow and see.
WORKING WITH TASKS Part 2
TIMER TASK
The Timer task allows us to specify the period of time to wait before the Power Center Server runs the next task in the workflow. The Timer task has two types of settings:
Absolute time: We specify the exact date and time, or we can choose a user-defined workflow variable to specify the exact time. The next task in the workflow will run as per the date and time specified.
Relative time: We instruct the Power Center Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.
Example: Run session s_m_filter_example 1 minute after the Timer task starts.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click OK.
2. Click Tasks -> Create -> Select TIMER from list. Give
name TIMER_Example. Click Create and then done.
3. Link TIMER_Example to START task.
4. Right click TIMER_Example-> EDIT -> TIMER tab.
5. Select Relative Time Option and Give 1 min and Select From
start time of this task Option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow-> Validate and Repository -> Save.
DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the workflow, similar to a link condition.
The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the decision condition.
The Power Center Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0).
We can specify one decision condition per Decision task.
Example: The Command task should run only if either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds. If both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE fail, then S_m_sample_mapping_EMP should run.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click OK.
2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to
workspace and link both of them to START task.
3. Click Tasks -> Create -> Select DECISION from the list. Give name DECISION_Example. Click Create and then Done. Link DECISION_Example to both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE.
4. Right-click DECISION_Example -> EDIT -> GENERAL tab.
5. Set Treat Input Links As to OR. Default is AND. Click Apply and OK.
6. Now edit decision task again and go to PROPERTIES Tab. Open
the Expression editor by clicking the VALUE section of Decision
Name attribute and enter the following condition:
$S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR
$S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED
7. Validate the condition -> Click Apply -> OK.
8. Drag command task and S_m_sample_mapping_EMP task to
workspace and link them to DECISION_Example task.
9. Double click link between S_m_sample_mapping_EMP &
DECISION_Example & give the condition:
$DECISION_Example.Condition = 0. Validate & click OK.
10. Double click link between Command task and DECISION_Example
and give the condition: $DECISION_Example.Condition = 1. Validate
and click OK.
11. Workflow Validate and repository Save.
12. Run workflow and see the result.
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition.
A parent workflow or worklet is the workflow or worklet that contains the Control task.
We give the condition on the link connected to the Control task.
Control Option and its effect:
Fail Me: Fails the Control task.
Fail Parent: Marks the status of the WF or worklet that contains the Control task as failed.
Stop Parent: Stops the WF or worklet that contains the Control task.
Abort Parent: Aborts the WF or worklet that contains the Control task.
Fail Top-Level WF: Fails the workflow that is running.
Stop Top-Level WF: Stops the workflow that is running.
Abort Top-Level WF: Aborts the workflow that is running.
Example: Drag any 3 sessions; if any one of them fails, then abort the top-level workflow.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click OK.
2. Drag any 3 sessions to the workspace and link all of them to the START task.
3. Click Tasks -> Create -> Select CONTROL from the list. Give name cntr_task.
4. Click Create and then Done.
5. Link all sessions to the control task cntr_task.
6. Double-click the link between cntr_task and any session, say s_m_filter_example, and give the condition: $S_M_FILTER_EXAMPLE.Status = FAILED.
7. Repeat above step for remaining 2 sessions also.
8. Right-click cntr_task -> EDIT -> GENERAL tab. Set Treat Input Links As to OR. Default is AND.
9. Go to the PROPERTIES tab of cntr_task and select the value Abort Top-Level Workflow for the Control Option attribute. Click Apply and OK.
11. Workflow Validate and repository Save.
12. Run workflow and see the result.
ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow variable.
See the Workflow variable topic to add user-defined variables.
To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables.
We cannot assign values to pre-defined workflow variables.
Steps to create Assignment Task:
1. Open any workflow where we want to use the Assignment task.
2. Edit the workflow and add user-defined variables.
3. Choose Tasks -> Create. Select Assignment Task for the task type.
4. Enter a name for the Assignment task. Click Create. Then click Done.
5. Double-click the Assignment task to open the Edit Task dialog box.
6. On the Expressions tab, click Add to add an assignment.
7. Click the Open button in the User Defined Variables field.
8. Select the variable for which you want to assign a value.
9. Click the Edit button in the Expression field to open the Expression Editor.
10. Enter the value or expression you want to assign.
11. Repeat steps 7-10 to add more variable assignments as necessary.
12. Click OK.
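For example, assuming we added a user-defined workflow variable named $$WF_RUN_COUNT (a hypothetical name), an assignment that increments it on every run would look like:

    User Defined Variable: $$WF_RUN_COUNT
    Expression:            $$WF_RUN_COUNT + 1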
INDIRECT LOADING FOR FLAT FILES
Suppose you have 10 flat files of the same structure. All the flat files ha