
Complete Reference to Informatica


Introduction

ETL Life Cycle

The typical real-life ETL cycle consists of the following execution steps:

1. Cycle initiation

2. Build reference data

3. Extract (from sources)

4. Validate

5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)

6. Stage (load into staging tables, if used)

7. Audit reports (for example, on compliance with business rules; in case of failure, they also help to diagnose and repair)

8. Publish (to target tables)

9. Archive

10. Clean up

Best practices

Four-layered approach for ETL architecture design

Functional layer: Core functional ETL processing (extract, transform, and load).

Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication, and alerting.

Audit, balance, and control (ABC) layer: Job-execution statistics, balancing and controls, reject and error handling, and codes management.

Utility layer: Common components supporting all other layers.

Use file-based ETL processing where possible

Storage costs relatively little.

Intermediate files serve multiple purposes:

Used for testing and debugging

Used for restart and recovery processing

Used to calculate control statistics

Helps to reduce dependencies and enables modular programming.

Allows flexibility for job execution and scheduling.

Better performance if coded properly, and can take advantage of parallel processing capabilities when the need arises.

Use data-driven methods and minimize custom ETL coding

Parameter-driven jobs, functions, and job control

Code definitions and mappings in the database

Consider data-driven tables to support more complex code mappings and business-rule application.

Qualities of a good ETL architecture design:

Performant

Scalable

Migratable

Recoverable (run_id, ...)

Operable (completion-codes for phases, re-running from checkpoints, etc.)

Auditable (in two dimensions: business requirements and technical troubleshooting)

What is Informatica?

Informatica Power Center is a powerful ETL tool from Informatica Corporation.

Informatica Corporation's products include:

Informatica Power Center

Informatica on demand

Informatica B2B Data Exchange

Informatica Data Quality

Informatica Data Explorer

Informatica Power Center is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed.

Informatica Power Center Editions: Because every data integration project is different and includes many variables, such as data volumes, latency requirements, IT infrastructure, and methodologies, Informatica offers three Power Center Editions and a suite of Power Center Options to meet your project's and organization's specific needs.

Standard Edition

Real Time Edition

Advanced Edition

Informatica Power Center Standard Edition: Power Center Standard Edition is a single, unified enterprise data integration platform for discovering, accessing, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise to improve operational efficiency.

Key features include:

A high-performance data integration server

A global metadata infrastructure

Visual tools for development and centralized administration

Productivity tools to facilitate collaboration among architects, analysts, and developers.

Informatica Power Center Real Time Edition: Packaged for simplicity and flexibility, Power Center Real Time Edition extends Power Center Standard Edition with additional capabilities for integrating and provisioning transactional or operational data in real time. Power Center Real Time Edition provides the ideal platform for developing sophisticated data services and delivering timely information as a service to support all business needs. It provides the perfect real-time data integration complement to service-oriented architectures and application integration approaches such as enterprise application integration (EAI), enterprise service buses (ESB), and business process management (BPM).

Key features include:

Change data capture for relational data sources

Integration with messaging systems

Built-in support for Web services

Dynamic partitioning with data smart parallelism

Process orchestration and human workflow capabilities

Informatica Power Center Advanced Edition: Power Center Advanced Edition addresses requirements for organizations that are standardizing data integration at an enterprise level, across a number of projects and departments. It combines all the capabilities of Power Center Standard Edition and features additional capabilities that are ideal for data governance and Integration Competency Centers.

Key features include:

Dynamic partitioning with data smart parallelism

Powerful metadata analysis capabilities

Web-based data profiling and reporting capabilities

Power Center includes the following components:

Power Center domain

Administration Console

Power Center repository

Power Center Client

Repository Service

Integration Service

Web Services Hub

SAP BW Service

Data Analyzer

Metadata Manager

Power Center Repository Reports

POWERCENTER CLIENT

The Power Center Client consists of the following applications, which we use to manage the repository, design mappings and mapplets, and create sessions to load the data:

1. Designer

2. Data Stencil

3. Repository Manager

4. Workflow Manager

5. Workflow Monitor

1. Designer: Use the Designer to create mappings that contain transformation instructions for the Integration Service.

The Designer has the following tools that you use to analyze sources, design target schemas, and build source-to-target mappings:

Source Analyzer: Import or create source definitions.

Target Designer: Import or create target definitions.

Transformation Developer: Develop transformations to use in mappings.

You can also develop user-defined functions to use in expressions.

Mapplet Designer: Create sets of transformations to use in mappings.

Mapping Designer: Create mappings that the Integration Service uses to extract, transform, and load data.

2. Data Stencil: Use the Data Stencil to create mapping templates that can be used to generate multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create mapping templates. It is not usually used by a developer.

3. Repository Manager: Use the Repository Manager to administer repositories. You can navigate through multiple folders and repositories, and complete the following tasks:

Manage users and groups: Create, edit, and delete repository users and user groups. We can assign and revoke repository privileges and folder permissions.

Perform folder functions: Create, edit, copy, and delete folders. Work we perform in the Designer and Workflow Manager is stored in folders. If we want to share metadata, we can configure a folder to be shared.

View metadata: Analyze sources, targets, mappings, and shortcut dependencies, search by keyword, and view the properties of repository objects. We create repository objects using the Designer and Workflow Manager client tools.

We can view the following objects in the Navigator window of the Repository Manager:

Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions: Definitions of database objects or files that contain the target data.

Mappings: A set of source and target definitions along with transformations containing business logic that we build into the transformation. These are the instructions that the Integration Service uses to transform and move data.

Reusable transformations: Transformations that we use in multiple mappings.

Mapplets: A set of transformations that we use in multiple mappings.

Sessions and workflows: Sessions and workflows store information about how and when the Integration Service moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that we can put in a workflow. Each session corresponds to a single mapping.

4. Workflow Manager: Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

The Workflow Manager has the following tools to help us develop a workflow:

Task Developer: Create tasks we want to accomplish in the workflow.

Worklet Designer: Create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.

Workflow Designer: Create a workflow by connecting tasks with links in the Workflow Designer. You can also create tasks in the Workflow Designer as you develop the workflow.

When we create a workflow in the Workflow Designer, we add tasks to the workflow. The Workflow Manager includes tasks, such as the Session task, the Command task, and the Email task so you can design a workflow. The Session task is based on a mapping we build in the Designer.

We then connect tasks with links to specify the order of execution for the tasks we created. Use conditional links and workflow variables to create branches in the workflow.

5. Workflow Monitor: Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service. We can view details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer.

The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic information.

Services Behind the Scenes

INTEGRATION SERVICE PROCESS

The Integration Service starts an Integration Service process to run and monitor workflows. The Integration Service process accepts requests from the Power Center Client and from pmcmd (a sample pmcmd call is shown after the task list below). It performs the following tasks:

Manages workflow scheduling.

Locks and reads the workflow.

Reads the parameter file.

Creates the workflow log.

Runs workflow tasks and evaluates the conditional links connecting tasks.

Starts the DTM process or processes to run the session.

Writes historical run information to the repository.

Sends post-session email in the event of a DTM failure.
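For reference, a workflow can also be started from the command line through pmcmd. The sketch below is illustrative only; the service, domain, credential, folder, and workflow names are placeholders:

pmcmd startworkflow -sv IntegrationService_Name -d Domain_Name -u user_name -p password -f Folder_Name wf_Workflow_Name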

LOAD BALANCER

The Load Balancer is a component of the Integration Service that dispatches tasks to achieve optimal performance and scalability. When we run a workflow, the Load Balancer dispatches the Session, Command, and predefined Event-Wait tasks within the workflow.

The Load Balancer dispatches tasks in the order it receives them. When the Load Balancer needs to dispatch more Session and Command tasks than the Integration Service can run, it places the tasks it cannot run in a queue. When nodes become available, the Load Balancer dispatches tasks from the queue in the order determined by the workflow service level.

DTM PROCESS

When the workflow reaches a session, the Integration Service process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:

Retrieves and validates session information from the repository.

Performs pushdown optimization when the session is configured for pushdown optimization.

Adds partitions to the session when the session is configured for dynamic partitioning.

Expands the service process variables, session parameters, and mapping variables and parameters.

Creates the session log.

Validates source and target code pages.

Verifies connection object permissions.

Runs pre-session shell commands, stored procedures, and SQL.

Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.

Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.

Runs post-session stored procedures, SQL, and shell commands.

Sends post-session email.

PROCESSING THREADS

The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. The default memory allocation is 12,000,000 bytes.

The DTM uses multiple threads to process data in a session. The main DTM thread is called the master thread.

The master thread can create the following types of threads:

Mapping Threads: One mapping thread for each session.

Pre- and Post-Session Threads: One thread created.

Reader Threads: One thread for each partition.

Transformation Threads: One thread for each partition.

Writer Threads: One thread for each partition.

CODE PAGES AND DATA MOVEMENT

A code page contains the encoding to specify characters in a set of one or more languages. An encoding is the assignment of a number to a character in the character set.

The Integration Service can move data in either ASCII or Unicode data movement mode. These modes determine how the Integration Service handles character data.

We choose the data movement mode in the Integration Service configuration settings. If we want to move multibyte data, choose Unicode data movement mode.

ASCII Data Movement Mode: In ASCII mode, the Integration Service recognizes 7-bit ASCII and EBCDIC characters and stores each character in a single byte.

Unicode Data Movement Mode: Use Unicode data movement mode when sources or targets use 8-bit or multibyte character sets and contain character data.

Try Your Hand at the Admin Console

Repository Manager Tasks:

Add domain connection information

Add and connect to a repository

Work with Power Center domain and repository connections

Search for repository objects or keywords

View object dependencies

Compare repository objects

Truncate session and workflow log entries

View user connections

Release locks

Exchange metadata with other business intelligence tools

Add a repository to the Navigator, and then configure the domain connection information when we connect to the repository.

1. Adding a Repository to the Navigator

1. In any of the Power Center Client tools, click Repository > Add.

2. Enter the name of the repository and a valid repository user name.

3. Click OK.

Before we can connect to the repository for the first time, we must configure the connection information for the domain that the repository belongs to.

2. Configuring a Domain Connection

1. In a Power Center Client tool, select the Repositories node in the Navigator.

2. Click Repository > Configure Domains to open the Configure Domains dialog box.

3. Click the Add button. The Add Domain dialog box appears.

4. Enter the domain name, gateway host name, and gateway port number.

5. Click OK to add the domain connection.

3. Connecting to a Repository

1. Launch a Power Center Client tool.

2. Select the repository in the Navigator and click Repository > Connect, or double-click the repository.

3. Enter a valid repository user name and password.

4. Click Connect.

Click the More button to add, change, or view domain information.

4. Viewing Object Dependencies

Before we change or delete repository objects, we can view dependencies to see the impact on other objects. For example, before we remove a session, we can find out which workflows use the session. We can view dependencies for repository objects in the Repository Manager, Workflow Manager, and Designer tools.

Steps:

1. Connect to the repository.

2. Select the object of interest in the Navigator.

3. Click Analyze and select the dependency we want to view.

5. Validating Multiple Objects

We can validate multiple objects in the repository without fetching them into the workspace. We can save and optionally check in objects that change from invalid to valid status as a result of the validation. We can validate sessions, mappings, mapplets, workflows, and worklets.

Steps:

1. Select the objects we want to validate.

2. Click Analyze and select Validate.

3. Select validation options from the Validate Objects dialog box.

4. Click Validate.

5. Click a link to view the objects in the results group.

6. Comparing Repository Objects

We can compare two repository objects of the same type to identify differences between the objects. For example, we can compare two sessions to check for differences. When we compare two objects, the Repository Manager displays their attributes.

Steps:

1. In the Repository Manager, connect to the repository.

2. In the Navigator, select the object you want to compare.

3. Click Edit > Compare Objects.

4. Click Compare in the dialog box displayed.

7. Truncating Workflow and Session Log Entries

When we configure a session or workflow to archive session logs or workflow logs, the Integration Service saves those logs in local directories. The repository also creates an entry for each saved workflow log and session log. If we move or delete a session log or workflow log from the workflow log directory or session log directory, we can remove the entries from the repository.

Steps:

1. In the Repository Manager, select the workflow in the Navigator window or in the Main window.

2. Choose Edit > Truncate Log. The Truncate Workflow Log dialog box appears.

3. Choose to delete all workflow and session log entries or to delete all workflow and session log entries with an end time before a particular date.

4. If you want to delete all entries older than a certain date, enter the date and time.

5. Click OK.

8. Managing User Connections and Locks

In the Repository Manager, we can view and manage the following items:

Repository object locks: The repository locks repository objects and folders by user. The repository creates different types of locks depending on the task. The Repository Service locks and unlocks all objects in the repository.

User connections: Use the Repository Manager to monitor user connections to the repository. We can end connections when necessary.

Types of locks created:

1. In-use lock: Placed on objects we want to view.

2. Write-intent lock: Placed on objects we want to modify.

3. Execute lock: Locks objects we want to run, such as workflows and sessions.

Steps:

1. Launch the Repository Manager and connect to the repository.

2. Click Edit > Show User Connections or Show Locks.

3. The locks or user connections are displayed in a window.

4. We can then take whatever action is needed.

9. Managing Users and Groups

1. In the Repository Manager, connect to a repository.

2. Click Security > Manage Users and Privileges.

3. Click the Groups tab to create groups, or

4. Click the Users tab to create users.

5. Click the Privileges tab to give permissions to groups and users.

6. Select the options available to add, edit, and remove users and groups.

There are two default repository user groups:

Administrators: This group initially contains two users that are created by default.

The default users are Administrator and the database user that created the repository. We cannot delete these users from the repository or remove them from the Administrators group.

Public: The Repository Manager does not create any default users in the Public group.

10. Working with Folders

We can create, edit, or delete folders as needed.

1. In the Repository Manager, connect to a repository.

2. Click Folder > Create.

Enter the following information:

3. Click OK.

Difference Between 7.1 and 8.6

1. Target from transformation: In Informatica 8.x we can create a target from a transformation by dragging the transformation into the Target Designer.

2. Pushdown optimization: Increases performance by pushing transformation logic to the database; the Integration Service analyzes the transformations and issues SQL statements to sources and targets, and processes only the transformation logic that it cannot push to the database.

3. New functions in the Expression Editor: New functions such as REG_EXTRACT and REG_MATCH have been introduced in Informatica 8.x (a short sketch follows this list).

4. Repository query: Available in both versioned and non-versioned repositories; previously it was available only for versioned repositories.

5. UDF (user-defined function): Similar to a macro in Excel.

6. FTP: We can have partitioned FTP targets and an indirect FTP file source (with a file list).

7. Propagating port descriptions: In Informatica 8 we can edit a port description and propagate the description to other transformations in the mapping.

8. Environment SQL enhancements: Environment SQL can still be used to execute an SQL statement at the start of a connection to the database. We can use SQL commands that depend upon a transaction being open during the entire read or write process. For example, the following SQL command modifies how the session handles dates:

ALTER SESSION SET NLS_DATE_FORMAT = 'DD/MM/YYYY';

9. Concurrently write to multiple files in a session with partitioned targets.

10. Flat file enhancements:

Reduced conversion of data types

Delimited file performance has improved

Flat file now can have integer and double data types

Data can be appended to existing flat files
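As a quick, hedged illustration of the regular-expression functions mentioned in item 3 above (the port names are made up for the example):

REG_MATCH(ZIP_CODE, '^[0-9]{5}$') -- TRUE when ZIP_CODE is exactly five digits

REG_EXTRACT(FULL_NAME, '(\w+)\s+(\w+)', 2) -- returns the second captured group (the last name)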

Informatica Power Center 8 has the following features, which make it more powerful and easier to use and manage compared to previous versions:

Supports service-oriented architecture

Access to structured, unstructured, and semi-structured data

Support for grid computing

High availability

Pushdown optimization

Dynamic partitioning

Metadata exchange enhancements

Team-based development

Global Web-based Admin console

New transformations

23 New functions

User defined functions

Custom transformation enhancements

Flat file enhancements

New Data Federation option

Enterprise GRID

Testing

Unit Testing

Unit testing can be broadly classified into two categories.

Quantitative Testing

Validate your source and target:

a) Ensure that your connectors are configured properly.

b) If you are using a flat file, make sure you have enough read/write permission on the file share.

c) You need to document all the connector information.

Analyze the load time:

a) Execute the session and review the session statistics.

b) Check the read and write counters and how long it takes to perform the load.

c) Use the session and workflow logs to capture the load statistics.

d) You need to document all the load timing information.

Analyze the success rows and rejections:

a) Use customized SQL queries to check the sources and targets; this is where we perform record count verification.

b) Analyze the rejections and build a process to handle them. This requires a clear business requirement on how to handle data rejections: do we need to reload, or reject and inform? Discussions are required and an appropriate process must be developed.

Performance improvement:

a) Network Performance

b) Session Performance

c) Database Performance

d) Analyze and if required define the Informatica and DB partitioning requirements.

Qualitative Testing

Analyze and validate your transformation business rules. This is more of a functional test.

e) You need to review field by field from source to target and ensure that the required transformation logic is applied.

f) If you are making changes to existing mappings, make use of the data lineage feature available with Informatica Power Center. This will help you to find the consequences of altering or deleting a port from an existing mapping.

g) Ensure that appropriate dimension lookups have been used and your development is in sync with your business requirements.

Integration Testing

After unit testing is complete, it should form the basis for starting integration testing. Integration testing should test out the initial and incremental loading of the data warehouse.

Integration testing will involve the following:

1. Sequence of ETL jobs in batch.

2. Initial loading of records into the data warehouse.

3. Incremental loading of records at a later date to verify the newly inserted or updated data.

4. Testing the rejected records that don't fulfill transformation rules.

5. Error log generation.

Integration Testing would cover End-to-End Testing for DWH. The coverage of the tests would include the below:

Count Validation

Record Count Verification: Run DWH back-end/reporting queries against the source and target as an initial check.
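For example, record count verification often comes down to a pair of queries like the following (the schema and table names are illustrative only):

SELECT COUNT(*) FROM src_schema.TRANSACTIONS; -- row count in the source

SELECT COUNT(*) FROM dwh_schema.FACT_TRANSACTIONS; -- row count in the target fact table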

Control totals: To ensure accuracy in data entry and processing, control totals can be compared by the system with manually entered or otherwise calculated control totals, using data fields such as quantities, line items, documents, or dollars, or simple record counts.

Hash totals: This is a technique for improving data accuracy, whereby totals are obtained on identifier fields (i.e., fields for which it would logically be meaningless to construct a total), such as account number, social security number, part number, or employee number. These totals have no significance other than for internal system control purposes.

Limit checks: The program tests specified data fields against defined high or low value limits (e.g., quantities or dollars) for acceptability before further processing.
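A hedged SQL sketch of a control total and a limit check (the column and table names are assumptions for the example):

-- Control total: the summed amount should match between source and target

SELECT SUM(SALE_AMOUNT) FROM src_schema.TRANSACTIONS;

SELECT SUM(SALE_AMOUNT) FROM dwh_schema.FACT_TRANSACTIONS;

-- Limit check: flag rows whose quantity falls outside the accepted range

SELECT * FROM dwh_schema.FACT_TRANSACTIONS WHERE QUANTITY NOT BETWEEN 0 AND 10000;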

Dimensional Analysis: Data integrity between the various source tables and relationships.

Statistical Analysis: Validation of various calculations.

When you validate the calculations, you do not need to load all rows into the target and validate them.

Instead, you use the Enable Test Load feature available in Informatica Power Center. The relevant session properties are described below.

Enable Test Load: You can configure the Integration Service to perform a test load.

With a test load, the Integration Service reads and transforms data without writing to targets. The Integration Service generates all session files, and performs all pre- and post-session Functions, as if running the full session.

The Integration Service writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets.

Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.

Number of Rows to Test: Enter the number of source rows you want the Integration Service to test load. The Integration Service reads the number you configure for the test load.

Data Quality Validation: Check for missing data, negatives, and consistency. Field-by-field data verification can be done to check the consistency of source and target data.

Overflow checks: This is a limit check based on the capacity of a data field or data file area to accept data. This programming technique can be used to detect the truncation of a financial or quantity data field value after computation (e.g., addition, multiplication, and division). Usually, the first digit is the one lost.

Format checks: These are used to determine that data are entered in the proper mode, as numeric or alphabetical characters, within designated fields of information. The proper mode in each case depends on the data field definition.

Sign test: This is a test for a numeric data field containing a designation of an algebraic sign, + or - , which can be used to denote, for example, debits or credits for financial data fields.

Size test: This test can be used to test the full size of the data field. For example, a social security number in the United States should have nine digits.

Granularity: Validate at the lowest granular level possible.

Other validations: Audit trails, transaction logs, error logs, and validity checks.

Note: Based on your project and business needs you might have additional testing requirements.

User Acceptance Test

In this phase you involve the user to test the end results and ensure that the business is satisfied with the quality of the data.

Any changes to the business requirement will follow the change management process and eventually those changes have to follow the SDLC process.

Optimize Development, Testing, and Training Systems

Dramatically accelerate development and test cycles and reduce storage costs by creating fully functional, smaller targeted data subsets for development, testing, and training systems, while maintaining full data integrity.

Quickly build and update nonproduction systems with a small subset of production data and replicate current subsets of nonproduction copies faster.

Simplify test data management and shrink the footprint of nonproduction systems to significantly reduce IT infrastructure and maintenance costs.

Reduce application and upgrade deployment risks by properly testing configuration updates with up-to-date, realistic data before introducing them into production.

Easily customize provisioning rules to meet each organization's changing business requirements.

Lower training costs by standardizing on one approach and one infrastructure.

Train employees effectively using reliable, production-like data in training systems.

Support Corporate Divestitures and Reorganizations

Untangle complex operational systems and separate data along business lines to quickly build the divested organization's system.

Accelerate the provisioning of new systems by using only data that's relevant to the divested organization.

Decrease the cost and time of data divestiture with no reimplementation costs.

Reduce the Total Cost of Storage Ownership

Dramatically increase an IT team's productivity by reusing a comprehensive list of data objects for data selection and updating processes across multiple projects, instead of coding by hand, which is expensive, resource intensive, and time consuming.

Accelerate application delivery by decreasing R&D cycle time and streamlining test data management.

Improve the reliability of application delivery by ensuring IT teams have ready access to updated quality production data.

Lower administration costs by centrally managing data growth solutions across all packaged and custom applications.

Substantially accelerate time to value for subsets of packaged applications.

Decrease maintenance costs by eliminating custom code and scripting.

Informatica Power Center Testing

Debugger: A very useful tool for debugging a valid mapping to gain troubleshooting information about data and error conditions. Refer to the Informatica documentation to learn more about the Debugger tool.

Test Load Options: Relational targets.

Running the Integration Service in Safe Mode

Test a development environment. Run the Integration Service in safe mode to test a development environment before migrating to production.

Troubleshoot the Integration Service. Configure the Integration Service to fail over in safe mode and troubleshoot errors when you migrate or test a production environment configured for high availability. After the Integration Service fails over in safe mode, you can correct the error that caused the Integration Service to fail over.

Syntax Testing: Test your customized queries using your Source Qualifier before executing the session.

Performance Testing: Identify the following bottlenecks:

Target

Source

Mapping

Session

System

Use the following methods to identify performance bottlenecks:

Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.

Analyze performance details. Analyze performance details, such as performance counters, to determine where session performance decreases.

Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.

Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage. Use the Power Center conditional filter in the Source Qualifier to improve performance.

Share metadata. You can share metadata with a third party. For example, you want to send a mapping to someone else for testing or analysis, but you do not want to disclose repository connection information for security reasons. You can export the mapping to an XML file and edit the repository connection information before sending the XML file. The third party can import the mapping from the XML file and analyze the metadata.

Debugger

You can debug a valid mapping to gain troubleshooting information about data and error conditions. To debug a mapping, you configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on the Integration Service. When you run the Debugger, it pauses at breakpoints and you can view and edit transformation output data.

You might want to run the Debugger in the following situations:

Before you run a session. After you save a mapping, you can run some initial tests with a debug session before you create and configure a session in the Workflow Manager.

After you run a session. If a session fails or if you receive unexpected results in the target, you can run the Debugger against the session. You might also want to run the Debugger against a session if you want to debug the mapping using the configured session properties.

Debugger Session Types: You can select three different Debugger session types when you configure the Debugger. The Debugger runs a workflow for each session type. You can choose from the following Debugger session types when you configure the Debugger:

Use an existing non-reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs the non-reusable session and the existing workflow. The Debugger does not suspend on error.

Use an existing reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs a debug instance of the reusable session and creates and runs a debug workflow for the session.

Create a debug session instance. You can configure source, target, and session configuration properties through the Debugger Wizard. When you run the Debugger, the Integration Service runs a debug instance of the debug session and creates and runs a debug workflow for the session.

Debug Process

To debug a mapping, complete the following steps:

1. Create breakpoints. Create breakpoints in a mapping where you want the Integration Service to evaluate data and error conditions.

2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the Integration Service uses when it runs the Debugger. When you create a debug session, you configure a subset of session properties within the Debugger Wizard, such as source and target location. You can also choose to load or discard target data.

3. Run the Debugger. Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer connects to the Integration Service. The Integration Service initializes the Debugger and runs the debugging session and workflow. The Integration Service reads the breakpoints and pauses the Debugger when the breakpoints evaluate to true.

4. Monitor the Debugger. While you run the Debugger, you can monitor the target data, transformation and mapplet output data, the debug log, and the session log. When you run the Debugger, the Designer displays the following windows:

Debug log. View messages from the Debugger.

Target window. View target data.

Instance window. View transformation data.

5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations, mapplets, and targets as the data moves through the pipeline. You can also modify breakpoint information.

The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy breakpoint information and the Debugger configuration to another mapping. If you want to run the Debugger from another Power Center Client machine, you can copy the breakpoint information and the Debugger configuration to the other Power Center Client machine.

Running the Debugger: When you complete the Debugger Wizard, the Integration Service starts the session and initializes the Debugger. After initialization, the Debugger moves in and out of running and paused states based on breakpoints and commands that you issue from the Mapping Designer. The Debugger can be in one of the following states:

Initializing. The Designer connects to the Integration Service.

Running. The Integration Service processes the data.

Paused. The Integration Service encounters a break and pauses the Debugger.

Note: To enable multiple users to debug the same mapping at the same time, each user must configure different port numbers in the Tools > Options > Debug tab.

The Debugger does not use the high availability functionality.

Monitoring the Debugger: When you run the Debugger, you can monitor the following information:

Session status. Monitor the status of the session.

Data movement. Monitor data as it moves through transformations.

Breakpoints. Monitor data that meets breakpoint conditions.

Target data. Monitor target data on a row-by-row basis.

The Mapping Designer displays windows and debug indicators that help you monitor the session:

Debug indicators. Debug indicators on transformations help you follow breakpoints and data flow.

Instance window. When the Debugger pauses, you can view transformation data and row information in the Instance window.

Target window. View target data for each target in the mapping.

Output window. The Integration Service writes messages to the following tabs in the Output window:

Debugger tab. The debug log displays in the Debugger tab.

Session Log tab. The session log displays in the Session Log tab.

Notifications tab. Displays messages from the Repository Service.

While you monitor the Debugger, you might want to change the transformation output data to see the effect on subsequent transformations or targets in the data flow. You might also want to edit or add more breakpoint information to monitor the session more closely.

Restrictions

You cannot change data for the following output ports:

Normalizer transformation. Generated Keys and Generated Column ID ports.

Rank transformation. RANKINDEX port.

Router transformation. All output ports.

Sequence Generator transformation. CURRVAL and NEXTVAL ports.

Lookup transformation. NewLookupRow port for a Lookup transformation configured to use a dynamic cache.

Custom transformation. Ports in output groups other than the current output group.

Java transformation. Ports in output groups other than the current output group.

Additionally, you cannot change data associated with the following:

Mapplets that are not selected for debugging

Input or input/output ports

Output ports when the Debugger pauses on an error breakpoint

Constraint-Based Loading: In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:

Active source. Related target tables must have the same active source.

Key relationships. Target tables must have key relationships.

Target connection groups. Targets must be in one target connection group.

Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.

Active Source: When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table.

Key Relationships: When target tables have no key relationships, the Integration Service does not perform constraint-based loading.

Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load.

Target Connection Groups: The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, complete the following tasks:

Verify all targets are in the same target load order group and receive data from the same active source.

Use the default partition properties and do not add partitions or partition points.

Define the same target type for all targets in the session properties.

Define the same database connection name for all targets in the session properties.

Choose normal mode for the target load type for all targets in the session properties.

Treat Rows as Insert: Use constraint-based loading when the session option Treat Source Rows As is set to Insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.

When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:

Load primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to load the primary table.

Perform inserts in one mapping and updates in another mapping.

Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier.

Example

The following mapping is configured to perform constraint-based loading:

In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.

Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the target in the following order:

1. T_1

2. T_2 and T_3 (in no particular order)

3. T_4

The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies, they are not loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.

If T_6 has a foreign key that references a primary key in T_5, since T_5 and T_6 receive data from a single active source, the Aggregator AGGTRANS, the Integration Service loads rows to the tables in the following order:

T_5

T_6

T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.

Enabling Constraint-Based Loading: When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable constraint-based loading:

1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.

2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.

3. Click OK.

Target Load Order

When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the mapplet.

Setting the Target Load Order

You can configure the target load order for a mapping containing any type of target definition. In the Designer, you can set the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have primary key and foreign key constraints.

The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially.

To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within a mapping. To set the target load order, you then determine in which order the Integration Service reads each source in the mapping.

The following figure shows two target load order groups in one mapping:

In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order group includes all other objects in the mapping, including the TOTAL_ORDERS target. The Integration Service processes the first target load order group, and then the second target load order group.

When it processes the second target load order group, it reads data from both sources at the same time.

To set the target load order:

1. Create a mapping that contains multiple target load order groups.

2. Click Mappings > Target Load Plan.

3. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.

4. Select a source qualifier from the list.

5. Click the Up and Down buttons to move the source qualifier within the load order.

6. Repeat steps 4 and 5 for other source qualifiers you want to reorder, then click OK.

Advanced Concepts

MAPPING PARAMETERS & VARIABLES

Mapping parameters and variables represent values in mappings and mapplets.

When we use a mapping parameter or variable in a mapping, first we declare the mapping parameter or variable for use in each mapplet or mapping. Then, we define a value for the mapping parameter or variable before we run the session.

MAPPING PARAMETERS

A mapping parameter represents a constant value that we can define before running a session.

A mapping parameter retains the same value throughout the entire session.

Example: When we want to extract records of a particular month during the ETL process, we create a mapping parameter of the appropriate data type and use it in the SQL override to compare it with the timestamp field.

After we create a parameter, it appears in the Expression Editor.

We can then use the parameter in any expression in the mapplet or mapping.

We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.
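To make the monthly-extract example above concrete, a hedged sketch of a Source Qualifier SQL override that references a mapping parameter might look like the following. The parameter name $$LoadMonthStart and the date arithmetic are illustrative assumptions; the EMP columns come from the worked example later in this document:

SELECT EMPNO, ENAME, SAL, HIREDATE

FROM EMP

WHERE HIREDATE >= TO_DATE('$$LoadMonthStart', 'DD/MM/YYYY')

AND HIREDATE < ADD_MONTHS(TO_DATE('$$LoadMonthStart', 'DD/MM/YYYY'), 1)

The Integration Service substitutes the parameter value as text before running the query, which is why the parameter is quoted inside TO_DATE.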

MAPPING VARIABLES

Unlike mapping parameters, mapping variables are values that can change between sessions.

The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session.

We can override a saved value with the parameter file.

We can also clear all saved values for the session in the Workflow Manager.

We might use a mapping variable to perform an incremental read of the source. For example, we have a source table containing time stamped transactions and we want to evaluate the transactions on a daily basis. Instead of manually entering a session override to filter source data each time we run the session, we can create a mapping variable, $$IncludeDateTime. In the source qualifier, create a filter to read only rows whose transaction date equals $$IncludeDateTime, such as:

TIMESTAMP = $$IncludeDateTime

In the mapping, use a variable function to set the variable value to increment one day each time the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository at the end of the session. The next time it runs the session, it reads only rows from August 2, 2004.
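Putting this incremental-read pattern together, a hedged sketch might look like the following; the filter and variable come from the example above, while the exact expression port is an illustrative assumption:

-- Source Qualifier filter: read only rows for the current variable value

TIMESTAMP = $$IncludeDateTime

-- Expression transformation variable port: advance the variable by one day for the next run

SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))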

Used in the following transformations:

Expression

Filter

Router

Update Strategy

Initial and Default Value: When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an initial value. When the Integration Service needs an initial value, and we did not declare an initial value for the parameter or variable, the Integration Service uses a default value based on the data type of the parameter or variable.

Data type -> Default value

Numeric -> 0

String -> Empty string

Date/Time -> 1/1/1

Variable Values: Start value and current value of a mapping variable.

Start Value: The start value is the value of the variable at the start of the session. The Integration Service looks for the start value in the following order:

1. Value in parameter file

2. Value saved in the repository

3. Initial value

4. Default value

Current Value: The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository.

Note: If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository.

Variable Data Type and Aggregation Type

When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable.

Aggregation types are:

Count: Only integer and small integer data types are valid.

Max: All transformation data types except the binary data type are valid.

Min: All transformation data types except the binary data type are valid.

Variable Functions

Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline.

SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Max.

SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Min.

SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type set to Count.

SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregation type of the variable, it saves a final value to the repository.

Creating Mapping Parameters and Variables

1. Open the folder where we want to create the parameter or variable.

2. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet > Parameters and Variables.

3. Click the Add button.

4. Enter the name. Do not remove the $$ prefix from the name.

5. Select Type and Data type. Select Aggregation type for mapping variables.

6. Give the initial value and click OK.

Example: Use of Mapping Parameters and Variables

EMP will be source table.

Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.

TOTAL_SAL = SAL + COMM + $$BONUS ($$BONUS is a mapping parameter that changes every month)

SET_VAR: We will add one month to the HIREDATE of every employee.

Create shortcuts as necessary.

Creating the Mapping

1. Open the folder where we want to create the mapping.

2. Click Tools -> Mapping Designer.

3. Click Mapping -> Create and give a name, e.g., m_mp_mv_example.

4. Drag in EMP and the target table.

5. Click Transformation -> Create, select Expression from the list, then click Create and Done.

6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to Expression.

7. Create parameter $$Bonus and give it an initial value of 200.

8. Create variable $$var_max of MAX aggregation type and initial value 1500.

9. Create variable $$var_min of MIN aggregation type and initial value 1500.

10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible when the data type is INT or SMALLINT.

11. Create variable $$var_set of MAX aggregation type.

12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR, and out_SET_VAR.

13. Open the Expression Editor for out_TOTAL_SAL. Enter SAL + COMM as we did earlier; to add $$BONUS, select the Variables tab and pick the parameter from the mapping parameters: SAL + COMM + $$Bonus

14. Open Expression editor for out_max_var.

15. Select the variable function SETMAXVARIABLE from the left-side pane. Select $$var_max from the Variables tab and SAL from the Ports tab: SETMAXVARIABLE($$var_max, SAL)

17. Open Expression editor for out_min_var and write the following expression:

SETMINVARIABLE($$var_min,SAL). Validate the expression.

18. Open Expression editor for out_count_var and write the following expression:

SETCOUNTVARIABLE($$var_count). Validate the expression.

19. Open Expression editor for out_set_var and write the following expression:

SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.

20. Click OK.

21. Link all ports from expression to target and Validate Mapping and Save it.


PARAMETER FILE

A parameter file is a list of parameters and associated values for a workflow, worklet, or session.

Parameter files provide flexibility to change these variables each time we run a workflow or session.

We can create multiple parameter files and change the file we use for a session or workflow. We can create a parameter file using a text editor such as WordPad or Notepad.

Enter the parameter file name and directory in the workflow or session properties.

A parameter file contains the following types of parameters and variables:

Workflow variable: References values and records information in a workflow.

Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.

Session parameter: Defines a value that can change from session to session, such as a database connection or file name.

Mapping parameter and Mapping variable

USING A PARAMETER FILE
Parameter files contain several sections preceded by a heading. The heading identifies the Integration Service, Integration Service process, workflow, worklet, or session to which we want to assign parameters or variables.
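For illustration, the section headings typically take forms like the ones below; the folder, workflow, and session names here are hypothetical, so substitute your own:

[Global]
$$SharedParam=value

[MyFolder.WF:wf_my_workflow]
$$WorkflowLevelParam=value

[MyFolder.WF:wf_my_workflow.ST:s_my_session]
$$Bonus=1000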

Make session and workflow.

Give connection information for source and target table.

Run workflow and see result.

Sample Parameter File for Our Example
In the parameter file, folder and session names are case sensitive.

Create a text file in Notepad with the name Para_File.txt

[Practice.ST:s_m_MP_MV_Example]

$$Bonus=1000

$$var_max=500

$$var_min=1200

$$var_count=0

CONFIGURING PARAMETER FILE
We can specify the parameter file name and directory in the workflow or session properties.

To enter a parameter file in the workflow properties:

1. Open a Workflow in the Workflow Manager.

2. Click Workflows > Edit.

3. Click the Properties tab.

4. Enter the parameter directory and name in the Parameter Filename field.

5. Click OK.

To enter a parameter file in the session properties:

1. Open a session in the Workflow Manager.

2. Click the Properties tab and open the General Options settings.

3. Enter the parameter directory and name in the Parameter Filename field.

4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt

5. Click OK.

MAPPLETS
A mapplet is a reusable object that we create in the Mapplet Designer.

It contains a set of transformations and lets us reuse that transformation logic in multiple mappings.

Created in Mapplet Designer in Designer Tool.

Suppose we need to use the same set of 5 transformations in, say, 10 mappings. Instead of building those 5 transformations in each of the 10 mappings, we create a mapplet containing them and reuse that mapplet in all 10 mappings. Example: To create a surrogate key in the target, we create a mapplet that uses a stored procedure to generate the primary key for the target table. We give the target table name and key column name as input to the mapplet and get the surrogate key as output.

Mapplets help simplify mappings in the following ways:

Include source definitions: Use multiple source definitions and source qualifiers to provide source data for a mapping.

Accept data from sources in a mapping

Include multiple transformations: As many transformations as we need.

Pass data to multiple transformations: We can create a mapplet to feed data to multiple transformations. Each Output transformation in a mapplet represents one output group in a mapplet.

Contain unused ports: We do not have to connect all mapplet input and output ports in a mapping.

Mapplet Input: Mapplet input can originate from a source definition and/or from an Input transformation in the mapplet. We can create multiple pipelines in a mapplet.

We use Mapplet Input transformation to give input to mapplet.

Use of Mapplet Input transformation is optional.

Mapplet Output: The output of a mapplet is not connected to any target table.

We must use Mapplet Output transformation to store mapplet output.

A mapplet must contain at least one Output transformation with at least one connected port in the mapplet.

Example 1: We will join the EMP and DEPT tables, then calculate total salary and give the output to a Mapplet Output transformation.

EMP and DEPT will be source tables.

Output will be given to transformation Mapplet_Out.

Steps:
1. Open the folder where we want to create the mapplet.

2. Click Tools -> Mapplet Designer.

3. Click Mapplets-> Create-> Give name. Ex: mplt_example1

4. Drag EMP and DEPT table.

5. Use Joiner transformation as described earlier to join them.

6. Transformation -> Create -> Select Expression from list -> Create -> Done

7. Pass all ports from joiner to expression and then calculate total salary as described in expression transformation.

8. Now Transformation -> Create -> Select Mapplet Out from list -> Create -> Give name and then Done.

9. Pass all ports from expression to Mapplet output.

10. Mapplet -> Validate

11. Repository -> Save

Use of mapplet in mapping: We can use a mapplet in a mapping by simply dragging the mapplet from the mapplet folder in the left pane, just as we drag source and target tables.

When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input and Output transformations. These are referred to as the mapplet input and mapplet output ports.

Make sure to give correct connection information in session.

Making a mapping: We will use mplt_example1, and then create a Filter transformation to filter records whose Total Salary is >= 1500.

mplt_example1 will be source.

Create a target table with the same structure as the Mapplet_Out transformation, as in the picture above.

Creating Mapping

1. Open folder where we want to create the mapping.

2. Click Tools -> Mapping Designer.

3. Click Mapping-> Create-> Give name. Ex: m_mplt_example1

4. Drag mplt_Example1 and target table.

5. Transformation -> Create -> Select Filter from list -> Create -> Done.

6. Drag all ports from mplt_example1 to filter and give filter condition.

7. Connect all ports from filter to target. We can add more transformations after filter if needed.

8. Validate mapping and Save it.

Make session and workflow.

Give connection information for mapplet source tables.

Give connection information for target table.

Run workflow and see result.

PARTITIONING
A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.

When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.

A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.

PARTITIONING ATTRIBUTES
1. Partition points
By default, the Integration Service sets partition points at various transformations in the pipeline.

Partition points mark thread boundaries and divide the pipeline into stages.

A stage is a section of a pipeline between any two partition points.

2. Number of partitions
We can define up to 64 partitions at any partition point in a pipeline.

When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.

Increasing the number of partitions or partition points increases the number of threads.

The number of partitions we create equals the number of connections to the source or target. For one partition, one database connection will be used.

3. Partition types
The Integration Service creates a default partition type at each partition point.

If we have the Partitioning option, we can change the partition type. This option is purchased separately.

The partition type controls how the Integration Service distributes data among partitions at partition points.

PARTITIONING TYPES
1. Round-Robin Partition Type
In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.

Each partition processes approximately the same number of rows.

Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.

2. Pass-Through Partition Type
In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.

All rows in a single partition stay in that partition after crossing a pass-through partition point.

Use pass-through partitioning when we want to increase data throughput, but we do not want to increase the number of partitions.

3. Database Partitioning Partition Type
Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.

Use any number of pipeline partitions and any number of database partitions.

We can improve performance when the number of pipeline partitions equals the number of database partitions.

Database Partitioning with One Source
When we use database partitioning with a source qualifier with one source, the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions among the session partitions equally.

For example, when a session has three partitions and the database has five partitions, the 1st and 2nd session partitions receive data from two database partitions each (using four database partitions), and the 3rd session partition receives data from the remaining database partition.

Partitioning a Source Qualifier with Multiple Source Tables
The Integration Service creates SQL queries for database partitions based on the number of partitions in the database table with the most partitions.

If the session has three partitions and the database table has two partitions, one of the session partitions receives no data.

4. Hash Auto-Keys Partition Type
The Integration Service uses all grouped or sorted ports as a compound partition key.

Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.

5. Hash User-Keys Partition Type
The Integration Service uses a hash function to group rows of data among partitions.

We define the number of ports to generate the partition key.

We choose the ports that define the partition key.

6. Key Range Partition Type
We specify one or more ports to form a compound partition key.

The Integration Service passes data to each partition depending on the ranges we specify for each port.

Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.

Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define the range for each partition.

WORKING WITH LINKS
Use links to connect each workflow task.

We can specify conditions with links to create branches in the workflow.

The Workflow Manager does not allow us to use links to create loops in the workflow. Each link in the workflow can run only once.

Valid Workflow :

Example of loop:

Specifying Link Conditions: Once we create links between tasks, we can specify conditions for each link to determine the order of execution in the workflow.

If we do not specify conditions for each link, the Integration Service runs the next task in the workflow by default.

Use predefined or user-defined workflow variables in the link condition.
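For example, a link condition that lets the next task run only when a (hypothetical) session named s_m_sample succeeds can use the predefined Status variable:

$s_m_sample.Status = SUCCEEDED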

Steps:
1. In the Workflow Designer workspace, double-click the link you want to specify.

2. The Expression Editor appears.

3. In the Expression Editor, enter the link condition. The Expression Editor provides predefined workflow variables, user-defined workflow variables, variable functions, and Boolean and arithmetic operators.

4. Validate the expression using the Validate button.

Using the Expression Editor: The Workflow Manager provides an Expression Editor for any expressions in the workflow. We can enter expressions using the Expression Editor for the following:

Link conditions

Decision task

Assignment task

SCHEDULERS
We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow. The Integration Service runs a scheduled workflow as configured.

By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change schedule settings, the Integration Service reschedules the workflow according to the new settings.

A scheduler is a repository object that contains a set of schedule settings.

Scheduler can be non-reusable or reusable.

The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow.

If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows.

If we delete a folder, the Integration Service removes workflows from the schedule.

The Integration Service does not run the workflow if:

The prior workflow run fails.

We remove the workflow from the schedule

The Integration Service is running in safe mode

Creating a Reusable Scheduler
For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder.

Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow.

When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.

Steps:
1. Open the folder where we want to create the scheduler.

2. In the Workflow Designer, click Workflows > Schedulers.

3. Click Add to add a new scheduler.

4. In the General tab, enter a name for the scheduler.

5. Configure the scheduler settings in the Scheduler tab.

6. Click Apply and OK.

Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule.

There are 3 run options:

1. Run on Demand

2. Run Continuously

3. Run on Server initialization

1. Run on Demand: The Integration Service runs the workflow when we start the workflow manually.

2. Run Continuously: The Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of the workflow as soon as it finishes the previous run.

3. Run on Server Initialization: The Integration Service runs the workflow as soon as the service is initialized. The Integration Service then starts the next run of the workflow according to the settings in Schedule Options.

Schedule options for Run on Server initialization:

Run Once: Runs the workflow just once.

Run Every: Runs the workflow at regular intervals, as configured.

Customized Repeat: The Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.

Start options for Run on Server initialization:

Start Date

Start Time

End options for Run on Server initialization:

End On: The Integration Service stops scheduling the workflow on the selected date.

End After: The Integration Service stops scheduling the workflow after the set number of workflow runs.

Forever: The Integration Service schedules the workflow as long as the workflow does not fail.

Creating a Non-Reusable Scheduler
1. In the Workflow Designer, open the workflow.

2. Click Workflows > Edit.

3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow. Note: If we do not have a reusable scheduler in the folder, we must create one before we choose Reusable.

4. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.

5. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.

6. Click OK.

Points to Ponder: To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unschedule Workflow.

To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.

WORKING WITH TASKS Part 1
The Workflow Manager contains many types of tasks to help you build workflows and worklets. We can create reusable tasks in the Task Developer.

Types of tasks (task type - tool where the task can be created - reusable or not):

Session - Task Developer - Yes
Email - Workflow Designer - Yes
Command - Worklet Designer - Yes
Event-Raise - Workflow Designer - No
Event-Wait - Worklet Designer - No
Timer - Workflow Designer / Worklet Designer - No
Decision - Workflow Designer / Worklet Designer - No
Assignment - Workflow Designer / Worklet Designer - No
Control - Workflow Designer / Worklet Designer - No

SESSION TASK
A session is a set of instructions that tells the Power Center Server how and when to move data from sources to targets.

To run a session, we must first create a workflow to contain the Session task.

We can run as many sessions in a workflow as we need. We can run the Session tasks sequentially or concurrently, depending on our needs.

The Power Center Server creates several files and in-memory caches depending on the transformations and options used in the session.

EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email during a workflow.

It is usually created by the Administrator, and we just drag and use it in our workflow.

Steps:
1. In the Task Developer or Workflow Designer, choose Tasks -> Create.

2. Select an Email task and enter a name for the task. Click Create.

3. Click Done.

4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.

5. Click the Properties tab.

6. Enter the fully qualified email address of the mail recipient in the Email User Name field.

7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.

8. Click the Open button in the Email Text field to open the Email Editor.

9. Click OK twice to save your changes.

Example: To send an email when a session completes:

Steps:
1. Create a workflow wf_sample_email.

2. Drag any session task to workspace.

3. Edit Session task and go to Components tab.

4. See On Success Email Option there and configure it.

5. In Type select reusable or Non-reusable.

6. In Value, select the email task to be used.

7. Click Apply -> Ok.

8. Validate workflow and Repository -> Save

We can also drag the email task and use as per need.

We can set the option to send email on success or failure in components tab of a session task.
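For a post-session email, the email text can include the standard post-session email variables (verify the exact set against your PowerCenter version); a minimal sketch of an email body:

Session %s finished with status %e.
Rows loaded: %l, rows rejected: %r.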

COMMAND TASK
The Command task allows us to specify one or more shell commands in UNIX, or DOS commands in Windows, to run during the workflow.

For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.

Ways of using command task:

1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.

2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a Session task. This is done in the COMPONENTS tab of a session. We can run it as the Pre-Session Command, Post-Session Success Command, or Post-Session Failure Command. Select the Value and Type options as we did in the Email task.

Example: To copy a file sample.txt from the D drive to the E drive.

Command (in Windows): COPY D:\sample.txt E:\
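On UNIX, an equivalent shell command for the Command task might look like the following (the paths here are hypothetical):

cp /data/files/sample.txt /backup/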

Steps for creating a Command task:
1. In the Task Developer or Workflow Designer, choose Tasks -> Create.

2. Select Command Task for the task type.

3. Enter a name for the Command task. Click Create. Then click done.

4. Double-click the Command task. Go to commands tab.

5. In the Commands tab, click the Add button to add a command.

6. In the Name field, enter a name for the new command.

7. In the Command field, click the Edit button to open the Command Editor.

8. Enter only one command in the Command Editor.

9. Click OK to close the Command Editor.

10. Repeat steps 5-9 to add more commands in the task.

11. Click OK.

Steps to create the workflow using a Command task:
1. Create a task using the above steps to copy a file in the Task Developer.

2. Open Workflow Designer. Workflow -> Create -> Give name and click ok.

3. Start is displayed. Drag session say s_m_Filter_example and command task.

4. Link Start to Session task and Session to Command Task.

5. Double-click the link between the Session and Command tasks and give the condition in the editor as: $S_M_FILTER_EXAMPLE.Status=SUCCEEDED

6. Workflow -> Validate

7. Repository -> Save

WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task execution.

Types of Events:

Pre-defined event: A pre-defined event is a file-watch event. This event waits for a specified file to arrive at a given location.

User-defined event: A user-defined event is a sequence of tasks in the workflow. We create events and then raise them as per need.

Steps for creating a User-Defined Event:
1. Open any workflow where we want to create an event.

2. Click Workflow-> Edit -> Events tab.

3. Click the Add button to add events and give the names as per need.

4. Click Apply -> Ok. Validate the workflow and Save it.

Types of Event Tasks:

EVENT RAISE: The Event-Raise task represents a user-defined event. We use this task to raise a user-defined event.

EVENT WAIT: The Event-Wait task waits for a file-watch event or user-defined event to occur before executing the next task in the workflow.

Example 1: Use an Event-Wait task and make sure that session s_filter_example runs only when the file abc.txt is present in the D:\FILES folder.

Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click OK.

2. Task -> Create -> Select Event Wait. Give name. Click create and done.

3. Link Start to Event Wait task.

4. Drag s_filter_example to workspace and link it to event wait task.

5. Right click on event wait task and click EDIT -> EVENTS tab.

6. Select the Pre Defined option there. In the blank space, give the directory and file name to watch. Example: D:\FILES\abc.txt

7. Workflow validate and Repository Save.

Example 2: Raise a user-defined event when session s_m_filter_example succeeds. Capture this event in an Event-Wait task and run session S_M_TOTAL_SAL_EXAMPLE.

Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click OK.

2. Workflow -> Edit -> Events Tab and add events EVENT1 there.

3. Drag s_m_filter_example and link it to START task.

4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name ER_Example. Click Create and then Done.

5. Link ER_Example to s_m_filter_example.

6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User Defined Event and Select EVENT1 from the list displayed. Apply -> OK.

7. Click link between ER_Example and s_m_filter_example and give the condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED

8. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT. Click Create and then done.

9. Link EW_WAIT to START task.

10. Right click EW_WAIT -> EDIT-> EVENTS tab.

11. Select User Defined there. Select the Event1 by clicking Browse Events button.

12. Apply -> OK.

13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.

14. Workflow -> Validate

15. Repository -> Save.

16. Run workflow and see.

WORKING WITH TASKS Part 2

TIMER TASK
The Timer task allows us to specify the period of time to wait before the Power Center Server runs the next task in the workflow. The Timer task has two types of settings:

Absolute time: We specify the exact date and time, or we can choose a user-defined workflow variable to specify the exact time. The next task in the workflow will run as per the date and time specified.

Relative time: We instruct the Power Center Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.

Example: Run session s_m_filter_example 1 minute after the Timer task starts.

Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click OK.

2. Click Tasks -> Create -> Select TIMER from list. Give name TIMER_Example. Click Create and then done.

3. Link TIMER_Example to START task.

4. Right click TIMER_Example-> EDIT -> TIMER tab.

5. Select the Relative Time option, give 1 min, and select the 'From start time of this task' option.

6. Apply -> OK.

7. Drag s_m_filter_example and link it to TIMER_Example.

8. Workflow-> Validate and Repository -> Save.

DECISION TASK
The Decision task allows us to enter a condition that determines the execution of the workflow, similar to a link condition.

The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the decision condition.

The Power Center Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0).

We can specify one decision condition per Decision task.

Example: The Command task should run only if either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds. If both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE fail, then S_m_sample_mapping_EMP should run.

Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click OK.

2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to workspace and link both of them to START task.

3. Click Tasks -> Create -> Select DECISION from list. Give name DECISION_Example. Click Create and then done. Link DECISION_Example to both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE.

4. Right click DECISION_Example-> EDIT -> GENERAL tab.

5. Set Treat Input Links As to OR. Default is AND. Apply and click OK.

6. Now edit decision task again and go to PROPERTIES Tab. Open the Expression editor by clicking the VALUE section of Decision Name attribute and enter the following condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED

7. Validate the condition -> Click Apply -> OK.

8. Drag command task and S_m_sample_mapping_EMP task to workspace and link them to DECISION_Example task.

9. Double click link between S_m_sample_mapping_EMP & DECISION_Example & give the condition: $DECISION_Example.Condition = 0. Validate & click OK.

10. Double click link between Command task and DECISION_Example and give the condition: $DECISION_Example.Condition = 1. Validate and click OK.

11. Workflow Validate and repository Save.

12. Run workflow and see the result.

CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition.

A parent workflow or worklet is the workflow or worklet that contains the Control task.

We give the condition to the link connected to Control Task.

Control options and their descriptions:

Fail Me: Fails the Control task.
Fail Parent: Marks the status of the workflow or worklet that contains the Control task as failed.
Stop Parent: Stops the workflow or worklet that contains the Control task.
Abort Parent: Aborts the workflow or worklet that contains the Control task.
Fail Top-Level WF: Fails the workflow that is running.
Stop Top-Level WF: Stops the workflow that is running.
Abort Top-Level WF: Aborts the workflow that is running.

Example: Drag any 3 sessions; if any one of them fails, then abort the top-level workflow.

Steps for creating the workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click OK.

2. Drag any 3 sessions to workspace and link all of them to START task.

3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task.

4. Click Create and then done.

5. Link all sessions to the control task cntr_task.

6. Double click link between cntr_task and any session say s_m_filter_example and give the condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.

7. Repeat above step for remaining 2 sessions also.

8. Right click cntr_task-> EDIT -> GENERAL tab. Set Treat Input Links As to OR. Default is AND.

9. Go to the PROPERTIES tab of cntr_task and select the value Abort Top-Level Workflow for the Control Option attribute.

10. Click Apply and OK.

11. Workflow Validate and repository Save.

12. Run workflow and see the result.

ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow variable.

See Workflow variable topic to add user defined variables.

To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables.

We cannot assign values to pre-defined workflow variables.

Steps to create an Assignment task:
1. Open any workflow where we want to use the Assignment task.

2. Edit Workflow and add user defined variables.

3. Choose Tasks-Create. Select Assignment Task for the task type.

4. Enter a name for the Assignment task. Click Create. Then click Done.

5. Double-click the Assignment task to open the Edit Task dialog box.

6. On the Expressions tab, click Add to add an assignment.

7. Click the Open button in the User Defined Variables field.

8. Select the variable for which you want to assign a value. Click OK.

9. Click the Edit button in the Expression field to open the Expression Editor.

10. Enter the value or expression you want to assign (see the example after these steps).

11. Repeat steps 7-10 to add more variable assignments as necessary.

12. Click OK.
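For example, to increment a (hypothetical) user-defined counter variable $$WF_COUNT by one on each run, select $$WF_COUNT as the user-defined variable and enter an expression such as:

$$WF_COUNT + 1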

INDIRECT LOADING FOR FLAT FILES
Suppose you have 10 flat files of the same structure. All the flat files ha