IDQ Learning

Informatica Data Quality (IDQ) Installation

General:

In standard Data Profiling and Data Quality projects, can anyone please clarify in what sequence in the project lifecycle one would use Data Explorer, Data Quality, and Data Director? My understanding is:
1) Data Profiling - For discovering the data and potential anomalies.
2) Data Quality - Outputs from the data profiling stage are implemented as data quality rules.
3) Data Director - Used to correct the data based on the DQ output.
Can this tool be used to correct the data on the source system directly?

The implementation logic mentioned by Robert is exactly correct. However, I would like to add a few points that should make the IDQ process in the ETL flow clearer.

The outcome of the Exception management process is to clean up any database table of bad records (outliers). The output of the Exception management process should be a clean database table that can be sourced directly into the ETL flow and is expected to be of high data quality. When you design a process flow for Exception management, you cannot directly use the bad records table itself as the source in the ETL flow. You need to create a separate process to copy data into the actual source table from the bad table using the "status code" column information. There is no inherent process to change data in the bad table and have it automatically update the source table.

The status code column of the exception table has the following meanings:
UPDATE = 20
REPROCESS = 21
ACCEPT = 22
MERGED = 23
REMERGED = 24
EXTRACTED = 25
REJECT = 26

Hope this helps.
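The separate copy process described above could be sketched roughly as follows. This is only an illustration: the status codes come from the list above, but the column name STATUS_CODE, the row layout, and the choice of which statuses qualify for copying back (COPY_BACK) are assumptions that each project would decide for itself.

```python
from enum import IntEnum


class ExceptionStatus(IntEnum):
    """Status codes listed above for the exception (bad records) table."""
    UPDATE = 20
    REPROCESS = 21
    ACCEPT = 22
    MERGED = 23
    REMERGED = 24
    EXTRACTED = 25
    REJECT = 26


# Assumption: which statuses mean "ready to copy back" is a project decision.
COPY_BACK = {ExceptionStatus.UPDATE, ExceptionStatus.ACCEPT}


def rows_to_copy(bad_rows, status_column="STATUS_CODE"):
    """Return the bad-table rows whose status marks them as ready to copy."""
    selected = []
    for row in bad_rows:
        try:
            status = ExceptionStatus(int(row[status_column]))
        except (KeyError, ValueError, TypeError):
            continue  # missing or unknown status: leave the row in the bad table
        if status in COPY_BACK:
            selected.append(row)
    return selected


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the bad records table.
    sample = [
        {"ID": 1, "STATUS_CODE": 22},
        {"ID": 2, "STATUS_CODE": 26},
        {"ID": 3, "STATUS_CODE": 20},
    ]
    for row in rows_to_copy(sample):
        print(row["ID"], ExceptionStatus(row["STATUS_CODE"]).name)
```

In a real flow the selected rows would then be written to the actual source table by the ETL job; that step is left out here because it depends on the target database.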

I recently got Analyst access and I'm just browsing through the different options. I need help with scorecards for now.
1. I ran a sample profile on 200 rows from a relational DB source. When I run the scorecard on this profile, I expect results for these 200 rows only. Instead, I believe it scores the complete table, so it takes a very long time to generate the scorecard. Is there any workaround to use only the 200 rows for generating the scorecard?
2. Can a scorecard be run in the background?
3. Can a scorecard be viewed by team members who do not have access to Analyst, using a hyperlink or something similar?

1. By default, a scorecard runs on the complete data set of the physical data object used to create the profile. To run the scorecard on only the required 200 records, follow the steps below:
- Create a Logical Data Object (LDO) using the Developer client such that the output of the LDO is only the required 200 records.
- Create a profile on the LDO.
- Create a scorecard on the profile created above.
2. The scorecard runs in the DIS process if the DIS is not enabled for "Launch Jobs as Separate Process". Alternatively, you can execute the profile using the command "infacmd ps execute". This command can be run through a script that executes in the background on the operating system. Refer to the Command Reference Guide for more details on the command.
3. You can configure scorecard notification settings so that the Analyst tool sends emails when specific metric scores or metric group scores move across thresholds or remain in specific score ranges, such as Unacceptable, Acceptable, and Good. The notification email message has an option for "ObjectURL", a hyperlink to the scorecard. You need to provide the username and password to access the object. Refer to the Data Explorer User Guide for more details on scorecard notifications.
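As a rough sketch of the background script mentioned in point 2, the wrapper below starts "infacmd ps execute" without waiting for it to finish. The helper names, the infacmd path, and the log file are hypothetical. The option names in the example call are recalled from the Command Reference Guide and the values are placeholders; authentication options are left out, so verify all of them against the guide for your version before use.

```python
import subprocess


def build_ps_execute(infacmd_path, options):
    """Build the argument list for `infacmd ps execute` from caller-supplied options."""
    args = [infacmd_path, "ps", "execute"]
    for flag, value in options.items():
        args.extend([flag, str(value)])
    return args


def run_in_background(args, log_path):
    """Start the command without waiting for it; its output is appended to log_path."""
    with open(log_path, "ab") as log_file:
        # The child process keeps its own copy of the log file handle.
        return subprocess.Popen(args, stdout=log_file, stderr=subprocess.STDOUT)


if __name__ == "__main__":
    # Assumed option names and placeholder values; check the Command Reference Guide.
    options = {
        "-dn": "MyDomain",
        "-un": "my_user",
        "-msn": "My_MRS",
        "-dsn": "My_DIS",
        "-ot": "scorecard",
        "-opn": "MyProject/MyScorecard",
    }
    command = build_ps_execute("infacmd.sh", options)
    print(" ".join(command))  # inspect the command before running it
    # run_in_background(command, "scorecard_run.log")  # uncomment to launch it
```

The launch call is commented out on purpose so that the command line can be reviewed first.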

Domain:

The Informatica domain is the administrative unit for the Informatica environment. The domain is a collection of nodes that represent the machines on which the application services run. When you install the Informatica services on a machine, you install all files for all services.

Informatica has a service-oriented architecture that provides the ability to scale services and to share resources across multiple machines. The Informatica domain is the primary unit for management and administration of services.

The Informatica domain can contain one or more nodes. Multiple application services can run on each node. The application service types that you can run depend on the Informatica license key generated for your organization. When you plan the domain, you must consider the number of nodes needed in the domain. You also must consider the types of application services the domain requires and the number of application services that run on each node.

You must verify that each machine in the domain meets the system requirements to run the installer and to run the application services. You must also verify that the port numbers that you specify during installation are available on the machines where you install the Informatica services (How to check whether ports are available or not?)
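On the question of how to check whether ports are available, one simple approach is to try binding a TCP socket to each port on the machine where the services will be installed. The sketch below does that; the function name and the port numbers in the example are placeholders, so substitute the ports you plan to specify in the installer.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Port already in use, or not permitted for this user.
            return False
    return True


if __name__ == "__main__":
    # Placeholder port numbers; replace with the ports planned for the installation.
    for port in (6005, 6006, 6007, 6008):
        state = "available" if port_is_free(port) else "NOT available"
        print(f"port {port}: {state}")
```

The check only reflects the moment it runs, and a port below 1024 is reported as not available when the script runs without the required privileges. Operating system tools such as netstat give the same information from the command line.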

The domain requires a relational database to store configuration information and user account privileges and permissions. You must verify that the databases have the disk space required by the Informatica domain and the application services.

An Informatica domain is a collection of nodes and services. A node is the logical representation of a machine in a domain. Services for the domain include the Service Manager that manages all domain operations and a set of application services that represent server-based functionality.

The following image shows an installation on multiple machines:

For more information about the Informatica domain, see the Informatica Administrator Guide.

Nodes

Gateway node

A gateway node is any node that you configure to serve as a gateway for the domain. One node acts as the gateway at any given time. That node is called the master gateway. A gateway node can run application services, and it can serve as a master gateway node. The master gateway node is the entry point to the domain.

The Service Manager on the master gateway node performs all domain operations on the master gateway node. The Service Managers running on other gateway nodes perform limited domain operations on those nodes. (What are those limited domain tasks?)

Worker nodes

A worker node is any node not configured to serve as a gateway. A worker node can run application services, but it cannot serve as a gateway. The Service Manager performs limited domain operations on a worker node.

Service Manager

The Service Manager in the Informatica domain supports the domain and the application services. The Service Manager runs on each node in the domain. The Service Manager manages the following areas on each node in the domain:

Domain support
The Service Manager performs operations on each node to support the domain. Domain operations include authentication, authorization, and logging. The domain operations that the Service Manager performs on a node depend on the type of node. For example, the Service Manager running on the master gateway node performs all domain operations on that node. The Service Manager running on another gateway node or a worker node performs limited domain operations on that node.

Application service support
The Service Manager on each node starts the application services configured to run on that node. It starts and stops application services based on requests from Informatica clients.
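The node roles described above can be summarized in a small model. This is only a teaching sketch and not how Informatica represents a domain internally; the class names, method names, and node names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_gateway: bool = False          # configured to serve as a gateway?
    services: list = field(default_factory=list)


@dataclass
class Domain:
    nodes: list
    master_name: str                  # gateway node currently acting as master

    def __post_init__(self):
        if not self.node(self.master_name).is_gateway:
            raise ValueError("only a gateway node can be the master gateway")

    def node(self, name):
        for candidate in self.nodes:
            if candidate.name == name:
                return candidate
        raise KeyError(name)

    def role(self, name):
        node = self.node(name)
        if node.name == self.master_name:
            return "master gateway"
        return "gateway" if node.is_gateway else "worker"

    def domain_operations(self, name):
        """'all' on the master gateway, 'limited' on every other node."""
        return "all" if self.role(name) == "master gateway" else "limited"


if __name__ == "__main__":
    domain = Domain(
        nodes=[Node("node01", is_gateway=True),
               Node("node02", is_gateway=True),
               Node("node03")],
        master_name="node01",
    )
    for n in domain.nodes:
        print(n.name, domain.role(n.name), domain.domain_operations(n.name))
```

The model captures only the rule stated in the text: one gateway node is the master at any given time, and the Service Manager on every other node performs limited domain operations.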

Application Services

Application services represent server-based functionality. After you complete the installation, you create application services based on the license key generated for your organization. When you create an application service, you designate a node to run the service process. The service process is the run-time representation of a service running on a node. The service type determines how many service processes can run at a time.

If you have the high availability option, you can run an application service on multiple nodes. If you do not have the high availability option, configure each application service to run on one node.

Some application services require databases to store information processed by the application service. When you plan the Informatica domain, you also need to plan the databases required by each application service.

License Key

The license key controls the application services and the functionality that you can use.

Informatica Clients

The clients make requests to the Service Manager or to application services.

S.No | Client Name | Client Type | Usage | Metadata stored in | Run by | Comment
1 | Informatica Developer | Thick | Create and run data objects, mappings, profiles, workflows, and virtual databases | Model repository | Data Integration Service |
2 | PowerCenter Client | Thick | Define sources and targets, create transformations and build mappings, and create workflows to run mappings | PowerCenter repository | PowerCenter Integration Service |
3 | Data Transformation Studio | Thick | Design and configure Data Transformation projects | Data Transformation repository directory | Data Transformation Engine |
4 | Analyst tool | Web | Analyze, cleanse, integrate, and standardize data in an enterprise | Model repository | Data Integration Service | The Analyst Service runs the tool
5 | Data Analyzer | Web | Run reports to analyze PowerCenter metadata | Data Analyzer repository | Data Analyzer application |
6 | Jaspersoft | Web | Run PowerCenter Repository Reports and Metadata Manager Reports | | Reporting and Dashboards Service |
7 | Metadata Manager | Web | Browse and analyze metadata from disparate metadata repositories | Metadata Manager repository | Metadata Manager Service |
8 | Web Services Hub Console | Web | Manage the web services you create in PowerCenter | | Web Services Hub Service |

Application services:

Analyst Service

The Analyst Service is an application service that runs the Analyst tool in the Informatica domain. The Analyst Service manages the connections between service components and the users that have access to the Analyst tool.

When you run profiles, scorecards, or mapping specifications in the Analyst tool, the Analyst Service connects to the Data Integration Service to perform the data integration jobs. When you work on Human tasks in the Analyst tool, the Analyst Service connects to the Data Integration Service to retrieve the task data from the Human task database.

When you view, create, or delete a Model repository object in the Analyst tool, the Analyst Service connects to the Model Repository Service to access the metadata. When you view data lineage analysis on scorecards in the Analyst tool, the Analyst Service sends the request to the Metadata Manager Service to run data lineage.

Note: When you create the Analyst Service, you do not associate it with any relational databases.

Associated Services

The Analyst Service connects to other application services within the domain. When you create the Analyst Service, you can associate it with the following application services:

Data Integration Services
You can associate up to two Data Integration Services with the Analyst Service. The Analyst Service manages the connection to the Data Integration Service that enables users to perform data preview, mapping specification, scorecard, and profile jobs in the Analyst tool. The Analyst Service also manages the connection to the Data Integration Service that you configure to run Human tasks. When you create the Analyst Service, you provide the name of the Data Integration Services. You can associate the Analyst Service with the same Data Integration Service for all operations.

Metadata Manager Service
The Analyst Service manages the connection to the Metadata Manager Service that runs data lineage for scorecards in the Analyst tool. When you create the Analyst Service, you can provide the name of the Metadata Manager Service.

Model Repository Service
The Analyst Service manages the connection to the Model Repository Service for the Analyst tool. The Analyst tool connects to the Model Repository Service to create, update, and delete Model repository objects in the Analyst tool. When you create the Analyst Service, you provide the name of the Model Repository Service.

Content Management Service

The Content Management Service is an application service that manages reference data. A reference data object contains a set of data values that you can search while performing data quality operations on source data. The Content Management Service also compiles rule specifications into mapplets. A rule specification object describes the data requirements of a business rule in logical terms. The Content Management Service uses the Data Integration Service to run mappings to transfer data between reference tables and external data sources. The Content Management Service also provides transformations, mapping specifications, and rule specifications with the following types of reference data:

- Address reference data
- Identity populations
- Probabilistic models and classifier models
- Reference tables

Associated Services

The Content Management Service connects to other application services within the domain. When you create the Content Management Service, you can associate it with the following application services:

Data Integration Service
The Content Management Service uses the Data Integration Service to run mappings to transfer data between reference tables and external data sources. When you create the Content Management Service, you provide the name of the Data Integration Service. You must create the Data Integration Service and Content Management Service on the same node.

Model Repository Service
The Content Management Service connects to the Model Repository Service to store metadata for reference data objects in the Model repository. When you create the Content Management Service, you provide the name of the Model Repository Service. You can associate multiple Content Management Services with a Model Repository Service. The Model Repository Service identifies the first Content Management Service that you associate as the master Content Management Service. The master Content Management Service manages the data files for the probabilistic models and classifier models in the Model repository. (What are probabilistic models and classifier models?)

Required Databases

The Content Management Service requires a reference data warehouse in a relational database. When you create the Content Management Service, you must provide connection information to the reference data warehouse. Create the following database before you create the Content Management Service:

Reference data warehouse
Stores data values for the reference table objects that you define in the Model repository. When you add data to a reference table, the Content Management Service writes the data values to a table in the reference data warehouse. You need a reference data warehouse to manage reference table data in the Analyst tool and the Developer tool.

Data Integration Service

The Data Integration Service is an application service that performs data integration jobs for the Analyst tool, the Developer tool, and external clients. When you preview or run data profiles, SQL data services, and mappings in the Analyst tool or the Developer tool, the client tool sends requests to the Data Integration Service to perform the data integration jobs. When you run SQL data services, mappings, and workflows from the command line program or an external client, the command sends the request to the Data Integration Service.

Associated Services

The Data Integration Service connects to other application services within the domain. When you create the Data Integration Service, you can associate it with the following application service:

Model Repository Service
The Data Integration Service connects to the Model Repository Service to perform jobs such as running mappings, workflows, and profiles. When you create the Data Integration Service, you provide the name of the Model Repository Service.

Required Databases

The Data Integration Service can connect to multiple relational databases. The databases that the service can connect to depend on the license key generated for your organization. When you create the Data Integration Service, you provide connection information to the databases. Create the following databases before you create the Data Integration Service:

Data object cache database
Stores cached logical data objects and virtual tables. Data object caching enables the Data Integration Service to access pre-built logical data objects and virtual tables. You need a data object cache database to increase performance for mappings, SQL data service queries, and web service requests.

Profiling warehouse
Stores profiling information, such as profile results and scorecard results. You need a profiling warehouse to perform profiling and data discovery.

Human task database
Stores metadata for Human tasks that run in workflows. The metadata identifies users and groups who work on the Human task instances in the Analyst tool. The metadata contains user and group names and specifies the range of exception records or clusters in each task instance. You need a Human task database to perform exception management.

Metadata Manager Service

The Metadata Manager Service is an application service that runs the Metadata Manager web client in the Informatica domain. The Metadata Manager Service manages the connections between service components and the users that have access to Metadata Manager. When you load metadata into the Metadata Manager warehouse, the Metadata Manager Service connects to the PowerCenter Integration Service. The PowerCenter Integration Service runs workflows in the PowerCenter repository to read from metadata sources and load metadata into the Metadata Manager warehouse. When you use Metadata Manager to browse and analyze metadata, the Metadata Manager Service accesses the metadata from the Metadata Manager repository.

Associated Services

The Metadata Manager Service connects to other application services within the domain. When you create the Metadata Manager Service, you can associate it with the following application services:

PowerCenter Integration Service

When you load metadata into the Metadata Manager warehouse, the Metadata Manager Service connects to the PowerCenter Integration Service. The PowerCenter Integration Service runs workflows in the PowerCenter repository to read from metadata sources and load metadata into the Metadata Manager warehouse. When you create the Metadata Manager Service, you provide the name of the PowerCenter Integration Service.

PowerCenter Repository Service

The Metadata Manager Service connects to the PowerCenter Repository Service to access metadata objects in the PowerCenter repository. The PowerCenter Integration Service uses the metadata objects to load metadata into the Metadata Manager warehouse. The metadata objects include sources, targets, sessions, and workflows. The Metadata Manager Service determines the associated PowerCenter Repository Service based on the PowerCenter Integration Service associated with the Metadata Manager Service.

Required Databases

The Metadata Manager Service requires a Metadata Manager repository in a relational database. When you create the Metadata Manager Service, you must provide connection information to the database. Create the following database before you create the Metadata Manager Service:

Metadata Manager repository
Stores the Metadata Manager warehouse and models. The Metadata Manager warehouse is a centralized metadata warehouse that stores the metadata from metadata sources. Models define the metadata that Metadata Manager extracts from metadata sources. You need a Metadata Manager repository to browse and analyze metadata in Metadata Manager.

Model Repository Service

The Model Repository Service is an application service that manages the Model repository. The Model repository stores metadata created by Informatica clients and application services in a relational database to enable collaboration among the clients and services.

When you access a Model repository object in the Developer tool, the Analyst tool, the Administrator tool, or the Data Integration Service, the client or service sends a request to the Model Repository Service. The Model Repository Service process fetches, inserts, and updates the metadata in the Model repository database tables.

Note: When you create the Model Repository Service, you do not associate it with other application services.

Required Databases

The Model Repository Service requires a Model repository in a relational database. When you create the Model Repository Service, you must provide connection information to the database. Create the following database before you create the Model Repository Service:

Model repository
Stores metadata created by Informatica clients and application services in a relational database to enable collaboration among the clients and services. You need a Model repository to store the design-time and run-time objects created by Informatica clients and application services.

PowerCenter Integration Service

The PowerCenter Integration Service is an application service that runs workflows and sessions for the PowerCenter Client. When you run a workflow in the PowerCenter Client, the client sends the requests to the PowerCenter Integration Service. The PowerCenter Integration Service connects to the PowerCenter Repository Service to fetch metadata from the PowerCenter repository, and then runs and monitors the sessions and workflows.

Note: When you create the PowerCenter Integration Service, you do not associate it with any relational databases.

Associated Services

The PowerCenter Integration Service connects to other application services within the domain. When you create the PowerCenter Integration Service, you can associate it with the following application service:

PowerCenter Repository Service
The PowerCenter Integration Service requires the PowerCenter Repository Service. The PowerCenter Integration Service connects to the PowerCenter Repository Service to run workflows and sessions. When you create the PowerCenter Integration Service, you provide the name of the PowerCenter Repository Service.

PowerCenter Repository Service

The PowerCenter Repository Service is an application service that manages the PowerCenter repository. The PowerCenter repository stores metadata created by the PowerCenter Client and application services in a relational database. When you access a PowerCenter repository object in the PowerCenter Client or the PowerCenter Integration Service, the client or service sends a request to the PowerCenter Repository Service. The PowerCenter Repository Service process fetches, inserts, and updates metadata in the PowerCenter repository database tables.

Note: When you create the PowerCenter Repository Service, you do not associate it with other application services.

Required Databases

The PowerCenter Repository Service requires a PowerCenter repository in a relational database. When you create the PowerCenter Repository Service, you must provide connection information to the database. Create the following database before you create the PowerCenter Repository Service:

PowerCenter repository
Stores metadata created by the PowerCenter Client in a relational database. You need a PowerCenter repository to store objects created by the PowerCenter Client and to store objects that are run by the PowerCenter Integration Service.

Reporting Service

The Reporting Service is an application service that runs the Data Analyzer application in the Informatica domain. The Reporting Service manages the connections between service components and the users that have access to Data Analyzer. The Reporting Service stores metadata for schemas, metrics and attributes, queries, reports, user profiles, and other objects in the Data Analyzer repository. When you run reports for a data source, the Reporting Service uses the metadata in the Data Analyzer repository to retrieve the data for the report and to present the report.

Associated Services

The Reporting Service connects to other application services within the domain. When you create the Reporting Service, you can associate it with the following application services:

PowerCenter Repository Service
The Reporting Service connects to the PowerCenter Repository Service when you use Data Analyzer to run PowerCenter Repository Reports. When you create the Reporting Service, you can provide the name of the PowerCenter Repository Service as the reporting source.

Metadata Manager Service
The Reporting Service connects to the Metadata Manager Service when you use Data Analyzer to run Metadata Manager Reports. When you create the Reporting Service, you can provide the name of the Metadata Manager Service as the reporting source.

Required Databases

The Reporting Service requires a Data Analyzer repository in a relational database. When you create the Reporting Service, you must provide connection information to the database. Create the following database before you create the Reporting Service:

Data Analyzer repository
Stores metadata for schemas, metrics and attributes, queries, reports, user profiles, and other objects. You need a Data Analyzer repository to create and run reports in Data Analyzer.

Reporting and Dashboards Service

The Reporting and Dashboards Service is an application service that runs the JasperReports application in the Informatica domain.

The Reporting and Dashboards Service stores metadata for PowerCenter Repository Reports and Metadata Manager Reports in the Jaspersoft repository. You use the PowerCenter Client or Metadata Manager to run the reports. When you run the reports, the Reporting and Dashboards Service uses the metadata in the Jaspersoft repository to retrieve the data for the report and to present the report.

JasperReports is an open source reporting library that users can embed into any Java application. JasperReports Server builds on JasperReports and forms a part of the Jaspersoft Business Intelligence suite of products.

Associated Services

The Reporting and Dashboards Service connects to other application services within the domain. After you create the Reporting and Dashboards Service, you can associate it with the following application services:

PowerCenter Repository Service
The Reporting and Dashboards Service connects to the PowerCenter Repository Service when you use JasperReports to run PowerCenter Repository Reports. After you create the Reporting and Dashboards Service, you can provide the name of the PowerCenter Repository Service as the reporting source.

Metadata Manager Service
The Reporting and Dashboards Service connects to the Metadata Manager Service when you use JasperReports to run Metadata Manager Reports. After you create the Reporting and Dashboards Service, you can provide the name of the Metadata Manager Service as the reporting source.

Required Databases

The Reporting and Dashboards Service requires a Jaspersoft repository in a relational database. When you create the Reporting and Dashboards Service, you must provide connection information to the database. Create the following database before you create the Reporting and Dashboards Service:

Jaspersoft repository
Stores metadata for PowerCenter Repository Reports and Metadata Manager Reports. You need a Jaspersoft repository to use JasperReports Server to run PowerCenter Repository Reports and Metadata Manager Reports.

Search Service

The Search Service is an application service that manages search in the Analyst tool and Business Glossary Desktop.

By default, the Search Service returns search results from a Model repository, such as data objects, mapping specifications, profiles, reference tables, rules, and scorecards. The Search Service can also return additional results. The results can include related assets, business terms, and policies. The results can include column profile results and domain discovery results from a profiling warehouse. In addition, you can perform a search based on patterns, data types, unique values, or null values.

Note: When you create the Search Service, you do not associate it with any relational databases.

Associated Services

The Search Service connects to other application services within the domain.

When you create the Search Service, you can associate it with the following application services:

Analyst Service
The Analyst Service manages the connection to the Search Service that enables and manages searches in the Analyst tool. The Analyst Service determines the associated Search Service based on the Model Repository Service associated with the Analyst Service.

Data Integration Service
The Search Service connects to the Data Integration Service to return column profile and domain discovery search results from the profiling warehouse associated with the Data Integration Service. The Search Service determines the associated Data Integration Service based on the Model Repository Service.

Model Repository Service
The Search Service connects to the Model Repository Service to return search results from a Model repository. The search results can include data objects, mapping specifications, profiles, reference tables, rules, and scorecards. When you create the Search Service, you provide the name of the Model Repository Service.

Web Services Hub

The Web Services Hub Service is an application service in the Informatica domain that exposes PowerCenter functionality to external clients through web services. The Web Services Hub Service receives requests from web service clients and passes them to the PowerCenter Integration Service or PowerCenter Repository Service. The PowerCenter Integration Service or PowerCenter Repository Service processes the requests and sends a response to the Web Services Hub. The Web Services Hub sends the response back to the web service client.

Note: When you create the Web Services Hub Service, you do not associate it with any relational databases.

Associated Services

The Web Services Hub Service connects to other application services within the domain. When you create the Web Services Hub Service, you can associate it with the following application services:

PowerCenter Integration Service
The Web Services Hub Service connects to the PowerCenter Integration Service to send requests from web service clients to the PowerCenter Integration Service. The Web Services Hub Service determines the associated PowerCenter Integration Service based on the PowerCenter Repository Service.

PowerCenter Repository Service
The Web Services Hub Service connects to the PowerCenter Repository Service to send requests from web service clients to the PowerCenter Repository Service. When you create the Web Services Hub Service, you provide the name of the PowerCenter Repository Service.
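The service associations described in this section form a dependency graph, which can help when planning the order in which to create services. The sketch below encodes those associations and derives one possible creation order. It makes a simplifying assumption that every associated service should exist first, although the text says some associations are optional or are configured after creation. The dictionary and function names are illustrative only.

```python
from graphlib import TopologicalSorter

PCRS = "PowerCenter Repository Service"
PCIS = "PowerCenter Integration Service"
MRS = "Model Repository Service"
DIS = "Data Integration Service"
MMS = "Metadata Manager Service"

# Service -> services it is associated with, as listed in the notes above.
ASSOCIATED = {
    MRS: [],
    PCRS: [],
    DIS: [MRS],
    PCIS: [PCRS],
    "Content Management Service": [DIS, MRS],
    MMS: [PCIS, PCRS],
    "Analyst Service": [DIS, MMS, MRS],
    "Reporting Service": [PCRS, MMS],
    "Reporting and Dashboards Service": [PCRS, MMS],
    # The Analyst Service finds the Search Service through the shared
    # Model Repository Service, so it is not modeled as a prerequisite here.
    "Search Service": [DIS, MRS],
    "Web Services Hub Service": [PCIS, PCRS],
}


def creation_order(wanted):
    """Return the wanted services plus their associations, prerequisites first."""
    needed = set()
    stack = list(wanted)
    while stack:
        service = stack.pop()
        if service not in needed:
            needed.add(service)
            stack.extend(ASSOCIATED[service])
    graph = {service: ASSOCIATED[service] for service in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, service in enumerate(creation_order(["Analyst Service"]), start=1):
        print(step, service)
```

More than one valid order usually exists; the function returns one of them. The graphlib module requires Python 3.9 or later.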

Databases:

Domain configuration repository - INFA_DOMAIN

The database user account must have permissions to create and drop tables, indexes, and views, and to select, insert, update, and delete data from tables. The domain stores configuration and user information in a domain configuration repository.

Data Analyzer repository

The Data Analyzer repository stores metadata for schemas, metrics and attributes, queries, reports, user profiles, and other objects for the Reporting Service. You must specify the Data Analyzer repository details when you create a Reporting Service.

Data object cache repository:

The data object cache database stores cached logical data objects and virtual tables for the Data Integration Service. You specify the data object cache database connection when you create the Data Integration Service.

Human task repository:

The Data Integration Service stores metadata for Human tasks in the Human task database. Before you create the Human task database, set up a database and database user account for the Model repository.

You specify the Human task database connection when you create the Data Integration Service.

Jaspersoft repository:

The Jaspersoft repository stores reports, data sources, and metadata corresponding to the data source. You must specify the Jaspersoft repository details when you create the Reporting and Dashboards Service.

Metadata Manager Repository:

The Metadata Manager repository contains the Metadata Manager warehouse and models. The Metadata Manager warehouse is a centralized metadata warehouse that stores the metadata from metadata sources. Specify the repository details when you create a Metadata Manager Service.

Model repository:

Informatica services and clients store data and metadata in the Model repository. Before you create the Model Repository Service, set up a database and database user account for the Model repository.

PowerCenter repository:

A PowerCenter repository is a collection of database tables containing metadata. A PowerCenter Repository Service manages the repository and performs all metadata transactions between the repository database and repository clients.

Profiling warehouse:

The profiling warehouse database stores profiling and scorecard results. You specify the profiling warehouse connection when you create the Data Integration Service.

Note: Ensure that you install the database client on the machine on which you want to run the Data Integration Service.

Reference data warehouse:

The reference data warehouse stores the data values for reference table objects that you define in a Model repository. You configure a Content Management Service to identify the reference data warehouse and the Model repository.

You associate a reference data warehouse with a single Model repository. You can select a common reference data warehouse on multiple Content Management Services if the Content Management Services identify a common Model repository. The reference data warehouse must support mixed-case column names.

Note: Ensure that you install the database client on the machine on which you want to run the Content Management Service.

Service Manager Log Files

The installer starts the Informatica service. The Informatica service starts the Service Manager for the node. The Service Manager generates log files that indicate the startup status of a node. Use these files to troubleshoot issues when the Informatica service fails to start and you cannot log in to Informatica Administrator. The Service Manager log files are created on each node.

catalina.out:

Log events from the Java Virtual Machine (JVM) that runs the Service Manager. For example, a port is available during installation, but is in use when the Service Manager starts. Use this log to get more information about which port was unavailable during startup of the Service Manager. The catalina.out file is in the /tomcat/logs directory.

node.log:

Log events generated during the startup of the Service Manager on a node. You can use this log to get more information about why the Service Manager for a node failed to start. For example, if the Service Manager cannot connect to the domain configuration database after 30 seconds, the Service Manager fails to start. The node.log file is in the /tomcat/logs directory.
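As a rough aid for the troubleshooting described above, the following Python sketch scans catalina.out or node.log for lines that look like startup failures. The search patterns and the function names are assumptions made for illustration; the exact wording of Informatica log messages is not given in these notes and may differ between versions.

```python
import re
import sys
from pathlib import Path

# Assumed keywords for illustration only; adjust them to the messages
# that actually appear in your catalina.out and node.log files.
PATTERNS = [
    re.compile(r"address already in use", re.IGNORECASE),
    re.compile(r"\b(SEVERE|ERROR|FATAL)\b"),
]


def find_startup_problems(lines):
    """Return (line_number, text) for every line that matches a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan one log file; a missing file yields an empty result."""
    log = Path(path)
    if not log.is_file():
        return []
    with log.open(encoding="utf-8", errors="replace") as handle:
        return find_startup_problems(handle)


if __name__ == "__main__":
    # Pass the log file paths on the command line.
    for name in sys.argv[1:]:
        for number, text in scan_log(name):
            print(f"{name}:{number}: {text}")
```

Run it with the paths of both log files as arguments and read the reported lines in context; a match is only a hint about where to look.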

Configure Informatica Environment Variables

You can configure Informatica environment variables to store memory, domain, and location settings.

INFA_JAVA_OPTS: Configure INFA_JAVA_OPTS as a system variable. By default, Informatica uses a maximum of 512 MB of system memory. To raise the limit, set the maximum heap size in this variable, for example -Xmx1024m.

INFA_DOMAINS_FILE: Configure INFA_DOMAINS_FILE as a system variable. Set the variable to the path and file name of the domains.infa file.

INFA_HOME: Use INFA_HOME to designate the Informatica installation directory.

INFA_TRUSTSTORE: If you enable secure communication for the domain, set the INFA_TRUSTSTORE variable to the directory that contains the truststore files for the SSL certificates. The directory must contain truststore files named infa_truststore.jks and infa_truststore.pem. You must set the INFA_TRUSTSTORE variable if you use the default SSL certificate provided by Informatica or a certificate that you provide.
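The settings above can be checked before starting the services. The sketch below is a minimal Python check, not an Informatica utility: the function name check_environment and the wording of the findings are hypothetical, and it only verifies what these notes state (the four variable names and the two truststore file names).

```python
import os
from pathlib import Path

# File names taken from the notes above.
REQUIRED_TRUSTSTORE_FILES = ("infa_truststore.jks", "infa_truststore.pem")


def check_environment(env, secure_domain=False):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    # Without an -Xmx option the 512 MB default mentioned above applies.
    if "-Xmx" not in env.get("INFA_JAVA_OPTS", ""):
        findings.append("INFA_JAVA_OPTS has no -Xmx setting")
    for name in ("INFA_DOMAINS_FILE", "INFA_HOME"):
        if not env.get(name):
            findings.append(f"{name} is not set")
    if secure_domain:
        truststore = env.get("INFA_TRUSTSTORE")
        if not truststore:
            findings.append("INFA_TRUSTSTORE is not set")
        else:
            for filename in REQUIRED_TRUSTSTORE_FILES:
                if not (Path(truststore) / filename).is_file():
                    findings.append(f"{filename} is missing from {truststore}")
    return findings


if __name__ == "__main__":
    # secure_domain=True assumes secure communication is enabled.
    for finding in check_environment(os.environ, secure_domain=True):
        print(finding)
```

Passing the environment as a plain mapping keeps the check easy to try with sample values before running it against os.environ.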

The following table describes the database connections that you must create before you create the associated application services:

Database Connection | Description

Data object cache database | To access the data object cache, create the data object cache connection for the Data Integration Service.

Human task database | To store Human task metadata, create the Human task database connection for the Data Integration Service.

Profiling warehouse database | To create and run profiles and scorecards, create the profiling warehouse database connection for the Data Integration Service. Also select this instance of the Data Integration Service when you configure the run-time properties of the Analyst Service.

Reference data warehouse | To store reference data, create the reference data warehouse connection for the Content Management Service.

Configuring IDQ:

Create 3 databases:

INFA_MRS - Model repository database.
INFA_PROWHS - Profiling warehouse database.
INFA_ANLSTG - Analyst stage database.

Created the INFA_HUMAN user for the Human task database.
Created the INFA_SQL_PROP user for SQL Properties as part of the Data Integration Service.

INFA_REF/INFA_REF: Create the INFA_REF database for the reference data warehouse. After that, create a connection in the Admin console and create a Content Management Service.

Log on to the Infa admin console.

Create 6 connections that point to the databases above.

Create a new Model Repository Service that uses the INFA_MRS database. It creates the repository content, which may take some time.

Create a new Data Integration Service. Here, point to the Model Repository Service created in the previous step.

In the window below, gave Administrator/Administrator as the user name and password.

Selected Human Task Service Module and Profiling Service Module. Did not select others.

Selected Human Task Service and Profiling Service. Did not select others.

Create new analyst service

Setting up IDQ Analyst Tool:

1. Log on to the Infa 9 admin console with your user ID and password.
2. Go to the Analyst Service.
3. You will see the URL for the Analyst tool.
4. Click the link and enter the user ID and password if prompted.
5. From the Actions menu, click New Project.
6. Select the project and, from the Actions menu, create a new folder.
7. Click the folder. To import the customer_OrgA csv file, click the Actions menu and select New Flat File.
8. Import the csv file.
9. To import a table, click the Actions menu and select New Table.

http://WIN-A4ZOPLLNM64:8085/analyst/

Administrator/Administrator

Creating a profile in Informatica Analyst:

Creating reference table in Informatica Analyst:

Setting up Infa Developer:

Property | Description

User name | Database user name.

Password | Password for the user name.

Connection String for metadata access | Connection string to import physical data objects. Use the following connection string: jdbc:informatica:oracle://: 1521;SID=

Connection String for data access | Connection string to preview data and run mappings. Enter dbname.world from the TNSNAMES entry.

Code Page | Database code page.

Environment SQL | Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the connection environment SQL each time it connects to the database.

Transaction SQL | Optional. Enter SQL commands to set the database environment when you connect to the database. The Data Integration Service executes the transaction environment SQL at the beginning of each transaction.

Retry Period | This property is reserved for future use.

Parallel Mode | Optional. Enables parallel processing when loading data into a table in bulk mode. Default is disabled.

SQL Identifier Character | The type of character used to identify special characters and reserved SQL keywords, such as WHERE. The Data Integration Service places the selected character around special characters and reserved SQL keywords. The Data Integration Service also uses this character for the Support Mixed-case Identifiers property.

Support Mixed-case Identifiers | When enabled, the Data Integration Service places identifier characters around table, view, schema, synonym, and column names when generating and executing SQL against these objects in the connection. Use if the objects have mixed-case or lowercase names. By default, this option is not selected.
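To make two of the properties above concrete, here is a small Python sketch. It assembles a metadata access connection string in the pattern shown above, and it mimics what the Support Mixed-case Identifiers option is described as doing. The host name and SID in the usage lines are placeholders, both function names are hypothetical, and the quoting function is only an illustration of the described behaviour, not the Data Integration Service's actual SQL generation.

```python
def metadata_connection_string(host, sid, port=1521):
    """Build a string in the jdbc:informatica:oracle pattern shown above."""
    if not host or not sid:
        raise ValueError("host and sid are required")
    return f"jdbc:informatica:oracle://{host}:{port};SID={sid}"


def quote_identifier(name, enabled, quote_char='"'):
    """Wrap a name in the identifier character when the option is enabled."""
    if not enabled:
        return name
    if quote_char in name:
        # Escaping rules depend on the database, so this sketch refuses.
        raise ValueError("identifier contains the quote character")
    return f"{quote_char}{name}{quote_char}"


if __name__ == "__main__":
    # dbhost.example.com and ORCL are placeholder values.
    print(metadata_connection_string("dbhost.example.com", "ORCL"))
    print(quote_identifier("Customer_OrgA", enabled=True))
```

The second function shows why the option matters for objects with mixed-case or lowercase names: with quoting the name is passed to the database exactly as written.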

Creating a Connection

In the Administrator tool, you can create relational database, social media, and file systems connections.

1. In the Administrator tool, click the Domain tab.
2. Click the Connections view.
3. In the Navigator, select the domain.
4. In the Navigator, click Actions > New > Connection. The New Connection dialog box appears.
5. In the New Connection dialog box, select the connection type, and then click OK. The New Connection wizard appears.
6. Enter the connection properties. The connection properties that you enter depend on the connection type. Click Next to go to the next page of the New Connection wizard.
7. When you finish entering connection properties, you can click Test Connection to test the connection.
8. Click Finish.

Informatica contains the following components:

1. Application clients. A group of clients that you use to access underlying Informatica functionality. Application clients make requests to the Service Manager or application services.
2. Application services. A group of services that represent server-based functionality. An Informatica domain can contain a subset of application services. You configure the application services that are required by the application clients that you use.
3. Repositories. A group of relational databases that store metadata about objects and processes required to handle user requests from application clients.
4. Service Manager. A service that is built in to the domain to manage all domain operations. The Service Manager runs the application services and performs domain functions including authentication, authorization, and logging.

Application Client | Application Services | Repositories

Data Analyzer | Reporting Service | Data Analyzer repository

Informatica Reporting & Dashboards | Reporting and Dashboards Service | Jaspersoft repository

Informatica Analyst | Analyst Service, Data Integration Service, Model Repository Service, Search Service | Model repository

Informatica Data Director for Data Quality | Data Integration Service, Informatica Data Director Service | Human task database

Informatica Developer | Analyst Service, Content Management Service, Data Integration Service, Model Repository Service | Model repository

Metadata Manager | Metadata Manager Service, PowerCenter Integration Service, PowerCenter Repository Service | Metadata Manager repository, PowerCenter repository

PowerCenter Client | PowerCenter Integration Service, PowerCenter Repository Service | PowerCenter repository

Web Services Hub Console | PowerCenter Integration Service, PowerCenter Repository Service, Web Services Hub | PowerCenter repository
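Because you configure only the application services required by the clients you use, the table above can be read as a lookup. The Python sketch below encodes the client-to-service column as a dictionary and returns the combined set of services for a chosen list of clients. The names CLIENT_SERVICES and required_services are hypothetical; the data is copied from the table.

```python
# Application client -> application services, copied from the table above.
CLIENT_SERVICES = {
    "Data Analyzer": {"Reporting Service"},
    "Informatica Reporting & Dashboards": {"Reporting and Dashboards Service"},
    "Informatica Analyst": {
        "Analyst Service", "Data Integration Service",
        "Model Repository Service", "Search Service",
    },
    "Informatica Data Director for Data Quality": {
        "Data Integration Service", "Informatica Data Director Service",
    },
    "Informatica Developer": {
        "Analyst Service", "Content Management Service",
        "Data Integration Service", "Model Repository Service",
    },
    "Metadata Manager": {
        "Metadata Manager Service", "PowerCenter Integration Service",
        "PowerCenter Repository Service",
    },
    "PowerCenter Client": {
        "PowerCenter Integration Service", "PowerCenter Repository Service",
    },
    "Web Services Hub Console": {
        "PowerCenter Integration Service", "PowerCenter Repository Service",
        "Web Services Hub",
    },
}


def required_services(clients):
    """Return the sorted union of services for the given clients."""
    services = set()
    for client in clients:
        # An unknown client name raises KeyError.
        services |= CLIENT_SERVICES[client]
    return sorted(services)


if __name__ == "__main__":
    for service in required_services(["Informatica Analyst", "Informatica Developer"]):
        print(service)
```

For a Data Quality setup that uses both the Analyst and Developer tools, the result is the five services both clients need between them.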

The following application services are not accessed by an Informatica application client:

PowerExchange Listener Service. Manages the PowerExchange Listener for bulk data movement and change data capture. The PowerCenter Integration Service connects to the PowerExchange Listener through the Listener Service.

PowerExchange Logger Service. Manages the PowerExchange Logger for Linux, UNIX, and Windows to capture change data and write it to the PowerExchange Logger Log files. Change data can originate from DB2 recovery logs, Oracle redo logs, a Microsoft SQL Server distribution database, or data sources on an i5/OS or z/OS system.

SAP BW Service. Listens for RFC requests from SAP BI and requests that the PowerCenter Integration Service run workflows to extract from or load to SAP BI.

RFC. Communication between applications in different systems in the SAP environment includes connections between SAP systems as well as between SAP systems and non-SAP systems. Remote Function Call (RFC) is the standard SAP interface for communication between SAP systems.

Feature Availability

Informatica products use a common set of applications. The product features you can use depend on your product license.

The following table describes the licensing options and the application features available with each option:

Licensing Option: Data Explorer

Informatica Developer Features:
- Profiling that includes using the enterprise discovery profile and discovering primary key, foreign key, and functional dependency
- Curate inferred profile results
- Scorecarding

Informatica Analyst Features:
- Profiling including enterprise discovery
- Scorecarding
- Use discovery search to find where data and metadata exist in the profiling repositories
- Curate inferred profile results
- Create and run profiling rules
- Reference table management

Licensing Option: Data Quality

Informatica Developer Features:
- Create and run mappings with all transformations
- Create and run rules
- Profiling
- Scorecarding
- Export objects to PowerCenter

Informatica Analyst Features:
- Profiling
- Scorecarding
- Reference table management
- Create profiling rules
- Run rules in profiles
- Bad and duplicate record management

Licensing Option: Data Services

Informatica Developer Features:
- Create logical data object models
- Create and run mappings with Data Services transformations
- Create SQL data services
- Create web services
- Export objects to PowerCenter

Informatica Analyst Features:
- Reference table management

Licensing Option: Data Services and Profiling Option

Informatica Developer Features:
- Create logical data object models
- Create and run mappings with Data Services transformations
- Create SQL data services
- Create web services
- Export objects to PowerCenter
- Create and run rules with Data Services transformations
- Profiling

Informatica Analyst Features:
- Reference table management

Informatica Analyst

Use to analyze, cleanse, standardize, profile, and score data in an enterprise. It provides column and rule profiling, scorecarding, and bad record and duplicate record management.

You can also manage reference data and provide the data to developers in a data quality solution.

Data Quality and Profiling

Profile data. Profiling reveals the content and structure of your data. Profiling is a key step in any data project as it can identify strengths and weaknesses in your data and help you define your project plan.

Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in a profile.

Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.

Parse records. Parse data records to improve record structure and derive additional information from your data. You can split a single field of freeform data into fields that contain different information types. You can also add information to your records. For example, you can flag customer records as personal or business customers.

Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of your postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.

Find duplicate records. Duplicate record analysis compares a set of records against each other to find similar or matching values in selected data columns. You set the level of similarity that indicates a good match between field values. You can also set the relative weight assigned to each column in match calculations. For example, you can prioritize surname information over forename information.

Create and run data quality rules. Informatica provides pre-built rules that you can run or edit to suit your project objectives. You can create rules in the Developer tool.

Collaborate with Informatica users. The rules and reference data tables you add to the Model repository are available to users in the Developer tool and the Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.

Export mappings to PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.
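The standardization and duplicate analysis ideas above can be illustrated outside the product. The Python sketch below normalizes punctuation and case, then scores record pairs with a weighted average of per-column similarity, so that surname can count more than forename. It uses difflib.SequenceMatcher from the Python standard library as a stand-in similarity measure; this is not the algorithm Informatica uses, and the function names, weights, and threshold are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def standardize(value):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", value.lower())
    return " ".join(cleaned.split())


def field_similarity(a, b):
    """Similarity between 0.0 and 1.0 for two standardized values."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()


def match_score(record_a, record_b, weights):
    """Weighted average of the per-column similarities."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(
        weight * field_similarity(record_a[column], record_b[column])
        for column, weight in weights.items()
    )
    return weighted / total


def find_duplicates(records, weights, threshold):
    """Return index pairs whose match score reaches the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if match_score(records[i], records[j], weights) >= threshold:
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    people = [
        {"surname": "Smith", "forename": "John"},
        {"surname": "smith.", "forename": "Jon"},
        {"surname": "Garcia", "forename": "Maria"},
    ]
    # Surname is weighted three times as heavily as forename.
    print(find_duplicates(people, {"surname": 3, "forename": 1}, threshold=0.9))
```

Comparing every pair is quadratic in the number of records, which is fine for a small sample but would need grouping of candidate records first on larger data.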

Informatica Analyst Tutorial

Creates projects and folders, creates profiles and rules, scores data, and creates reference tables.

Errors:

Mapping service associated with the Analyst service is disabled or is not available. Recycle the Mapping service in the Administrator tool.

The module below was set to false. Set it to true in the admin console, then recycled the Repository Service, Data Integration Service, and Analyst Service.

No data domains in the data domain glossary. This error occurred while creating a Quick profile in the Discovery workspace.

Tried creating a reference table in the Informatica Analyst tool and got the error: cannot create reference table

Solution: https://mysupport.informatica.com/message/40554#40554
Log in to the Informatica Administrator console. Click the Analyst Service. Go to Actions on the right-hand side, then click Audit Table > Create. Once the audit table is created, the Analyst Service can create the reference table.

I could not find this option. I think a Content Management Service is required to create reference tables from the Analyst tool, so a reference data warehouse and a Content Management Service need to be created.

Created the Content Management Service. After that, got the error below: Audit Tables do not exist.

Solution: Open Actions (Left side)