Informatica Data Quality (Version 9.5.1) User Guide

Informatica Data Quality (Version 9.5.1)

User Guide

Informatica Data Quality User Guide

Version 9.5.1
December 2012

Copyright (c) 2009-2012 Informatica. All rights reserved.

This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and Informatica Master Data Management are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.

Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright © Meta Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved. Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright © Glyph & Cog, LLC. All rights reserved. Copyright © Thinkmap, Inc. All rights reserved. Copyright © Clearpace Software Limited. All rights reserved. Copyright © Information Builders, Inc. All rights reserved. Copyright © OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights reserved. Copyright © International Organization for Standardization 1986. All rights reserved. Copyright © ej-technologies GmbH. All rights reserved. Copyright © Jaspersoft Corporation. All rights reserved. Copyright © is International Business Machines Corporation. All rights reserved. Copyright © yWorks GmbH. All rights reserved. Copyright © Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved. Copyright © Daniel Veillard. All rights reserved. Copyright © Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright © MicroQuill Software Publishing, Inc. All rights reserved. Copyright © PassMark Software Pty Ltd. All rights reserved. Copyright © LogiXML, Inc. All rights reserved. Copyright © 2003-2010 Lorenzi Davide, All rights reserved. Copyright © Red Hat, Inc. All rights reserved. Copyright © The Board of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright © EMC Corporation. All rights reserved. Copyright © Flexera Software. All rights reserved.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright © 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.

This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html.

The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://dojotoolkit.org/license.

This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.

This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project Copyright © 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.

This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php.

This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html; http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://developer.apple.com/library/mac/#samplecode/HelpHook/Listings/HelpHook_java.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http://www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html; http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html; http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt; http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://www.jmock.org/license.html; http://xsom.java.net; and http://benalman.com/about/license/.

This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php) the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License Agreement Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php) the MIT License (http://www.opensource.org/licenses/mit-license.php) and the Artistic License (http://www.opensource.org/licenses/artistic-license-1.0).

This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/.

This product includes software developed by Andrew Kachites McCallum. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu (2002).

This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,243,110; 7,254,590; 7,281,001; 7,421,458; 7,496,588; 7,523,121; 7,584,422; 7,676,516; 7,720,842; 7,721,270; and 7,774,791, international Patents and other Patents Pending.

DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of noninfringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

NOTICES

This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software Corporation ("DataDirect") which are subject to the following terms and conditions:

1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.

Part Number: DQ-UG-95100-0001

Table of Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Informatica Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Informatica Customer Portal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Informatica Documentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Informatica Web Site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Informatica How-To Library. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi

Informatica Knowledge Base. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Informatica Multimedia Knowledge Base. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Informatica Global Customer Support. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Part I: Informatica Data Quality Concepts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

Chapter 1: Introduction to Data Quality. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Data Quality Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Chapter 2: Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

Reference Data Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

User-Defined Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

Informatica Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Reference Data and Transformations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Reference Table Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Managed and Unmanaged Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

Content Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Character Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

Classifier Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Pattern Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

Probabilistic Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Regular Expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Token Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Creating a Content Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Creating a Reusable Content Expression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Part II: Data Quality Features in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Chapter 3: Column Profiles in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . 19

Column Profile Concepts Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Column Profile Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Scorecards. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

Column Profiles in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Filtering Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Sampling Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

Creating a Single Data Object Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

Chapter 4: Column Profile Results in Informatica Developer. . . . . . . . . . . . . . . . . . . 23

Column Profile Results in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Column Value Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Column Pattern Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Column Statistics Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Exporting Profile Results from Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

Chapter 5: Rules in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Rules in Informatica Developer Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Creating a Rule in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

Applying a Rule in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Chapter 6: Scorecards in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Scorecards in Informatica Developer Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Creating a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Chapter 7: Mapplet and Mapping Profiling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Mapplet and Mapping Profiling Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Running a Profile on a Mapplet or Mapping Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Comparing Profiles for Mapping or Mapplet Objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Generating a Mapping from a Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Chapter 8: Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Reference Tables Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Reference Table Data Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Creating a Reference Table Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

Creating a Reference Table from a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

Creating a Reference Table from a Relational Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

Copying a Reference Table in the Model Repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

Part III: Data Quality Features in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Chapter 9: Column Profiles in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Column Profiles in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

Column Profiling Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Profile Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

Profile Results Option. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Sampling Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Drilldown Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Creating a Column Profile in the Analyst Tool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

Editing a Column Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Running a Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Creating a Filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

Managing Filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Synchronizing a Flat File Data Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Synchronizing a Relational Data Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Chapter 10: Column Profile Results in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . 45

Column Profile Results in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

Profile Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

Column Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Column Patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

Column Statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Column Profile Drilldown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

Drilling Down on Row Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

Applying Filters to Drilldown Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

Column Profile Export Files in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

Profile Export Results in a CSV File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Profile Export Results in Microsoft Excel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Exporting Profile Results from Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

Chapter 11: Rules in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Rules in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Predefined Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Predefined Rules Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Applying a Predefined Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

Expression Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Expression Rules Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Creating an Expression Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

Chapter 12: Scorecards in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Scorecards in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Informatica Analyst Scorecard Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Metric Weights. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Adding Columns to a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Running a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Viewing a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

Editing a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Defining Thresholds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Metric Groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Drilling Down on Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Viewing Trend Charts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Notification Email Message Template. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Setting Up Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Configuring Global Settings for Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . 63

Scorecard Integration with External Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

Viewing a Scorecard in External Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

Chapter 13: Exception Record Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Exception Record Management Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Exception Management Process Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

Reserved Column Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Exception Management Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Viewing and Editing Bad Records. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Updating Bad Record Status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Viewing and Filtering Duplicate Record Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Editing Duplicate Record Clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Consolidating Duplicate Record Clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Viewing the Audit Trail. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

Chapter 14: Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Reference Tables Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Reference Table Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

General Reference Table Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

Reference Table Column Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

Create Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

Creating a Reference Table in the Reference Table Editor. . . . . . . . . . . . . . . . . . . . . . . . . 73

Create a Reference Table from Profile Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Creating a Reference Table from Profile Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Creating a Reference Table from Column Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

Creating a Reference Table from Column Patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

Create a Reference Table From a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Analyst Tool Flat File Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

Creating a Reference Table from a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

Create a Reference Table from a Database Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

Creating a Database Connection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

Creating a Reference Table from a Database Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

Copying a Reference Table in the Model Repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Reference Table Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Managing Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Managing Rows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

  Finding and Replacing Values
  Exporting a Reference Table
Audit Trail Events
  Viewing Audit Trail Events
Rules and Guidelines for Reference Tables

Index

Preface

The Informatica Data Quality User Guide is written for Informatica users who create and run data quality processes in the Informatica Developer and Informatica Analyst client applications. The Informatica Data Quality User Guide contains information about profiles and other objects that you can use to analyze the content and structure of data and to find and fix data quality issues.

Informatica Resources

Informatica Customer Portal

As an Informatica customer, you can access the Informatica Customer Portal site at http://mysupport.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Product Documentation, and access to the Informatica user community.

Informatica Documentation

The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected]. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments.

The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to Product Documentation from http://mysupport.informatica.com.

Informatica Web Site

You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

Informatica How-To Library

As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features. It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.

Informatica Knowledge Base

As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Multimedia Knowledge Base

As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at http://mysupport.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files that help you learn about common concepts and guide you through performing specific tasks. If you have questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Global Customer Support

You can contact a Customer Support Center by telephone or through the Online Support. Online Support requires a user name and password. You can request a user name and password at http://mysupport.informatica.com.

Use the following telephone numbers to contact Informatica Global Customer Support:

North America / South America

Toll Free
Brazil: 0800 891 0202
Mexico: 001 888 209 8853
North America: +1 877 463 2435

Europe / Middle East / Africa

Toll Free
France: 0805 804632
Germany: 0800 5891281
Italy: 800 915 985
Netherlands: 0800 2300001
Portugal: 800 208 360
Spain: 900 813 166
Switzerland: 0800 463 200
United Kingdom: 0800 023 4632

Standard Rate
Belgium: +31 30 6022 797
France: +33 1 4138 9226
Germany: +49 1805 702 702
Netherlands: +31 306 022 797
United Kingdom: +44 1628 511445

Asia / Australia

Toll Free
Australia: 1 800 151 830
New Zealand: 09 9 128 901

Standard Rate
India: +91 80 4112 5738

Part I: Informatica Data Quality Concepts

This part contains the following chapters:

• Introduction to Data Quality

• Reference Data

CHAPTER 1

Introduction to Data Quality

This chapter includes the following topic:

• Data Quality Overview

Data Quality Overview

Use Informatica Data Quality to analyze the content and structure of your data and enhance the data in ways that meet your business needs.

You use Informatica applications to design and run processes to complete the following tasks:

• Profile data. Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan.

• Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in a profile.

• Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.

• Parse data. Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains. Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.

• Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.

• Find duplicate records. Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to be analyzed, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data.

• Manage exceptions. An exception is a record that contains data quality issues that you correct by hand. You can run a mapping to capture any exception record that remains in a data set after you run other data quality processes. You review and edit exception records in the Analyst tool or in Informatica Data Director for Data Quality.

• Create reference data tables. Informatica provides reference data that can enhance several types of data quality process, including standardization and parsing. You can create reference tables using data from profile results.

• Create and run data quality rules. Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.

• Collaborate with Informatica users. The Model repository stores reference data and rules, and this repository is available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.

• Export mappings to PowerCenter. You can export and run mappings in PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.
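
The duplicate analysis task in the list above scores the similarity between records by comparing selected fields. The following Python sketch illustrates that general idea only. It is not the Developer tool's matching logic: the `field_similarity` and `record_similarity` helpers, the sample records, and the use of `difflib.SequenceMatcher` as the comparison strategy are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Return a score from 0.0 to 1.0 for two field values (assumed strategy)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(rec_a, rec_b, fields):
    """Average the per-field scores over the fields selected for analysis."""
    scores = [field_similarity(rec_a[f], rec_b[f]) for f in fields]
    return sum(scores) / len(scores)


# Hypothetical records; only the selected fields take part in the comparison.
a = {"name": "Jon Smith", "city": "New York"}
b = {"name": "John Smith", "city": "new york"}
c = {"name": "Maria Lopez", "city": "Lisbon"}

score_ab = record_similarity(a, b, ["name", "city"])
score_ac = record_similarity(a, c, ["name", "city"])
print(score_ab > score_ac)  # records a and b are the likelier duplicates
```

Averaging the field scores is one simple design choice. A real comparison strategy may weight fields differently or apply a threshold before it reports a pair as duplicates.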

CHAPTER 2

Reference Data

This chapter includes the following topics:

• Reference Data Overview

• User-Defined Reference Data

• Informatica Reference Data

• Reference Data and Transformations

• Reference Tables

• Content Sets

Reference Data Overview

A reference data object contains a set of data values that you use to perform search operations in source data. You can create reference data objects in the Developer tool and Analyst tool, and you can import reference data objects to the Model repository. The Data Quality Content Installer includes reference data objects that you can import.

You can create and edit the following types of reference data:

Reference tables

A reference table contains standard and alternative versions of a set of data values. You add a reference table to a transformation in the Developer tool to verify that source data values are accurate and correctly formatted.

Most reference tables contain at least two columns. One column contains the standard or preferred version of a string, and other columns contain alternative versions. When you add a reference table to a transformation, the transformation searches the input port data for values that also appear in the table. You can create tables with any data that is useful to the data project you work on.

Content sets

Content sets are repository and file objects that contain reference data values. Content sets are similar in structure to reference tables, but they are more commonly used for lower-level [...]. There are different types of content sets. When you add a content set to a transformation, the transformation searches the input port data for values that appear in the content set or for strings that match the data patterns defined in the content set.

The Data Quality Content Installer includes reference data objects that you can import. You download the Data Quality Content Installer from Informatica.

The Data Quality Content Installer includes the following types of reference data:

Informatica reference tables

Database tables created by Informatica. You import Informatica reference tables when you import accelerator objects from the Content Installer. The reference tables contain standard and alternative versions of common business terms from several countries. The types of reference information include telephone area codes, postcode formats, first names, Social Security number formats, occupations, and acronyms. You can edit Informatica reference tables.

Informatica content sets

Content sets created by Informatica. You import content sets when you import accelerator objects from the Content Installer. A content set contains different types of reference data that you can use to perform search operations in data quality transformations.

Address reference data files

Reference data files that identify all valid addresses in a country. The Address Validator transformation reads this data. You cannot create or edit address reference data files.

The Content Installer installs files for the countries that you have purchased. Address reference data is current for a defined period and you must refresh your data regularly, for example every quarter. You cannot view or edit address reference data.

Identity population files

Files that contain information on types of personal, household, and corporate identities. The Match transformation and the Comparison transformation use this data to parse potential identities from input fields. You cannot create or edit identity population files.

The Content Installer writes population files to the file system.

User-Defined Reference Data

You can use the values in a data object to create a reference data object.

For example, you can select a data object or profile column that contains values that are specific to a project or organization. The column values let you create custom reference data objects for a project.

You can build a reference data object from a data column in the following cases:

• The data rows in the column contain the same type of information.

• The column contains a set of data values that are either correct or incorrect for the project.

Note: Create a reference object with incorrect values when you want to search a data set for incorrect values.

The following table lists common examples of project data columns that can contain reference data:

Information: Reference Data Example

Stock Keeping Unit (SKU) codes: Use an SKU column to create a reference table of valid SKU codes for an organization. Use the reference table to find correct or incorrect SKU codes in a data set.

Employee codes: Use an employee code or employee ID column to create a reference table of valid employee codes. Use the reference table to find errors in employee data.

Customer account numbers: Run a profile on a customer account column to identify account number patterns. Use the profile to create a token set of incorrect data patterns. Use the token set to find account numbers that do not conform to the correct account number structure.

Customer names: When a customer name column contains first, middle, and last names, you can create a probabilistic model that defines the expected structure of the strings in the column. Use the probabilistic model to find data strings that do not belong in the column.
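
The customer account number example can be sketched in Python. This is an illustration under stated assumptions, not the way a profile or a token set works internally: the pattern notation (`9` for a digit, `X` for a letter), the `to_pattern` helper, the `VALID_PATTERN` value, and the sample account numbers are all hypothetical.

```python
from collections import Counter


def to_pattern(value):
    """Map each digit to '9' and each letter to 'X'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("X")
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical correct account number structure for this sketch.
VALID_PATTERN = "XX-999999"

accounts = ["AB-123456", "CD-987654", "12-ABCDEF", "EF123456"]

# "Profile" the column: count how often each pattern occurs.
profile = Counter(to_pattern(a) for a in accounts)

# Collect the incorrect patterns, then use them to flag nonconforming values.
incorrect_patterns = {p for p in profile if p != VALID_PATTERN}
flagged = [a for a in accounts if to_pattern(a) in incorrect_patterns]
print(flagged)
```

Keeping a set of known incorrect patterns, rather than a single correct pattern, mirrors the table's suggestion to build reference data from incorrect values when the goal is to search for errors.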

Informatica Reference Data

You purchase and download address reference data and identity population data from Informatica. You purchase an annual subscription to address data for a country, and you can download the latest address data from Informatica at any time during the subscription period.

The Content Installer user downloads and installs reference data separately from the applications. Contact an Administrator tool user for information about the reference data installed on your system.

Reference Data and Transformations

Several transformations read reference data to perform data quality tasks.

The following transformations can read reference data:

• Address Validator. Reads address reference data to verify the accuracy of addresses.

• Case Converter. Reads reference data tables to identify strings that must change case.

• Classifier. Reads content set data to identify the type of information in a string.

• Comparison. Reads identity population data during duplicate analysis.

• Labeler. Reads content set data to identify and label strings.

• Match. Reads identity population data during duplicate analysis.

• Parser. Reads content set data to parse strings based on the information they contain.

• Standardizer. Reads reference data tables to standardize strings to a common format.

You can create reference data objects in the Developer tool and Analyst tool. For example, you can create a reference table from column profile data. You can export reference tables to the file system.

The Data Quality Content Installer file set includes Informatica reference data objects that you can import.

Reference Tables

A reference table contains the standard versions of a set of data values and any alternative version of the values that you may want to find. You add reference tables to transformations in the Developer tool.

You create reference tables in the following ways:

• Create a reference table object and enter data values.

• Create a reference table from column profile results.

• Create a reference table from data in a flat file.

• Create a reference table from data in another database table.

When you create a reference table, the Model repository stores the table metadata. The staging database or another database stores the column data values. After you create a reference table, you can add and edit columns, rows, and data values. You can also search and replace values in reference table rows.

Reference Table Structure

Most reference tables contain at least two columns. One column contains the correct or required versions of the data values. Other columns contain different versions of the values, including alternative versions that may appear in the source data.

The column that contains the correct or required values is called the valid column. When a transformation reads a reference table in a mapping, the transformation looks for values in the non-valid columns. When the transformation finds a non-valid value, it returns the corresponding value from the valid column. You can also configure a transformation to return a single common value instead of the valid values.

The valid column can contain data that is formally correct, such as ZIP codes. It can contain data that is relevant to a project, such as stock keeping unit (SKU) numbers that are unique to an organization. You can also create a valid column from bad data, such as values that contain known data errors that you want to search for.

For example, a Developer tool user creates a reference table that contains a list of valid SKU numbers in a retail organization. The user adds the reference table to a Labeler transformation and creates a mapping with the transformation. The user runs the mapping on a product database table. When the mapping runs, the Labeler creates a column that identifies the product records that do not contain valid SKU numbers.
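
The valid column lookup and the SKU example can be sketched in Python. The sketch is a simplified illustration, not the behavior of any specific transformation: the table rows, the SKU values, the `VALID` and `INVALID` labels, and the case-insensitive matching are assumptions.

```python
# Hypothetical reference table rows: the first value in each row is the
# valid column value, and the remaining values are alternative versions.
REFERENCE_ROWS = [
    ("Street", "St", "St.", "Str"),
    ("Avenue", "Ave", "Ave.", "Av"),
]


def build_lookup(rows):
    """Map every value in a row, valid or alternative, to the valid value."""
    lookup = {}
    for valid, *alternatives in rows:
        lookup[valid.lower()] = valid
        for alt in alternatives:
            lookup[alt.lower()] = valid
    return lookup


LOOKUP = build_lookup(REFERENCE_ROWS)


def standardize(value, lookup=LOOKUP):
    """Return the valid value for a known alternative, else the input unchanged."""
    return lookup.get(value.strip().lower(), value)


# Hypothetical single-purpose table of valid SKU numbers.
VALID_SKUS = {"SKU-1001", "SKU-1002"}


def label_sku(value):
    """Label a product record's SKU as VALID or INVALID (assumed label names)."""
    return "VALID" if value in VALID_SKUS else "INVALID"


print(standardize("Ave."))    # Avenue
print(label_sku("SKU-9999"))  # INVALID
```

Building one dictionary from all columns keeps each lookup to a single step, at the cost of assuming that no alternative value appears in more than one row.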

Reference Tables and the Parser Transformation

You create a reference table with a single column when you want to use the table data in a pattern-based parsing operation. You configure the Parser transformation to perform pattern-based parsing, and you import the data to the transformation configuration.

Managed and Unmanaged Reference Tables

Reference tables store metadata in the Model repository. Reference tables can store column data in the reference data database or in another database. The Content Management Service stores the database connection for the reference data database.

A managed reference table stores column data in the reference data database. You can edit the values of a managed table in the Analyst tool and Developer tool.

An unmanaged reference table stores column data in a database other than the reference data database. You cannot edit the values of an unmanaged table in the Analyst tool or Developer tool.

Content Sets

A content set is a Model repository object that you use to store reusable content expressions. A content expression is an expression that you can use in Labeler and Parser transformations to identify data.

You can create content sets to organize content expressions into logical groups. For example, if you create a number of content expressions that identify Portuguese strings, you can create a content set that groups these content expressions. Create content sets in the Developer tool.

Content expressions include character sets, pattern sets, regular expressions, and token sets. Content expressions can be system-defined or user-defined. System-defined content expressions cannot be added to content sets. User-defined content expressions can be reusable or non-reusable.

Character Sets

A character set contains expressions that identify specific characters and character ranges. You can use character sets in Labeler transformations that use character labeling mode.

Character ranges specify a sequential range of character codes. For example, the character range "[A-C]" matches the uppercase characters "A," "B," and "C." This character range does not match the lowercase characters "a," "b," or "c."

Use character sets to identify a specific character or range of characters as part of labeling operations. For example, you can label all numerals in a column that contains telephone numbers. After labeling the numbers, you can identify patterns with a Parser transformation and write problematic patterns to separate output ports.
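
The character range and the telephone number example can be illustrated with a short Python sketch. Python regular expressions stand in for the character set here, and the `9` label, the expected pattern, and the sample values are assumptions, not Labeler output.

```python
import re

# The range [A-C] matches the uppercase characters A, B, and C only.
UPPER_A_TO_C = re.compile(r"[A-C]")


def in_character_range(ch):
    """Return True when a single character falls in the range [A-C]."""
    return UPPER_A_TO_C.fullmatch(ch) is not None


def label_numerals(value, label="9"):
    """Replace each numeral with a label character (assumed label '9')."""
    return "".join(label if ch in "0123456789" else ch for ch in value)


# Hypothetical telephone values; the second contains the letter O, not zero.
phone_numbers = ["555-0100", "555-01O0", "5550100"]
EXPECTED_PATTERN = "999-9999"  # hypothetical pattern for a well-formed value

problem_values = [p for p in phone_numbers if label_numerals(p) != EXPECTED_PATTERN]
print(problem_values)
```

Comparing the labeled string with an expected pattern plays the role that the text assigns to a downstream Parser transformation, which routes problematic patterns to separate output ports.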

Character Set Properties

Configure properties that determine character labeling operations for a character set.

The following table describes the properties for a user-defined character set:

Property: Description

Label: Defines the label that a Labeler transformation applies to data that matches the character set.

Standard Mode: Enables a simple editing view that includes fields for the start range and end range.

Start Range: Specifies the first character in a character range.

End Range: Specifies the last character in a character range. For a range with a single character, leave this field blank.

Advanced Mode: Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.

Range Character: Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.

Delimiter Character: Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.

Classifier Models

A classifier model analyzes input strings and determines the types of information they contain. You use a classifier model in a Classifier transformation.

You can use a classifier model when input strings contain significant amounts of data. For example, you can use a classifier model and Classifier transformation to identify the types of information in a set of documents. You export the text from each document, and you store the text of each document as a separate field in a single data column. The Classifier transformation reads the data and classifies the information in each field according to the labels defined in the model.

The classifier model contains the following columns:

• A column that contains the words and phrases that may exist in the input data. The transformation compares the input data with the data in this column.

• A column that contains descriptive labels that may define the information in the data. The transformation returns a label from this column as output.

The classifier model also contains logic that the Classifier transformation uses to calculate the correct information type for the input data.
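
The two-column structure described above can be pictured with a small Python sketch. The guide does not describe the logic that the model contains, so the phrase counting used here is an assumed stand-in, and the model rows, the label names, and the `classify` function are hypothetical.

```python
from collections import Counter

# Hypothetical model rows: (word or phrase, descriptive label).
MODEL_ROWS = [
    ("invoice", "FINANCE"),
    ("payment due", "FINANCE"),
    ("diagnosis", "MEDICAL"),
    ("prescription", "MEDICAL"),
]


def classify(text, rows=MODEL_ROWS, default="UNKNOWN"):
    """Return the label whose phrases occur most often in the text.

    Phrase counting is an assumption made for this sketch; it is not the
    calculation that a Classifier transformation performs.
    """
    lowered = text.lower()
    scores = Counter()
    for phrase, label in rows:
        scores[label] += lowered.count(phrase)
    best = scores.most_common(1)
    if not best or best[0][1] == 0:
        return default
    return best[0][0]


print(classify("Invoice attached; payment due in 30 days"))  # FINANCE
```

Each document's text is passed as one string, which matches the guide's description of storing the text of each document as a separate field in a single column.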

The Model repository stores the metadata for the classifier model object. The column data and logic are stored in a file in the Informatica installation directory structure.

Note: You cannot create or edit a classifier model in the Developer tool.

Classifier Models and the Core Accelerator

Informatica includes a classifier model in the set of prebuilt mappings and reference data objects called the Core Accelerator. The Core Accelerator is part of the Informatica Data Quality product. You download the Core Accelerator from Informatica with the Data Quality Content Installer.

When you download the Data Quality Content Installer, find the Core Accelerator XML file in the Content Installer file set. Use the Developer tool to import the accelerator objects. The import operation writes the model object to the Model repository and the model data file to the Informatica file system.

Pattern Sets

A pattern set contains expressions that identify data patterns in the output of a token labeling operation. You can use pattern sets to analyze the Tokenized Data output port and write matching strings to one or more output ports. Use pattern sets in Parser transformations that use pattern parsing mode.

For example, you can configure a Parser transformation to use pattern sets that identify names and initials. This transformation uses the pattern sets to analyze the output of a Labeler transformation in token labeling mode. You can configure the Parser transformation to write names and initials in the output to separate ports.
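
The names and initials example can be sketched in Python as a two-stage process: label each token, then match the label sequence against a pattern set. The token labels (`WORD`, `INIT`, `OTHER`), the pattern definitions, and the port names, including `overflow`, are hypothetical and do not reproduce Informatica's pattern syntax.

```python
def label_token(token):
    """Assign a hypothetical token label: INIT, WORD, or OTHER."""
    stripped = token.rstrip(".")
    if len(stripped) == 1 and stripped.isalpha():
        return "INIT"   # a single letter, with or without a period
    if token.isalpha():
        return "WORD"   # an alphabetic token longer than one letter
    return "OTHER"


# Hypothetical pattern set: label sequence -> output port for each token.
PATTERNS = {
    ("WORD", "INIT", "WORD"): ("name", "initial", "name"),
    ("INIT", "WORD"): ("initial", "name"),
    ("WORD", "WORD"): ("name", "name"),
}


def parse(value):
    """Write tokens to ports when the label sequence matches a pattern."""
    tokens = value.split()
    labels = tuple(label_token(t) for t in tokens)
    ports = {"name": [], "initial": [], "overflow": []}
    targets = PATTERNS.get(labels)
    if targets is None:
        ports["overflow"].append(value)  # no pattern matched the value
        return ports
    for token, port in zip(tokens, targets):
        ports[port].append(token)
    return ports


print(parse("John F. Kennedy"))
```

Separating the labeling step from the pattern match reflects the division of work in the text, where a Labeler transformation produces the tokenized data and a Parser transformation applies the pattern set.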

Pattern Set Properties

Configure properties that determine the patterns in a pattern set.

The following table describes the property for a user-defined pattern set:

Property: Description

Pattern: Defines the patterns that the pattern parser searches for. You can enter multiple patterns for one pattern set. You can enter patterns constructed from a combination of wildcards, characters, and strings.

Probabilistic Models

A probabilistic model identifies tokens by the types of information they contain and by their positions in an input string.

You use probabilistic models with the Labeler and Parser transformations. Select a probabilistic model when you want to label or parse values on an input port into separate output ports.

A probabilistic model uses a structured set of tokens as a reference data set. A labeling or parsing operation can use a probabilistic model to answer the following questions about the data that it reads on a port:

• Does the port data contain a token that matches the reference data in the model?

• What type of information does the token contain?

A probabilistic model contains the following columns:

• An input column that represents the data on the input port. You populate the column with sample data from the input port. The model uses the sample data as reference data in parsing and labeling operations.

• One or more label columns that identify the types of information in each input string.

You add the columns to the model, and you assign labels to the tokens in each string. Use the label columns to indicate the correct position of the tokens in the string.

The following figure shows a probabilistic model in the Developer tool:

When you configure a token labeling operation with a probabilistic model, the Labeler transformation writes the column name from the probabilistic model to an output port on the transformation. For example, the Labeler can use a probabilistic model to label the string "Franklin Delano Roosevelt" as "FIRSTNAME MIDDLENAME LASTNAME."

When you configure a token parsing operation with a probabilistic model, each column you add to the model becomes an output port on the Parser transformation. The transformation writes each token to an output port based on its position in the model.
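
The difference between the labeling and parsing outputs can be sketched in Python. The sketch uses an exact token lookup in place of probabilistic inference, which is a deliberate simplification. The second model row, the `UNKNOWN` label, and the function names are assumptions added for illustration.

```python
# Model rows: (input string, label assigned to each token in order).
# The first row follows the example in the text; the second is hypothetical.
MODEL_ROWS = [
    ("Franklin Delano Roosevelt", ("FIRSTNAME", "MIDDLENAME", "LASTNAME")),
    ("Theodore Roosevelt", ("FIRSTNAME", "LASTNAME")),
]


def build_token_labels(rows):
    """Map each sample token to the label column assigned to it."""
    token_labels = {}
    for text, labels in rows:
        for token, label in zip(text.split(), labels):
            token_labels[token.lower()] = label
    return token_labels


TOKEN_LABELS = build_token_labels(MODEL_ROWS)


def label_tokens(value):
    """Labeling: write the label column name for each token."""
    return " ".join(TOKEN_LABELS.get(t.lower(), "UNKNOWN") for t in value.split())


def parse_tokens(value):
    """Parsing: write each recognized token to the port named after its column."""
    ports = {}
    for token in value.split():
        column = TOKEN_LABELS.get(token.lower())
        if column is not None:
            ports.setdefault(column, []).append(token)
    return ports


print(label_tokens("Franklin Delano Roosevelt"))
print(parse_tokens("Theodore Roosevelt"))
```

In this sketch a token that is absent from the model rows is labeled `UNKNOWN`. A compiled probabilistic model is described as able to infer matches for values that the model does not list, which an exact lookup cannot do.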

Probabilistic Logic

Probabilistic models behave differently from other types of content set.

Data Quality can infer a match between the input port data values and the model data values even if the port data is not listed in the model. This means that a probabilistic model does not need to list every token in a data set to correctly label or parse the tokens in the data set.

Data Quality uses probabilistic or fuzzy logic to identify tokens on the transformation input port that match tokens in the probabilistic model. The engine updates the fuzzy logic rules when you compile the probabilistic model.

Probabilistic Model Advanced Properties

The Advanced Properties dialog box exposes the computational properties that are built into a probabilistic model when you compile the model.

The basic element in the compilation of probabilistic models is the n-gram. An n-gram is a series of letters that can be followed or preceded by one or more letters to complete a word. Probabilistic analysis creates n-grams for each value in the Input column of the probabilistic model. The analysis adds one or more letters to each n-gram to create different words. If the probabilistic analysis can create a word that matches a value on a Labeler or Parser transformation input port, then the analysis determines that the Input value in the probabilistic model matches the input value on the transformation port.
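
To make the n-gram idea concrete, the following Python sketch splits words into character n-grams and compares two words by the n-grams they share. The guide does not specify how compilation builds or scores n-grams, so the n-gram length of 3 and the Jaccard overlap measure are assumptions chosen for illustration.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams in a word (assumed length n=3)."""
    word = word.lower()
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_overlap(a, b, n=3):
    """Jaccard overlap of two n-gram sets, used here as a stand-in for fuzzy matching."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# A misspelled port value shares most n-grams with the model value,
# while an unrelated value shares none.
print(ngram_overlap("Roosevelt", "Rosevelt"))
print(ngram_overlap("Roosevelt", "Kennedy"))
```

A measure of this kind shows how a model value can match a port value that the model does not list exactly, which is the behavior the text attributes to probabilistic analysis.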

The advanced properties on a probabilistic model determine how the probabilistic model handles n-grams and other model features.

Note: The default property values represent the preferred settings for probabilistic analysis and probabilistic model compilation in Informatica. If you edit an advanced property, you may adversely affect the accuracy of the probabilistic analysis. Do not edit the advanced properties unless you understand the effects of the changes you make.

Steps to Create a Probabilistic Model

You create a probabilistic model in multiple stages. Complete the tasks associated with each stage to create and configure a model that you can use in a transformation.

Complete the following tasks:

Create the probabilistic model object in the repository

You can use a data object to create the model, or you can create an empty model.

Assign labels to the input data

If the probabilistic model does not contain labels for the input data values, you must assign the labels.

Compile the probabilistic model

When you have entered the input data and configured the labels, you compile the model. You must compile the model again each time you edit it.

Creating an Empty Probabilistic Model

You can use a data object as the source for the data in a probabilistic model, or you can create an empty model.

Create an empty probabilistic model when you want to enter the reference data at a later time.

Complete the following steps to create an empty probabilistic model:

1. In Object Explorer, open or create a content set.

2. Select the Content view.

3. Select Probabilistic Models, and click Add.

The Probabilistic Model wizard opens.

4. Select the Probabilistic Model option.

Click Next.

5. Enter a name for the model.

Click Finish and save the model.

The probabilistic model opens in the Developer tool.

After you create the empty model, you must add input data.

Creating a Probabilistic Model from a Data Object

You can use a data object as the source for the data in a probabilistic model. For example, use the source data object from the mapping that will read the probabilistic model. You can also profile an object in the mapping and create a data object from the profile results.

Probabilistic model logic works best when you use data from the input port on the transformation to populate the input and label columns in the model.

Complete the following steps to create a probabilistic model from a data object:

1. In Object Explorer, open or create a content set.

2. Select the Content view.

3. Select Probabilistic Models, and click Add.

The Probabilistic Model wizard opens.

4. Select the Probabilistic Model from Data Objects option.

Click Next.

5. Enter a name for the model, and browse to the data object you want to use.

Click Next.

6. Review the available data columns on the data object, and select a column to add as input data or label data to the model.

- To add a data source column to the Input column in the model, select the column name and click Data >.

- To use a data source column as a label source for the model, select the column name and click Label >.

Click Next.

7. Select the number of rows to copy from the data source. Select all rows, or enter the number of rows to copy. If you enter a number, the model counts the rows from the start of the data set.

8. Set the delimiters to use for the Input column and Data columns. The delimiters apply when the columns contain multiple tokens.

The default delimiter is \s, which represents a character space.

9. Enter a name for a column to contain any token that the labeling or parsing operation cannot recognize.

The default name is O, which stands for Overflow.

10. Click Finish and save the model.

The probabilistic model opens in the Developer tool.

11. Click Compile to build the probabilistic logic rules for the model.
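The delimiter and overflow settings in the steps above can be sketched as follows. This is only an illustration in Python. The helper names are hypothetical, and the exact dictionary lookup in `label_tokens` stands in for the probabilistic matching that the compiled model performs. Only the `\s` delimiter and the `O` overflow name come from the steps above.

```python
import re


def split_tokens(value, delimiter=r"\s"):
    """Split an input value into tokens on the delimiter pattern, dropping empty pieces."""
    return [token for token in re.split(delimiter, value) if token]


def label_tokens(tokens, known_labels, overflow="O"):
    """Return one label per token; unrecognized tokens get the overflow name."""
    # Assumption: a plain lookup replaces the model's fuzzy matching.
    return [known_labels.get(token, overflow) for token in tokens]


known = {"Franklin": "FIRSTNAME", "Roosevelt": "LASTNAME"}
tokens = split_tokens("Franklin  Delano Roosevelt")
print(list(zip(tokens, label_tokens(tokens, known))))
```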

Assigning Labels to Probabilistic Model Data

If the data object you use to create the probabilistic model does not contain columns for label data, you must add the data.

A label is a column name in the probabilistic model. The model uses the column name to identify different types of information in the input data. You create the label columns, and you assign a label to each token in each input row. When you assign a label to a token, the model adds the token to the label column.

Follow these guidelines when you assign labels to input data:

- A label identifies the type of information that the token represents. A token may represent multiple types of information if it appears in multiple locations in the input string. For example, you can assign the labels FIRSTNAME LASTNAME to the names "John Blake" and "Blake Smith."

- You must assign a label to every token in every row, even if the tokens repeat in multiple rows.
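The two guidelines above can be checked with a small sketch. The row layout (an input string paired with a list of labels) and the function names are assumptions made for illustration in Python. They do not describe how the Developer tool stores a model.

```python
def rows_missing_labels(rows):
    """Return input values whose token count differs from their label count."""
    return [value for value, labels in rows if len(value.split()) != len(labels)]


def labels_seen_for(token, rows):
    """Collect every label assigned to a token across all rows."""
    seen = set()
    for value, labels in rows:
        for tok, label in zip(value.split(), labels):
            if tok == token:
                seen.add(label)
    return seen


rows = [
    ("John Blake", ["FIRSTNAME", "LASTNAME"]),
    ("Blake Smith", ["FIRSTNAME", "LASTNAME"]),
]
print(rows_missing_labels(rows))
print(sorted(labels_seen_for("Blake", rows)))
```

Here "Blake" carries a different label in each row because the label depends on the position of the token in the input string.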

Complete the following steps to assign labels to input data:

1. Open the probabilistic model in the Developer tool canvas.

2. Verify that the model contains the input data and label columns that you need.

a. To add a row of input data, click New. The cursor moves to the first available row in the input data column. Enter the input data values.

b. To add a label column, right-click an input data row and select New Label. Enter a column name in the New Label dialog box.

The label appears in the model.

3. Right-click an input data row and select View tokens and labels as rows.

The Labels panel displays under the input data column.

Note: A label is a structural element in a probabilistic model. If you add or remove a label in a probabilistic model after you add the model to a Parser transformation, you invalidate the parsing operation that uses the model. You must delete and recreate the operation that uses the probabilistic model if you add or remove a label in the model.

Compiling the Probabilistic Model

Each time you add data to a probabilistic model, you must compile the model. This enhances the matching logic in the Data Quality engine.

To update the fuzzy logic that the engine uses for a probabilistic model, open the model and click Compile.

Generating Probabilistic Model Data from a Midstream Profile

You can run a profile on mapping data to create a data source for a probabilistic model. For example, run a profile on the transformation that you connect to the Labeler or Parser transformation, and populate the model with the profile data. This ensures that the model data is as close as possible to the data on the input port you select in the Labeler or Parser transformation.

Complete the following steps to run a midstream mapping profile and generate input data for a probabilistic model:

1. Open the mapping that contains the transformation you will connect to the Labeler or Parser.

2. Select a data object and click Profile Now.

Select the Results tab in the profile, and review the profile results.

3. Under Column Profiling, select the column you want to add to the probabilistic model.

4. Under Details, select the option to Show Values.

The editor displays the data values in the column you selected.

Note: You can select all values in the column or a subset of values.

5. If you want to add a subset of column values to a probabilistic model, follow these steps:

a. Use the Shift or Ctrl keys to select one or multiple values from the editor.

b. Right-click the values and select Send to > Export Results to File.

6. If you want to add all column values to a probabilistic model, click the option to Export Value Frequencies to File.

7. In the Export dialog box, enter a file name. You can save the file on the Informatica services machine or on the Developer client machine.

If you save the file on the client machine, enter a path to the file.

You can use the file as a data source for the Label or Data column in the probabilistic model.

Regular Expressions

In the context of content sets, a regular expression is an expression that you can use in parsing and labeling operations. Use regular expressions to identify one or more strings in input data. You can use regular expressions in Parser transformations that use token parsing mode. You can also use regular expressions in Labeler transformations that use token labeling mode.

Parser transformations use regular expressions to match patterns in input data and parse all matching strings to one or more outputs. For example, you can use a regular expression to identify all email addresses in input data and parse each email address component to a different output.

Labeler transformations use regular expressions to match an input pattern and create a single label. Regular expressions that have multiple outputs do not generate multiple labels.
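The difference between the two uses can be sketched in Python. The pattern below is a simplified, hypothetical email expression written in Python's `re` syntax, which may differ from the syntax that Data Quality accepts. The function names and the label values are likewise assumptions. Each capture group plays the role of one output in a parsing operation, while the labeling function returns a single label no matter how many groups the pattern contains.

```python
import re

# Two capture groups: account name and domain name (a simplified email pattern).
EMAIL = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def parse_emails(text):
    """Parser-style use: return one (account, domain) tuple per matching string."""
    return EMAIL.findall(text)


def label_text(text, label="EMAIL", default="OTHER"):
    """Labeler-style use: return a single label when the pattern matches."""
    return label if EMAIL.search(text) else default


print(parse_emails("Contact jo.smith@example.com or sales@example.org today"))
print(label_text("sales@example.org"))
```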

Regular Expression Properties

Configure properties that determine how a regular expression identifies and writes output strings.

The following table describes the properties for a user-defined regular expression:

Property | Description
Number of Outputs | Defines the number of output ports that the regular expression writes.
Regular Expression | Defines a pattern that the Parser transformation uses to match strings.
Test Expression | Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.
Next Expression | Moves to the next string that matches the regular expression and changes the font of that string to bold.
Previous Expression | Moves to the previous string that matches the regular expression and changes the font of that string to bold.

Token Sets

A token set contains expressions that identify specific tokens. You can use token sets in Labeler transformations that use token labeling mode. You can also use token sets in Parser transformations that use token parsing mode.

Use token sets to identify specific tokens as part of labeling and parsing operations. For example, you can use a token set to label all email addresses that use an "AccountName@DomainName" format. After labeling the tokens, you can use the Parser transformation to write email addresses to output ports that you specify.

Token Set Properties

Configure properties that determine the labeling operations for a token set.

The following table describes the properties for a user-defined token set:

Property | Token Set Mode | Description
Name | N/A | Defines the name of the token set.
Description | N/A | Describes the token set.
Token Set Options | N/A | Defines whether the token set uses regular expression mode or character mode.
Label | Regular Expression | Defines the label that a Labeler transformation applies to data that matches the token set.
Regular Expression | Regular Expression | Defines a pattern that the Labeler transformation uses to match strings.
Test Expression | Regular Expression | Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.
Next Expression | Regular Expression | Moves to the next string that matches the regular expression and changes the font of that string to bold.
Previous Expression | Regular Expression | Moves to the previous string that matches the regular expression and changes the font of that string to bold.
Label | Character | Defines the label that a Labeler transformation applies to data that matches the character set.
Standard Mode | Character | Enables a simple editing view that includes fields for the start range and end range.
Start Range | Character | Specifies the first character in a character range.
End Range | Character | Specifies the last character in a character range. For single-character ranges, leave this field blank.
Advanced Mode | Character | Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.
Range Character | Character | Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.
Delimiter Character | Character | Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.
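The roles of the range character and the delimiter character in character mode can be sketched as follows. This guide does not state the default symbols, so the `-` and `,` used below are purely hypothetical, as are the function names. The sketch shows only how a range symbol and a delimiter symbol could combine to describe several character ranges, including a single-character range with no end value.

```python
def parse_ranges(spec, range_char="-", delimiter=","):
    """Turn a spec such as 'a-z,0-9,_' into a list of (start, end) pairs."""
    ranges = []
    for part in spec.split(delimiter):
        if not part:
            continue
        start, sep, end = part.partition(range_char)
        # A part without an end value is treated as a single-character range.
        ranges.append((start, end if sep and end else start))
    return ranges


def in_ranges(ch, ranges):
    """Return True when the character falls inside any of the ranges."""
    return any(start <= ch <= end for start, end in ranges)


ranges = parse_ranges("a-z,0-9,_")
print(ranges)
print(in_ranges("q", ranges), in_ranges("Q", ranges))
```

Changing the two symbols, for example to `:` and `;`, changes only how the same ranges are written.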

Creating a Content Set

Create content sets to group content expressions according to business requirements. You create content sets in the Developer tool.

1. In the Object Explorer view, select the project or folder where you want to store the content set.

2. Click File > New > Content Set.

3. Enter a name for the content set.

4. Optionally, select Browse to change the Model repository location for the content set.

5. Click Finish.

Creating a Reusable Content Expression

Create reusable content expressions from within a content set. You can use these content expressions in Labeler transformations and Parser transformations.

1. Open a content set in the editor and select the Content view.

2. Select a content expression view.

3. Click Add.

4. Enter a name for the content expression.

5. Optionally, enter a text description of the content expression.

6. If you selected the Token Set expression view, select a token set mode.

7. Click Next.

8. Configure the content expression properties.

9. Click Finish.

Tip: You can create content expressions by copying them from another content set. Use the Copy To and Paste From options to create copies of existing content expressions. You can use the CTRL key to select multiple content expressions when using these options.

Part II: Data Quality Features in Informatica Developer

This part contains the following chapters:

- Column Profiles in Informatica Developer
- Column Profile Results in Informatica Developer
- Rules in Informatica Developer
- Scorecards in Informatica Developer
- Mapplet and Mapping Profiling
- Reference Data

CHAPTER 3

Column Profiles in Informatica Developer

This chapter includes the following topics:

- Column Profile Concepts Overview
- Column Profile Options
- Rules
- Scorecards
- Column Profiles in Informatica Developer
- Creating a Single Data Object Profile

Column Profile Concepts Overview

A column profile determines the characteristics of columns in a data source, such as value frequency, percentages, and patterns.

Column profiling discovers the following facts about data:

- The number of unique and null values in each column, expressed as a number and a percentage.

- The patterns of data in each column and the frequencies with which these values occur.

- Statistics about the column values, such as the maximum and minimum lengths of values and the first and last values in each column.
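The facts in this list can be approximated for a single in-memory column with a few lines of Python. This is a sketch under stated assumptions, not the profiling engine: the function name and result keys are hypothetical, `None` stands for a null value, and "unique" is interpreted here as the number of distinct non-null values.

```python
def profile_column(values):
    """Compute null and distinct counts, their percentages, and value lengths."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    lengths = [len(str(v)) for v in non_null]

    def pct(count):
        # Guard against an empty column.
        return 100.0 * count / total if total else 0.0

    nulls = total - len(non_null)
    return {
        "null_count": nulls,
        "null_pct": pct(nulls),
        "distinct_count": len(distinct),
        "distinct_pct": pct(len(distinct)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }


print(profile_column(["NY", "CA", None, "NY", "Texas"]))
```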

Use column profile options to select the columns on which you want to run a profile, set data sampling options, and set drilldown options when you create a profile.

A rule is business logic that defines conditions applied to source data when you run a profile. You can add a rule to the profile to cleanse, change, or validate data.

Create scorecards to periodically review data quality. You create scorecards before and after you apply rules to profiles so that you can view a graphical representation of the valid values for columns.

Column Profile Options

When you create a profile with the Column Profiling option, you can use the profile wizard to define filter and sampling options. These options determine how the profile reads rows from the data set.

After you complete the steps in the profile wizard, you can add a rule to the profile. The rule can have the business logic to perform data transformation operations on the data before column profiling.

Rules

Create and apply rules within profiles. A rule is business logic that defines conditions applied to data when you run a profile. Use rules to further validate the data in a profile and to measure data quality progress.

You can add a rule after you create a profile. You can reuse rules created in either the Analyst tool or the Developer tool in both tools. Add rules to a profile by selecting a reusable rule or by creating an expression rule. An expression rule uses both expression functions and columns to define rule logic. After you create an expression rule, you can make the rule reusable.

Create expression rules in the Analyst tool. In the Developer tool, you can create a mapplet and validate the mapplet as a rule. You can run rules from both the Analyst tool and Developer tool.

Scorecards

A scorecard is the graphical representation of the valid values for a column or output of a rule in profile results. Use scorecards to measure data quality progress. You can create a scorecard from a profile and monitor the progress of data quality over time.

A scorecard has multiple components, such as metrics, metric groups, and thresholds. After you run a profile, you can add source columns as metrics to a scorecard and configure the valid values for the metrics. Use a metric group to categorize related metrics in a scorecard into a set. A threshold identifies the range, in percentage, of bad data that is acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable ranges of data.

When you run a scorecard, you can configure whether you want to drill down on the metrics for a score on the live data or staged data. After you run a scorecard and view the scores, you can drill down on each metric to identify valid data records and records that are not valid. To track data quality effectively, you can use trend charts and monitor how the scores change over a period of time.

The profiling warehouse stores the scorecard statistics and configuration information. You can configure a third-party application to get the scorecard results and run reports. You can also display the scorecard results in a web application, portal, or report such as a business intelligence report.
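The relationship between a metric, its valid values, and the thresholds can be sketched in Python. The function names and the threshold boundaries of 90 and 70 percent are hypothetical values chosen for illustration. This guide does not state default thresholds, and the scorecard itself computes scores from profile results rather than from an in-memory list.

```python
def score(values, valid_values):
    """Return the percentage of column values that are in the set of valid values."""
    if not values:
        return 0.0
    valid = sum(1 for v in values if v in valid_values)
    return 100.0 * valid / len(values)


def classify(score_pct, good_from=90.0, acceptable_from=70.0):
    """Map a score to a threshold range (boundaries are illustrative assumptions)."""
    if score_pct >= good_from:
        return "good"
    if score_pct >= acceptable_from:
        return "acceptable"
    return "unacceptable"


s = score(["Y", "N", "Y", "?"], {"Y", "N"})
print(s, classify(s))
```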

Column Profiles in Informatica Developer

Use a column profile to analyze the characteristics of columns in a data set, such as value percentages and value patterns. You can add filters to determine the rows that the profile reads at runtime. The profile does not process rows that do not meet the filter criteria.

You can discover the following types of information about the columns you profile:

- The number of times a value appears in a column.

- The frequency of occurrence of each value in a column, expressed as a percentage.

- The character patterns of the values in a column.

- The maximum and minimum lengths of the values in a column, and the first and last values.

You can define a column profile for a data object in a mapping or mapplet or an object in the Model repository. The object in the repository can be in a single data object profile, multiple data object profile, or profile model.

You can add rules to a column profile. Use rules to select a subset of source data for profiling. You can also change the drilldown options for column profiles to determine whether the drilldown reads from staged data or live data.

Filtering Options

You can add filters to determine the rows that a column profile uses when performing profiling operations. The profile does not process rows that do not meet the filter criteria.

1. Create or open a column profile.

2. Select the Filter view.

3. Click Add.

4. Select a filter type and click Next.

5. Enter a name for the filter. Optionally, enter a text description of the filter.

6. Select Set as Active to apply the filter to the profile. Click Next.

7. Define the filter criteria.

8. Click Finish.

Sampling Properties

Configure the sampling properties to determine the number of rows that the profile reads during a profiling operation.

The following table describes the sampling properties:

Property | Description
All Rows | Reads all rows from the source. Default is enabled.
First | Reads from the first row up to the row you specify.
Random Sample of | Reads a random sample from the number of rows that you specify.
Random Sample (Auto) | Reads from a random sample of rows.
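The difference between reading the first rows and reading a random sample can be sketched in Python. The function names and the optional `seed` parameter are assumptions for illustration, and the sketch works on an in-memory list rather than on a data source. How the Random Sample (Auto) option chooses its sample size is not described in this guide, so it is not modeled here.

```python
import random


def first_rows(rows, limit):
    """'First'-style sampling: read from the first row up to the given row count."""
    return rows[:limit]


def random_sample(rows, size, seed=None):
    """Random sampling: pick up to 'size' rows without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(size, len(rows)))


rows = list(range(100))
print(first_rows(rows, 5))
print(random_sample(rows, 5, seed=42))
```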

Creating a Single Data Object Profile

You can create a single data object profile for one or more columns in a data object and store the profile object in the Model repository.

1. In the Object Explorer view, select the data object you want to profile.

2. Click File > New > Profile to open the profile wizard.

3. Select Profile and click Next.

4. Enter a name for the profile and verify the project location. If required, browse to a new location.

5. Optionally, enter a text description of the profile.

6. Verify that the name of the data object you selected appears within the Data Objects section.

7. Click Next.

8. Configure the profile operations that you want to perform. You can configure the following operations:

- Column profiling

- Primary key discovery

- Functional dependency discovery

- Data domain discovery

Note: To enable a profile operation, select Enabled as part of the "Run Profile" action for that operation. Column profiling is enabled by default.

9. Review the options for your profile.

You can edit the column selection for all profile types. Review the filter and sampling options for column profiles. You can review the inference options for primary key, functional dependency, and data domain discovery. You can also review data domain selection for data domain discovery.

10. Review the drilldown options, and edit them if necessary. By default, the Enable Row Drilldown option is selected. You can edit drilldown options for column profiles. The options also determine whether drilldown operations read from the data source or from staged data, and whether the profile stores result data from previous profile runs.

11. Click Finish.

The profile is ready to run.

CHAPTER 4

Column Profile Results in Informatica Developer

This chapter includes the following topics:

- Column Profile Results in Informatica Developer
- Column Value Properties
- Column Pattern Properties
- Column Statistics Properties
- Exporting Profile Results from Informatica Developer

Column Profile Results in Informatica Developer

Column profile analysis provides information about data quality by highlighting patterns and instances of non-conformance in data.

The following table describes the profile results for each type of analysis:

Profile Type | Profile Results

Column profile
- Percentage and count statistics for unique and null values
- Inferred datatypes
- The datatype that the data source declares for the data
- The maximum and minimum values
- The date and time of the most recent profile run
- Percentage and count statistics for each unique data element in a column
- Percentage and count statistics for each unique character pattern in a column

Primary key profile
- Inferred primary keys
- Key violations

Functional dependency profile
- Inferred functional dependencies
- Functional dependency violations

Column Value Properties

Column value properties show the values in the profiled columns and the frequency with which each value appears in each column. The frequencies are shown as a number, a percentage, and a bar chart.

To view column value properties, select Values from the Show menu. Double-click a column value to drill down to the rows that contain the value.

The following table describes the properties for column values:

Property | Description
Values | List of all values for the column in the profile.
Frequency | Number of times a value appears in a column.
Percent | Number of times a value appears in a column, expressed as a percentage of all values in the column.
Chart | Bar chart for the percentage.
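The Frequency and Percent properties correspond to a simple count over the column. The following Python sketch computes both for an in-memory list. The function name is hypothetical, and the ordering by descending frequency is an assumption about presentation, not a documented behavior.

```python
from collections import Counter


def value_frequencies(values):
    """Return (value, frequency, percent) tuples, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count, 100.0 * count / total)
            for value, count in counts.most_common()]


for value, frequency, percent in value_frequencies(["NY", "CA", "NY", "NY"]):
    print(value, frequency, percent)
```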

Column Pattern Properties

Column pattern properties show the patterns of data in the profiled columns and the frequency with which the patterns appear in each column. The frequencies are shown as a number, a percentage, and a bar chart.

To view pattern information, select Patterns from the Show menu. Double-click a pattern to drill down to the rows that contain the pattern.

The following table describes the properties for column value patterns:

Property | Description
Patterns | Pattern for the selected column.
Frequency | Number of times a pattern appears in a column.
Percent | Number of times a pattern appears in a column, expressed as a percentage of all values in the column.
Chart | Bar chart for the percentage.
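A character pattern reduces each value to the shape of its characters, so that values with the same layout are counted together. The Python sketch below uses `X` for letters and `9` for digits and leaves other characters unchanged. These symbols and the function names are assumptions made for illustration and may differ from the pattern notation that the profile results display.

```python
from collections import Counter


def to_pattern(value):
    """Replace letters with 'X' and digits with '9'; keep other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("X")
        else:
            out.append(ch)
    return "".join(out)


def pattern_frequencies(values):
    """Return (pattern, frequency, percent) tuples, most frequent pattern first."""
    counts = Counter(to_pattern(v) for v in values)
    total = len(values)
    return [(p, c, 100.0 * c / total) for p, c in counts.most_common()]


print(pattern_frequencies(["AB-123", "CD-456", "12345", "EF-789"]))
```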

Column Statistics Properties

Column statistics properties provide maximum and minimum lengths of values and first and last values.

To view statistical information, select Statistics from the Show menu.

The following table describes the column statistics properties:

Property | Description
Maximum Length | Length of the longest value in the column.
Minimum Length | Length of the shortest value in the column.
Bottom | Last five values in the column.
Top | First five values in the column.

Note: The profile also displays average and standard deviation statistics for columns of type Integer.
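The statistics in this table can be approximated as follows. The Python sketch makes two assumptions that this guide does not confirm: Top and Bottom are taken from the values in sorted order, and the standard deviation is the population standard deviation. The function names are hypothetical.

```python
import statistics


def column_statistics(values):
    """Return length extremes plus the first and last five values in sorted order."""
    ordered = sorted(values)
    lengths = [len(v) for v in values]
    return {
        "max_length": max(lengths),
        "min_length": min(lengths),
        "top": ordered[:5],      # first five values (assumes sorted order)
        "bottom": ordered[-5:],  # last five values (assumes sorted order)
    }


def integer_statistics(numbers):
    """Return the average and population standard deviation of an integer column."""
    return statistics.mean(numbers), statistics.pstdev(numbers)


print(column_statistics(["pear", "fig", "banana", "kiwi", "apple", "plum", "date"]))
print(integer_statistics([2, 4, 4, 4, 5, 5, 7, 9]))
```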

Exporting Profile Results from Informatica Developer

You can export column values and column pattern data from profile results.

Export column values in Distinct Value Count format. Export pattern values in Domain Inference format.

1. In the Object Explorer view, select and open a profile.

2. Optionally, run the profile to update the profile results.

3. Select the Results view.

4. Select the column that contains the data for export.

5. Under Details, select Values or select Patterns and click the Export button.

The Export data to a file dialog box opens.

6. Accept or change the file name. The default name is [Profile_name]_[column_name]_DVC for column value data and [Profile_name]_[column_name]_DI for pattern data.

7. Select the type of data to export. You can select either Values for the selected column or Patterns for the selected column.

8. Under Save, choose Save on Client and click Browse to select a location and save the file locally on your computer. By default, Informatica Developer writes the file to a location set in the Data Integration Service properties of Informatica Administrator.

9. If you do not want to export field names as the first row, clear the Export field names as first row check box.

10. Click OK.

CHAPTER 5

Rules in Informatica Developer

This chapter includes the following topics:

- Rules in Informatica Developer Overview
- Creating a Rule in Informatica Developer
- Applying a Rule in Informatica Developer

Rules in Informatica Developer Overview

A rule is business logic that defines conditions applied to source data when you run a profile. You can create reusable rules from mapplets in the Developer tool. You can reuse these rules in Analyst tool profiles to change or validate source data.

Create a mapplet and validate it as a rule. This rule appears as a reusable rule in the Analyst tool. You can apply the rule to a column profile in the Developer tool or in the Analyst tool.

A rule must meet the following requirements:

- It must contain an Input and Output transformation. You cannot use data sources in a rule.

- It can contain Expression transformations, Lookup transformations, and passive data quality transformations. It cannot contain any other type of transformation. For example, a rule cannot contain a Match transformation, because it is an active transformation.

- It does not specify cardinality between input groups.

Creating a Rule in Informatica Developer

You need to validate a mapplet as a rule to create a rule in the Developer tool.

Create a mapplet in the Developer tool.

1. Right-click the mapplet editor.

2. Select Validate As > Rule.

Applying a Rule in Informatica Developer

You can add a rule to a saved column profile. You cannot add a rule to a profile configured for join analysis.

1. Browse the Object Explorer view and find the profile you need.

2. Right-click the profile and select Open.

The profile opens in the editor.

3. Click the Definition tab, and select Rules.

4. Click Add.

The Apply Rule dialog box opens.

5. Click Browse to find the rule you want to apply.

Select a rule from a repository project, and click OK.

6. Click the Value column under Input Values to select an input port for the rule.

7. Optionally, click the Value column under Output Values to edit the name of the rule output port.

The rule appears in the Definition tab.

CHAPTER 6

Scorecards in Informatica Developer

This chapter includes the following topics:

- Scorecards in Informatica Developer Overview
- Creating a Scorecard

Scorecards in Informatica Developer OverviewA scorecard is a graphical representation of the quality measurements in a profile. You can view scorecards in theDeveloper tool. After you create a scorecard in the Developer tool, you can connect to the Analyst tool to open thescorecard. You can run and edit the scorecard in the Analyst tool. You can run the scorecard on current data inthe data object or on data stored in the staging database.

Creating a Scorecard

Create a scorecard and add columns from a profile to the scorecard. You must run a profile before you add columns to the scorecard.

1. In the Object Explorer view, select the project or folder where you want to create the scorecard.

2. Click File > New > Scorecard.

The New Scorecard dialog box appears.

3. Click Add.

The Select Profile dialog box appears. Select the profile that contains the columns you want to add.

4. Click OK, then click Next.

5. Select the columns that you want to add to the scorecard.

By default, the scorecard wizard selects the columns and rules defined in the profile. You cannot add columns that are not included in the profile.

6. Click Finish.

The Developer tool creates the scorecard.


7. Optionally, click Open with Informatica Analyst to connect to the Analyst tool and open the scorecard in the Analyst tool.


CHAPTER 7

Mapplet and Mapping Profiling

This chapter includes the following topics:

• Mapplet and Mapping Profiling Overview

• Running a Profile on a Mapplet or Mapping Object

• Comparing Profiles for Mapping or Mapplet Objects

• Generating a Mapping from a Profile

Mapplet and Mapping Profiling Overview

You can define a column profile for an object in a mapplet or mapping. Run a profile on a mapplet or a mapping object when you want to verify the design of the mapping or mapplet without saving the profile results. You can also generate a mapping from a profile.

Running a Profile on a Mapplet or Mapping Object

When you run a profile on a mapplet or mapping object, the profile runs on all data columns and enables drill-down operations on the data that is staged for the data object. You can run a profile on a mapplet or mapping object with multiple output ports.

The profile traces the source data through the mapping to the output ports of the object you selected. The profile analyzes the data that would appear on those ports if you ran the mapping.

1. Open a mapplet or mapping.

2. Verify that the mapplet or mapping is valid.

3. Right-click a data object or transformation and select Profile Now.

If the transformation has multiple output groups, the Select Output Group dialog box appears. If the transformation has a single output group, the profile results appear on the Results tab of the profile.

4. If the transformation has multiple output groups, select the output groups as necessary.

5. Click OK.

The profile results appear on the Results tab of the profile.


Comparing Profiles for Mapping or Mapplet Objects

You can create a profile that analyzes two objects in a mapplet or mapping and compares the results of the column profiles for those objects.

Like profiles of single mapping or mapplet objects, profile comparisons run on all data columns and enable drill-down operations on the data that is staged for the data objects.

1. Open a mapplet or mapping.

2. Verify that the mapplet or mapping is valid.

3. Press the CTRL key and click two objects in the editor.

4. Right-click one of the objects and select Compare Profiles.

5. Optionally, configure the profile comparison to match columns from one object to the other object.

6. Optionally, match columns by clicking a column in one object and dragging it onto a column in the other object.

7. Optionally, choose whether the profile analyzes all columns or matched columns only.

8. Click OK.

Generating a Mapping from a Profile

You can create a mapping object from a profile. Use the mapping object you create to develop a valid mapping. The mapping you create has a data source based on the profiled object and can contain transformations based on profile rule logic. After you create the mapping, add objects to complete it.

1. In the Object Explorer view, find the profile on which to create the mapping.

2. Right-click the profile name and select Generate Mapping.

The Generate Mapping dialog box appears.

3. Enter a mapping name. Optionally, enter a description for the mapping.

4. Confirm the folder location for the mapping.

By default, the Developer tool creates the mapping in the Mappings folder in the same project as the profile. Click Browse to select a different location for the mapping.

5. Confirm the profile definition that the Developer tool uses to create the mapping. To use another profile, click Select Profile.

6. Click Finish.

The mapping appears in the Object Explorer.

Add objects to the mapping to complete it.


CHAPTER 8

Reference Data

This chapter includes the following topics:

• Reference Tables Overview

• Reference Table Data Properties

• Creating a Reference Table Object

• Creating a Reference Table from a Flat File

• Creating a Reference Table from a Relational Source

• Copying a Reference Table in the Model Repository

Reference Tables Overview

Informatica provides reference tables that you can import to the Model repository. You can also create reference tables and connect to database tables that contain reference data.

Use the Developer tool to create and update reference tables and to add reference data objects to transformations.

Reference Table Data Properties

You can view properties for reference table data and metadata in the Developer tool. The Developer tool displays the properties when you open the reference table from the Model repository.

A reference table displays general properties and column properties. You can view reference table properties in the Developer tool. You can view and edit reference table properties in the Analyst tool.

The following table describes the general properties of a reference table:

| Property | Description |
| --- | --- |
| Name | Name of the reference table. |
| Description | Optional description of the reference table. |


The following table describes the column properties of a reference table:

| Property | Description |
| --- | --- |
| Valid | Identifies the column that contains the valid reference data. |
| Name | Name of each column. |
| Data Type | Data type of the data in each column. |
| Precision | Precision of each column. |
| Scale | Scale of each column. |
| Description | Description of the contents of the column. You can optionally add a description when you create the reference table. |
| Include a column for row-level descriptions | Indicates that the reference table contains a column for descriptions of column data. |
| Default value | Default value for the fields in the column. You can optionally add a default value when you create the reference table. |
| Connection Name | Name of the connection to the database that contains the reference table data values. |

Creating a Reference Table Object

Choose this option when you want to create an empty reference table and add values by hand.

1. Select File > New > Reference Table from the Developer tool menu.

2. In the new table wizard, select Reference Table as Empty.

3. Enter a name for the table.

4. Select a project to store the table metadata.

At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project you need.

Click Next.

5. Add two or more columns to the table. Click the New option to create a column.

Set the following properties for each column:

| Property | Default Value |
| --- | --- |
| Name | column |
| Data Type | string |
| Precision | 10 |
| Scale | 0 |
| Description | Empty. Optional property. |

6. Select the column that contains the valid values. You can change the order of the columns that you create.

7. Optionally, edit the following properties:

| Property | Default Value |
| --- | --- |
| Include a column for row-level descriptions | Cleared |
| Audit note | Empty |
| Default value | Empty |
| Maximum rows to preview | 500 |

Click Finish.

The reference table opens in the Developer tool workspace.

Creating a Reference Table from a Flat File

You can create a reference table from data stored in a flat file.

1. Select File > New > Reference Table from the Developer tool menu.

2. In the new table wizard, select Reference Table from a Flat File.

3. Browse to the file you want to use as the data source for the table.

4. Enter a name for the table.

5. Select a project to store the table metadata.

At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project you need.

Click Next.

6. Set UTF-8 as the code page.

7. Specify the delimiter that the flat file uses.

8. If the flat file contains column names, select the option to import column names from the first line of the file.

9. Optionally, edit the following properties:

| Property | Default Value |
| --- | --- |
| Text qualifier | No quotation marks |
| Start import at line | Line 1 |
| Row Delimiter | \012 LF (\n) |
| Treat consecutive delimiters as one | Cleared |
| Escape character | Empty |
| Retain escape character in data | Cleared |
| Maximum rows to preview | 500 |

Click Next.

10. Select the column that contains the valid values. You can change the order of the columns.

11. Optionally, edit the following properties:

| Property | Default Value |
| --- | --- |
| Include a column for row-level descriptions | Cleared |
| Audit note | Empty |
| Default value | Empty |
| Maximum rows to preview | 500 |

Click Finish.

The reference table opens in the Developer tool workspace.
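
The flat file options in the steps above (delimiter, text qualifier, start line, and column names in the first line) resemble the options of a generic delimited-file reader. The following Python sketch illustrates how such options might interact, using only the standard csv module. It is not the Developer tool's import logic; the function name read_reference_rows and the sample data are hypothetical.

```python
import csv

def read_reference_rows(text, delimiter=",", text_qualifier=None,
                        start_line=1, first_line_has_names=True):
    """Parse delimited text into (column_names, rows).

    Hypothetical helper that mimics the wizard options described above.
    It is an illustration, not the Developer tool's actual behavior.
    """
    # "Start import at line" is treated as 1-based here (assumption).
    lines = text.splitlines()[start_line - 1:]
    if text_qualifier is None:
        # Mirrors the default "No quotation marks": quotes are plain data.
        reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(lines, delimiter=delimiter, quotechar=text_qualifier)
    rows = [row for row in reader if row]  # skip empty lines
    if not rows:
        return [], []
    if first_line_has_names:
        return rows[0], rows[1:]
    # No header line: generate placeholder column names.
    names = [f"column{i + 1}" for i in range(len(rows[0]))]
    return names, rows

sample = "code,country\nUS,United States\nIE,Ireland\n"
names, rows = read_reference_rows(sample)
print(names)
print(rows)
```

A reader like this makes it easy to check a file's delimiter and qualifier settings before you run the wizard on it.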

Creating a Reference Table from a Relational Source

You can use a database source to create a managed or unmanaged reference table. To create a managed reference table, connect to the staging database that the Model repository uses. To create an unmanaged reference table, connect to another database.

Note: You can configure a database connection in the Connection Explorer. If the Developer tool does not show the Connection Explorer, select Window > Show View > Connection Explorer from the Developer tool menu.

1. Select File > New > Reference Table from the Developer tool menu.

2. In the new table wizard, select Reference Table from a Relational Source. Click Next.

3. Select a database connection. The Developer tool uses this connection to identify a set of resources for the new reference table.

At the Connection field, click Browse. The Choose Connection dialog box opens and displays the available database connections. Click More in the Choose Connection dialog box to browse other connections in the Informatica domain.

4. If the database connection you select does not specify the staging database, select Unmanaged table.

5. Select a database resource.

At the Resource field, click Browse. The Choose Connection dialog box opens and displays the resources on the database connection. Explore the database and select the resource you need.

6. Enter a name for the table.


7. Select a project to store the reference table object.

At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project.

Click Next.

8. Select the column that contains the valid values. You can change the order of the columns.

9. Optionally, edit the following properties:

| Property | Default Value |
| --- | --- |
| Include a column for row-level descriptions | Cleared |
| Audit note | Empty |
| Default value | Empty |
| Maximum rows to preview | 500 |

Click Finish.

Copying a Reference Table in the Model Repository

You can copy a reference table between projects and folders in the Model repository.

The reference table and the copy you create are not linked in the Model repository or in the database. When you create a copy, you create a new database table.

1. Browse the Model repository, and find the reference table you want to copy.

2. Right-click the reference table, and select Copy from the context menu.

3. In the Model repository, find the project or folder where you want to store the copy of the table.

4. Click Paste.


Part III: Data Quality Features in Informatica Analyst

This part contains the following chapters:

• Column Profiles in Informatica Analyst

• Column Profile Results in Informatica Analyst

• Rules in Informatica Analyst

• Scorecards in Informatica Analyst

• Exception Record Management

• Reference Tables


CHAPTER 9

Column Profiles in Informatica Analyst

This chapter includes the following topics:

• Column Profiles in Informatica Analyst Overview

• Column Profiling Process

• Profile Options

• Creating a Column Profile in the Analyst Tool

• Editing a Column Profile

• Running a Profile

• Creating a Filter

• Managing Filters

• Synchronizing a Flat File Data Object

• Synchronizing a Relational Data Object

Column Profiles in Informatica Analyst Overview

When you create a profile, you select the columns in the data object for which you want to profile data. You can set or configure sampling and drilldown options for faster profiling. After you run the profile, you can examine the profiling statistics to understand the data.

You can profile wide tables and flat files that have a large number of columns. You can profile tables with more than 30 columns and flat files with more than 100 columns. When you create or run a profile, you can choose to select all the columns or select each column you want to include for profiling. The Analyst tool displays the first 30 columns in the data preview. You can select all columns for drilldown and view value frequencies for these columns. You can use rules that have more than 50 output fields and include the rule columns for profiling when you run the profile again.


Column Profiling Process

As part of the column profiling process, you can choose to create a quick profile or a custom profile for a data object. Use a quick profile to include all columns for a data object and use the default profile options. Use a custom profile to select the columns for a data object and to configure the profile results, sampling, and drilldown options.

The following steps describe the column profiling process:

1. Select the data object you want to profile.

2. Determine whether you want to create a quick profile or a custom profile.

3. Choose where you want to save the profile.

4. Select the columns you want to profile.

5. Select the profile results option.

6. Choose the sampling options.

7. Choose the drilldown options.

8. Define a filter to determine the rows that the profile reads at run time.

9. Run the profile.

Note: Consider the following rules and guidelines for column names and profiling multilingual and Unicode data:

• You cannot add a column to a profile if both the column name and profile name match. You cannot add the same column twice to a profile even if you change the column name.

• You can profile multilingual data from different sources and view profile results based on the locale settings in the browser. The Analyst tool changes the Datetime, Numeric, and Decimal datatypes based on the browser locale.

• You can sort on multilingual data. The Analyst tool displays the sort order based on the browser locale.

• To profile Unicode data in a DB2 database, set the DB2CODEPAGE database environment variable in the database and restart the Data Integration Service.

Profile Options

Profile options include the profile results option, data sampling options, and data drilldown options. You can configure these options when you create a column profile for a data object.

You use the New Profile wizard to configure the profile options. You can choose to create a profile with the default options for columns, sampling, and drilldown. When you create a profile for multiple data sources, the Analyst tool uses default column profiling options.


Profile Results Option

You can choose to discard previous profile results or to display results for previous profile runs.

The following table describes the profile results option for a profile:

| Option | Description |
| --- | --- |
| Show results only for columns, rules selected in current run | Discards the profile results for previously profiled columns and displays results for the columns and rules selected for the latest profile run. Do not select this option if you want the Analyst tool to display profile results for previously profiled columns. |

Sampling Options

Sampling options determine the number of rows that the Analyst tool chooses to profile. You can configure sampling options when you go through the wizard or when you run a profile.

The following table describes the sampling options for a profile:

| Option | Description |
| --- | --- |
| All Rows | Chooses all rows in the data object. |
| First <number> Rows | The number of rows that you want to run the profile against. The Analyst tool chooses the rows from the first rows in the source. |
| Random Sample <number> Rows | The number of rows for a random sample to run the profile against. Random sampling forces the Analyst tool to perform drilldown on staged data. Note that this can impact drilldown performance. |
| Random sample | Random sample size based on the number of rows in the data object. Random sampling forces the Analyst tool to perform drilldown on staged data. Note that this can impact drilldown performance. |
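
The difference between taking the first rows and taking a random sample can be sketched in a few lines of Python. This is only an illustration of the two ideas; the guide does not describe the sampling algorithm that the Analyst tool uses, and the function names here are hypothetical.

```python
import random

def first_n_rows(rows, n):
    """Illustrates 'First <number> Rows': take rows from the start of the source."""
    return rows[:n]

def random_sample_rows(rows, n, seed=None):
    """Illustrates 'Random Sample <number> Rows': pick up to n rows at random,
    without replacement. The optional seed makes the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

rows = list(range(1, 101))  # stand-in for 100 source rows
print(first_n_rows(rows, 5))
print(random_sample_rows(rows, 5, seed=42))
```

The first-rows approach is cheap but can be biased when the source is ordered, while a random sample is more representative of the whole data object.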

Drilldown Options

You can configure drilldown options when you go through the wizard or when you run a profile.

The following table describes the drilldown options for a profile:

| Options | Description |
| --- | --- |
| Enable Row Drilldown | Drills down to row data in the profile results. By default, this option is selected. |
| Select Columns | Identifies columns for drilldown that you did not select for profiling. |
| Drilldown on live or staged data | Drills down on live data to read current data in the data source. Drills down on staged data to read profile data that is staged in the profiling warehouse. |

Creating a Column Profile in the Analyst Tool

Select a data object and create a custom profile or a default profile. When you create a custom profile, you can configure the columns, the rows to sample, and the drilldown options. The Analyst tool creates the profile in the same project and folder as the data object.

1. In the Navigator, select the project that contains the data object that you want to create a custom profile for.

2. In the Contents panel, right-click the data object and select New > Profile.

The New Profile wizard appears. The Column profiling option is selected by default.

3. Click Next.

4. In the Sources panel, select a data object.

5. Choose to create a default profile or a custom profile.

• To create a default profile, click Save or Save & Run.

• To create a custom profile, click Next.

6. Enter a name and an optional description for the profile.

7. In the Folders panel, select the project or folder where you want to create the profile.

The Analyst tool displays the project that you selected and shared projects that contain folders where you can create the profile. The profile objects in the folder appear in the Profiles panel.

8. Click Next.

9. In the Columns panel, select the columns that you want to profile. The columns include any rules you applied to the profile. The Analyst tool lists the name, datatype, precision, and scale for each column.

Optionally, select Name to select all columns.

10. Accept the default option in the Profile Results Option panel.

The first time you run the profile, the Analyst tool displays profile results for all columns selected for profiling.

11. In the Sampling Options panel, configure the sampling options.

12. In the Drilldown Options panel, configure the drilldown options.

Optionally, click Select Columns to select columns to drill down on. In the Drilldown columns window, select the columns for drill down and click OK.

13. Click Next.

14. Optionally, define a filter for the profile.

15. Click Next to verify the row drilldown settings including the preview columns for drilldown.

16. Click Save to create the profile, or click Save & Run to create the profile and then run the profile.


Editing a Column Profile

You can make changes to a column profile after running it.

1. In the Navigator, select the project or folder that contains the profile that you want to edit.

2. Click the profile to open it.

The profile opens in a tab.

3. Click Actions > Edit.

A short-cut menu appears.

4. Based on the changes you want to make, choose one of the following menu options:

• General. Change the basic properties such as name, description, and profile type.

• Data Source. Choose another matching data source.

• Column Profiling. Select the columns you want to run the profile on and configure the necessary sampling and drill down options.

• Column Profiling Filter. Create, edit, and delete filters.

• Column Profiling Rules. Create rules or change current ones.

• Data Domain Discovery. Set up data domain discovery options.

5. Click Save to save the changes or click Save & Run to save the changes and then run the profile.

Running a Profile

Run a profile to analyze a data source for content and structure and select columns and rules for drill down. You can drill down on live or staged data for columns and rules. You can run a profile on a column or rule without profiling all the source columns again after you run the profile.

1. In the Navigator, select the project or folder that contains the profile you want to run.

2. Click the profile to open it.

The profile appears in a tab. Verify the profile options before you run the profile.

3. Click Actions > Run Profile.

The Analyst tool displays the profile results.

Creating a Filter

Create a filter to define a subset of the original data source that meets the filter criteria. You can then run a profile on this subset.

1. Open a profile.

2. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box.

The current filters appear in the Filters panel.

3. Click New.

4. Enter a filter name and an optional description.


5. Select a simple, advanced, or SQL filter type.

• Simple. Use conditional operators, such as <, >, =, BETWEEN, and ISNULL for each column that you want to filter.

• Advanced. Use function categories, such as Character, Consolidation, Conversion, Financial, Numerical, and Data cleansing. Click the function name on the Functions panel to view its return type, description, and parameters. To include the function in the filter, click the right arrow (>) button, and specify the parameters in the Function dialog box.

Note: For a simple or an advanced filter on a date column, provide the condition in the YYYY/MM/DD HH:MM:SS format.

• SQL. Creates SQL queries. You can create an SQL filter for relational data sources. Enter the WHERE clause expression to generate the SQL filter. For example, to filter company records in the European region from a Company table with a Region column, enter Region = 'Europe' in the editor.

6. Click Validate to verify the SQL expression.
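
An SQL filter restricts the rows that the profile reads to those that satisfy the WHERE clause expression. The following Python sketch uses the standard sqlite3 module and a hypothetical in-memory Company table to show which rows an expression such as Region = 'Europe' would keep. The table contents and the function name rows_matching are assumptions for illustration; the sketch does not show how the Analyst tool builds or runs its queries.

```python
import sqlite3

def rows_matching(where_clause):
    """Return rows of a sample Company table that satisfy a WHERE clause expression."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Company (Name TEXT, Region TEXT)")
        conn.executemany(
            "INSERT INTO Company VALUES (?, ?)",
            [("Acme", "Europe"), ("Globex", "Asia"), ("Initech", "Europe")],
        )
        # The filter expression becomes the WHERE clause of a SELECT statement.
        # Illustration only: never build SQL from untrusted input this way.
        query = f"SELECT Name, Region FROM Company WHERE {where_clause} ORDER BY Name"
        return conn.execute(query).fetchall()
    finally:
        conn.close()

print(rows_matching("Region = 'Europe'"))
```

Trying an expression against a small sample table like this can be a quick way to check its syntax and intent before you enter it in the filter editor.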

Managing Filters

You can create, edit, and delete filters.

1. In the Navigator, select the project or folder that contains the profile you want to filter.

2. Open the profile.

3. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box.

The current filters appear in the Filters panel.

4. Choose to create, edit, or delete a filter.

• Click New to create a filter.

• Select a filter, and click Edit to change the filter settings.

• Select a filter, and click Delete to remove the filter.

Synchronizing a Flat File Data Object

You can synchronize the changes to an external flat file data source with its data object in Informatica Analyst. Use the Synchronize Flat File wizard to synchronize the data objects.

1. In the Contents panel, select a flat file data object.

2. Click Actions > Synchronize.

The Synchronize Flat File dialog box appears in a new tab.

3. Verify the flat file path in the Browse and Upload field.

4. Click Next.

A synchronization status message appears.


5. When you see a Synchronization complete message, click OK.

The message displays a summary of the metadata changes made to the data object. To view the details of the metadata changes, use the Properties view.

Synchronizing a Relational Data Object

You can synchronize the changes to an external relational data source with its data object in Informatica Analyst. External data source changes include adding, changing, and removing columns and changes to rules.

1. In the Contents panel, select a relational data object.

2. Click Actions > Synchronize.

A message prompts you to confirm the action.

3. To complete the synchronization process, click OK. Click Cancel to cancel the process.

If you click OK, a synchronization status message appears.

4. When you see a Synchronization complete message, click OK.

The message displays a summary of the metadata changes made to the data object. To view the details of the metadata changes, use the Properties view.


CHAPTER 10

Column Profile Results in Informatica Analyst

This chapter includes the following topics:

• Column Profile Results in Informatica Analyst Overview

• Profile Summary

• Column Values

• Column Patterns

• Column Statistics

• Column Profile Drilldown

• Column Profile Export Files in Informatica Analyst

Column Profile Results in Informatica Analyst Overview

View profile results to understand the structure of data and analyze its quality. You can view the profile results after you run a profile. You can view a summary of the columns and rules in the profile and the values, patterns, and statistics for columns and rules.

After you run a profile, you can view the profile results in the Column Profiling, Properties, and Data Preview views. You can export value frequencies, pattern frequencies, or drilldown data to a CSV file. You can export the complete profile summary information to a Microsoft Excel file so that you can view all data in a file for further analysis.

In the Column Profiling view, you can view the summary information for columns for a profile run. You can view values, patterns, and statistics for each column in the Values, Patterns, and Statistics views.

The Analyst tool displays rules as columns in profile results. The profile results for a rule appear as a profiled column. The profile results that appear depend on the profile configuration and sampling options.

The following profiling results appear in the Column Profiling view:

• The summary information for the profile run, including the number of unique and null values, inferred datatype, and last run date and time.

• Values for columns and the frequency in which the value appears for the column. The frequency appears as a number, a percentage, and a chart.

• Value patterns for the profiled columns and the frequency in which the pattern appears. The frequency appears as a number and a percentage.


• Statistics about the column values, such as average, length, and top and bottom values.

Note: You can select a value or pattern and view profiled rows that match the value or pattern on the Details panel.

In the Properties view, you can view profile properties on the Properties panel. You can view properties for columns and rules on the Columns and Rules panel.

In the Data Preview view, you can preview the profile data. The Analyst tool includes all columns in the profile and displays the first 100 rows of data.

Profile Summary

The summary for a profile run includes the number of unique and null values expressed as a number and a percentage, inferred datatypes, and last run date and time. You can click each profile summary property to sort on values of the property.

The following table describes the profile summary properties:

Property | Description
Name | Name of the column in the profile.
Unique Values | Number of unique values for the column.
% Unique | Percentage of unique values for the column.
Null | Number of null values for the column.
% Null | Percentage of null values for the column.
Datatype | Datatype derived from the values for the column. The Analyst tool can derive the following datatypes from the datatypes of values in columns: String, Varchar, Decimal, Integer, and "-" for nulls. Note: The Analyst tool cannot derive the datatype from the values of a numeric column that has a precision greater than 38. The Analyst tool cannot derive the datatype from the values of a string column that has a precision greater than 255. If you create a column profile on a date column that contains a year value earlier than 1800, the inferred datatype may appear as a fixed-length string. Change the default value for the year-minimum parameter in InferDateTimeConfig.xml as necessary.
% Inferred | Percentage of values that match the datatype inferred by the Analyst tool.
Documented Datatype | Datatype declared for the column in the profiled object.
Maximum Value | Maximum value in the column.
Minimum Value | Minimum value in the column.
Last Profile Run | Date and time you last ran the profile.
Drilldown | If selected, drills down on live data for the column.
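To illustrate how the count and percentage properties relate to each other, the following Python sketch computes them for a list of column values. It is not the Analyst tool implementation. The function name summarize_column is hypothetical, and the sketch assumes that "unique values" means distinct non-null values and that percentages are calculated over all rows.

```python
def summarize_column(values):
    """Compute profile-summary style figures for one column (illustrative only)."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    unique = len(set(non_null))  # assumption: distinct non-null values
    nulls = total - len(non_null)

    def percent(count):
        # Assumption: percentages are relative to the total row count.
        return round(100.0 * count / total, 2) if total else 0.0

    return {
        "Unique Values": unique,
        "% Unique": percent(unique),
        "Null": nulls,
        "% Null": percent(nulls),
        "Maximum Value": max(non_null) if non_null else None,
        "Minimum Value": min(non_null) if non_null else None,
    }


summary = summarize_column(["NY", "CA", "NY", None])
print(summary)
```

In this sample, two of the four rows hold distinct values and one row is null, so the sketch reports 50 percent unique and 25 percent null.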

Column Values

The column values include the values for each column and the frequency with which each value appears in the column.

The following table describes the properties for the column values:

Property | Description
Value | List of all values for the column in the profile. Note: The Analyst tool excludes the CLOB, BLOB, Raw, and Binary datatypes in column values in a profile.
Frequency | Number of times a value appears for a column, expressed as a number, a percentage, and a chart.
Percent | Percentage that a value appears for a column.
Chart | Chart for the percentage.
Drill down | Drills down to specific source rows based on a column value.

Note: To sort the Value and Frequency columns, select the columns. When you sort the results of the Frequency column, the Analyst tool sorts the results based on the datatype of the column.
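The relationship between the Value, Frequency, and Percent properties can be sketched in Python as follows. The function name value_frequencies is hypothetical, and the ordering by descending frequency is an assumption made for the example, not a statement about how the Analyst tool orders results.

```python
from collections import Counter


def value_frequencies(values):
    """Return (value, frequency, percent) rows, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, count, round(100.0 * count / total, 2))
        for value, count in counts.most_common()
    ]


rows = value_frequencies(["NY", "CA", "NY", "NY"])
for value, frequency, percent in rows:
    print(value, frequency, percent)
```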

Column Patterns

The column patterns include the value patterns for the columns and the frequency with which each pattern appears.

By default, the profiling warehouse stores the 16,000 unique highest-frequency values, including NULL values, for profile results. If there is at least one NULL value in the profile results, the Analyst tool can display NULL values as patterns.

Note: The Analyst tool cannot derive the pattern for a numeric column that has a precision greater than 38. The Analyst tool cannot derive the pattern for a string column that has a precision greater than 255.

The following table describes the properties for the column patterns:

Property | Description
Pattern | Pattern for the column in the profile.
Frequency | Number of times a pattern appears for a column, expressed as a number.
Percent | Percentage that a pattern appears for a column.
Chart | Chart for the percentage.
Drill down | Drills down to specific source rows based on a column pattern.

The following table describes the pattern characters and what they represent:

Character | Description
9 | Represents any numeric character. Informatica Analyst displays up to three characters separately in the "9" format. The tool displays more than three characters as a value within parentheses. For example, the format "9(8)" represents a numeric value with 8 digits.
X | Represents any alphabetic character. Informatica Analyst displays up to three characters separately in the "X" format. The tool displays more than three characters as a value within parentheses. For example, the format "X(6)" may represent the value "Boston". Note: The pattern character X is not case sensitive and may represent uppercase or lowercase characters from the source data.
p | Represents "(", the left parenthesis.
q | Represents ")", the right parenthesis.
b | Represents a blank space.
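The pattern characters in the table can be applied to a value with a short Python sketch. This is an approximation for illustration, not the inference logic of the Analyst tool. The function names are hypothetical, and the sketch assumes that only runs of "9" and "X" longer than three characters are collapsed and that any other character, such as a hyphen, passes through unchanged.

```python
from itertools import groupby

# Characters that the table maps to a dedicated pattern character.
SYMBOLS = {"(": "p", ")": "q", " ": "b"}


def classify(ch):
    """Map one source character to its pattern character."""
    if ch.isdigit():
        return "9"
    if ch.isalpha():
        return "X"  # X is not case sensitive
    return SYMBOLS.get(ch, ch)  # assumption: other characters are kept as-is


def infer_pattern(value):
    """Build a pattern string such as X(6) or 9(8) from a source value."""
    parts = []
    for symbol, run in groupby(classify(ch) for ch in value):
        length = len(list(run))
        if symbol in ("9", "X") and length > 3:
            parts.append(f"{symbol}({length})")
        else:
            parts.append(symbol * length)
    return "".join(parts)


print(infer_pattern("Boston"))
print(infer_pattern("12345678"))
print(infer_pattern("(617) 555-0100"))
```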

Column Statistics

The column statistics include statistics about the column values, such as average, length, and top and bottom values. The statistics that appear depend on the column type.

The following table describes the types of column statistics for each column type:

Statistic | Column Type | Description
Average | Integer | Average of the values for the column.
Standard Deviation | Integer | The standard deviation, or variability between column values, for all values of the column.
Maximum Length | Integer, String | Length of the longest value for the column.
Minimum Length | Integer, String | Length of the shortest value for the column.
Bottom | Integer, String | Lowest values for the column.
Top | Integer, String | Highest values for the column.
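The following Python sketch shows one way to compute these statistics and how the available statistics depend on the column type. It rests on several assumptions that the guide does not state: the length of an integer is the length of its text form, the standard deviation is the population standard deviation, and the hypothetical parameter top_n controls how many top and bottom values are returned.

```python
import statistics


def column_statistics(values, top_n=3):
    """Return length and top/bottom statistics, plus numeric ones for integers."""
    ordered = sorted(values)
    lengths = [len(str(v)) for v in values]  # assumption: length of the text form
    stats = {
        "Maximum Length": max(lengths),
        "Minimum Length": min(lengths),
        "Bottom": ordered[:top_n],
        "Top": ordered[-top_n:][::-1],
    }
    # Average and Standard Deviation apply to integer columns only.
    if all(isinstance(v, int) for v in values):
        stats["Average"] = statistics.mean(values)
        stats["Standard Deviation"] = statistics.pstdev(values)  # assumption
    return stats


integer_stats = column_statistics([10, 200, 3, 45])
string_stats = column_statistics(["pear", "fig", "banana"])
print(integer_stats)
print(string_stats)
```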

Column Profile Drilldown

Drilldown options for a column profile enable you to drill down to specific rows in the data source based on a column value. You can choose to read the current data in a data source for drilldown or read profile data staged in the profiling warehouse. When you drill down to a specific row on staged profile data, the Analyst tool creates a drilldown filter for the matching column value. After you drill down, you can edit, recall, reset, and save the drilldown filter.

You can select columns for drilldown even if you did not choose those columns for profiling. After you perform a drilldown on a column value, you can export drilldown data for the selected values or patterns to a CSV file at a location you choose. Although Informatica Analyst displays only the first 200 values for drilldown data, the tool exports all values to the CSV file.
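The difference between the rows that are displayed and the rows that are exported can be sketched in Python. The sketch models a drilldown as a filter on a column value. The function names, the sample rows, and the in-memory CSV buffer are assumptions for illustration and do not describe how the Analyst tool reads live or staged data.

```python
import csv
import io

DISPLAY_LIMIT = 200  # the guide states that the first 200 values are displayed


def drill_down(rows, column, value):
    """Return the source rows whose column matches the selected value."""
    return [row for row in rows if row.get(column) == value]


def export_csv(rows):
    """Write every matching row to CSV text, not only the displayed rows."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


source_rows = [{"id": i, "state": "NY" if i % 2 else "CA"} for i in range(500)]
matches = drill_down(source_rows, "state", "NY")
displayed = matches[:DISPLAY_LIMIT]
exported_lines = export_csv(matches).splitlines()
print(len(displayed), len(exported_lines) - 1)
```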

Drilling Down on Row Data

After you run a profile, you can drill down to specific rows that match the column value or pattern.

1. Run a profile.

The profile appears in a tab.

2. In the Summary view, select a column name to view the profile results for the column.

3. Select a column value on the Values tab or select a column pattern on the Patterns tab.

4. Click Actions > Drilldown to view the rows of data.

The Drilldown panel displays the rows that contain the values or patterns. The column value or pattern appears at the top of the panel.

Note: You can choose to drill down on live data or staged data.

Applying Filters to Drilldown Data

You can filter the drilldown data iteratively so that you can analyze data irregularities on the subsets of profile results.

1. Drill down to row data in the profile results.

2. Select a column value on the Values tab.

3. Right-click and select Drilldown Filter > Edit to open the DrillDown Filter dialog box.

4. Add filter conditions, and click Run.

5. To manage current drilldown filters, you can save, recall, or reset filters.

• To save a filter, select Drilldown Filter > Save.

• To go back to the last saved drilldown filter results, select Drilldown Filter > Recall.

• To reset the drilldown filter results, select Drilldown Filter > Reset.

Column Profile Export Files in Informatica Analyst

You can export column profile results to a CSV file or a Microsoft Excel file based on whether you choose a part of the profile results or the complete results summary.

You can export value frequencies, pattern frequencies, or drilldown data to a CSV file for selected values and patterns. You can export the profiling results summary for all columns to a Microsoft Excel file. Use the Data Integration Service privilege Drilldown and Export Results to determine, by user or group, who can export profile results.

Profile Export Results in a CSV File

You can export value frequencies, pattern frequencies, or drilldown data to view the data in a file. The Analyst tool saves the information in a CSV file.

When you export inferred column patterns, the Analyst tool exports a different format of the column pattern. For example, when you export the inferred column pattern X(5), the Analyst tool writes the pattern to the CSV file in the following format: XXXXX.
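The expansion from X(5) to XXXXX can be reproduced with a small Python sketch. The guide gives only the X(5) example, so applying the same expansion to the "9" pattern character is an assumption, and the function name expand_pattern is hypothetical.

```python
import re

# Matches a pattern character followed by a repeat count, such as X(5) or 9(8).
COMPACT_RUN = re.compile(r"([9X])\((\d+)\)")


def expand_pattern(pattern):
    """Expand compact runs such as X(5) into repeated characters."""
    return COMPACT_RUN.sub(lambda m: m.group(1) * int(m.group(2)), pattern)


print(expand_pattern("X(5)"))
print(expand_pattern("999-9(4)"))
```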

Profile Export Results in Microsoft Excel

When you export the complete profile results summary, the Analyst tool saves the information to multiple worksheets in a Microsoft Excel file. The Analyst tool saves the file in the "xlsx" format.

The following table describes the information that appears on each worksheet in the export file:

Tab | Description
Column Profile | Summary information exported from the Column Profiling view after the profile runs. Examples are column names, rule names, number of unique values, number of null values, inferred datatypes, and date and time of the last profile run.
Values | Values for the columns and rules and the frequency in which the values appear for each column.
Patterns | Value patterns for the columns and rules you ran the profile on and the frequency in which the patterns appear.
Statistics | Statistics about each column and rule. Examples are average, length, top values, bottom values, and standard deviation.
Properties | Properties view information, including profile name, type, sampling policy, and row count.

Exporting Profile Results from Informatica Analyst

You can export the results of a profile to a ".csv" or ".xlsx" file to view the data in a file.

1. In the Navigator, select the project or folder that contains the profile.

2. Click the profile to open it.

The profile opens in a tab.

3. In the Column Profiling view, select the column that you want to export.

4. Click Actions > Export Data.

The Export Data to a file window appears.

5. Enter the file name. Optionally, use the default file name.

6. Select the type of data to export.

• All (Summary, Values, Patterns, Statistics, Properties)

• Value frequencies for the selected column.

• Pattern frequencies for the selected column.

• Drilldown data for the selected values or patterns.

7. Enter a file format. The format is Excel for the All option and CSV for the rest of the options.

8. Select the code page of the file.

9. Click OK.

CHAPTER 11

Rules in Informatica Analyst

This chapter includes the following topics:

• Rules in Informatica Analyst Overview

• Predefined Rules

• Expression Rules

Rules in Informatica Analyst Overview

A rule is business logic that defines conditions applied to source data when you run a profile. You can add a rule to the profile to cleanse, change, or validate data.

You may want to use a rule in different circumstances. You can add a rule to cleanse one or more data columns. You can add a lookup rule that provides information that the source data does not provide. You can add a rule to validate a cleansing rule for a data quality or data integration project.

You can add a rule before or after you run a profile. When you add a rule to a profile, you can create a rule or you can apply a rule. You can create or apply the following rule types for a profile:

• Expression rules. Use expression functions and columns to define rule logic. Create expression rules in the Analyst tool. An analyst can create an expression rule and promote it to a reusable rule that other analysts can use in multiple profiles.

• Predefined rules. Includes reusable rules that a developer creates in the Developer tool. Rules that a developer creates in the Developer tool as mapplets can appear in the Analyst tool as reusable rules.

After you add a rule to a profile, you can run the profile again for the rule column. The Analyst tool displays profile results for the rule column. You can modify the rule and run the profile again to view changes to the profile results. The output of a rule can be one or more virtual columns. The virtual columns exist in the profile results. The Analyst tool profiles the virtual columns. For example, you use a predefined rule that splits a column that contains first and last names into FIRST_NAME and LAST_NAME virtual columns. The Analyst tool profiles the FIRST_NAME and LAST_NAME columns.
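The name-splitting example can be pictured with a Python sketch in which a rule is a function that derives virtual columns from an input column. A predefined rule is actually a mapplet, so the sketch is only an analogy. The function names, the NAME input column, and the split on the first space are assumptions made for the example.

```python
def split_full_name(full_name):
    """Derive FIRST_NAME and LAST_NAME values from one full-name value."""
    first, _, last = full_name.strip().partition(" ")
    return {"FIRST_NAME": first, "LAST_NAME": last.strip()}


def apply_rule(rows, input_column):
    """Add the virtual columns to each row so that they can be profiled."""
    return [{**row, **split_full_name(row[input_column])} for row in rows]


source_rows = [{"NAME": "Ada Lovelace"}, {"NAME": "Alan Turing"}]
profiled_rows = apply_rule(source_rows, "NAME")
print(profiled_rows)
```

The source rows keep the NAME column, and the two virtual columns appear beside it in the result.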

Note: If you delete a rule object that other object types reference, the Analyst tool displays a message that lists those object types. Determine the impact of deleting the rule before you delete it.

Predefined Rules

Predefined rules are rules created in the Developer tool or provided with the Developer tool and Analyst tool. Apply predefined rules to the Analyst tool profiles to modify or validate source data.

Predefined rules use transformations to define rule logic. You can use predefined rules with multiple profiles. In the Model repository, a predefined rule is a mapplet with an input group, an output group, and transformations that define the rule logic.

Predefined Rules Process

Use the New Rule Wizard to apply a predefined rule to a profile.

You can perform the following steps to apply a predefined rule:

1. Open a profile.

2. Select a predefined rule.

3. Review the rules parameters.

4. Select the input column.

5. Configure the profiling options.

Applying a Predefined Rule

Use the New Rule Wizard to apply a predefined rule to a profile. When you apply a predefined rule, you select the rule and configure the input and output columns for the rule. Apply a predefined rule to use a rule promoted as a reusable rule or use a rule created by a developer.

1. In the Navigator, select the project or folder that contains the profile that you want to add the rule to.

2. Click the profile to open it.

The profile appears in a tab.

3. Click Actions > Add Rule.

The New Rule window appears.

4. Select the option to Apply a Rule.

5. Click Next.

6. In the Rules panel, select the rule that you want to apply.

The name, datatype, description, and precision columns appear for the Inputs and Outputs columns in the Rules Parameters panel.

7. Click Next.

8. In the Inputs section, select an input column. The input column is a column name in the profile.

9. Optionally, in the Outputs section, configure the label of the output columns.

10. Click Next.

11. In the Columns panel, select the columns you want to profile. The columns include any rules you applied to the profile. Optionally, select Name to include all columns.

The Analyst tool lists the name, datatype, precision, and scale for each column.

12. In the Sampling Options panel, configure the sampling options.

13. In the Drilldown Options panel, configure the drilldown options.

14. Click Save to apply the rule or click Save & Run to apply the rule and then run the profile.

Expression Rules

Expression rules use expression functions and columns to define rule logic. Create expression rules and add them to a profile in the Analyst tool.

Use expression rules to change or validate values for columns in a profile. You can create one or more expression rules to use in a profile. Expression functions are SQL-like functions used to transform source data. You can create expression rule logic with the following types of functions:

• Character

• Conversion

• Data Cleansing

• Date

• Encoding

• Financial

• Numeric

• Scientific

• Special

• Test

Expression Rules Process

Use the New Rule Wizard to create an expression rule and add it to a profile.

The New Rule Wizard includes an expression editor. Use the expression editor to add expression functions, configure columns as input to the functions, validate the expression, and configure the return type, precision, and scale.

The output of an expression rule is a virtual column that uses the name of the rule as the column name. The Analyst tool profiles the virtual column. For example, you use an expression rule to validate a ZIP code. The rule returns 1 if the ZIP code is valid and 0 if the ZIP code is not valid. Informatica Analyst profiles the 1 and 0 output values of the rule.
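The ZIP code example can be sketched in Python to show what the virtual column contains and what a profile of it would count. In the Analyst tool you build the rule from expression functions in the expression editor, so this code is only an analogy. The definition of a valid ZIP code used here (five digits, optionally followed by a hyphen and four digits) and the name zip_rule are assumptions.

```python
import re
from collections import Counter

# Assumption: a valid ZIP code is 5 digits, optionally followed by -4 digits.
ZIP_FORMAT = re.compile(r"\d{5}(-\d{4})?")


def zip_rule(value):
    """Return 1 if the value looks like a valid ZIP code, otherwise 0."""
    if value is not None and ZIP_FORMAT.fullmatch(value.strip()):
        return 1
    return 0


zip_codes = ["02134", "2134", "94105-1234", None, "ABCDE"]
virtual_column = [zip_rule(code) for code in zip_codes]  # output of the rule
output_profile = Counter(virtual_column)                 # frequency of 1 and 0
print(virtual_column, dict(output_profile))
```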

You can perform the following steps to create an expression rule:

1. Open a profile.

2. Configure the rule logic using expression functions and columns as parameters.

3. Configure the profiling options.

Creating an Expression Rule

Use the New Rule Wizard to create an expression rule and add it to a profile. Create an expression rule to modify or validate values for columns in a profile.

1. In the Navigator, select the project or folder that contains the profile that you want to add the rule to.

2. In the Contents panel, click the profile to open it.

The profile appears in a tab.

3. Click Actions > Edit > Column Profiling Rules.

The Edit Profile dialog box appears.

4. Click New.

5. Select Create a rule.

6. Click Next.

7. Enter a name and optional description for the rule.

8. Optionally, choose to promote the rule as a reusable rule and configure the project and folder location.

If you promote a rule to a reusable rule, you or other users can use the rule in another profile as a predefined rule.

9. In the Functions tab, select a function and click the right arrow to enter the parameters for the function.

10. In the Columns tab, select an input column and click the right arrow to add the expression in the Expression editor. You can also add logical operators to the expression.

11. Click Validate. You can proceed to the next step if the expression is valid.

12. Optionally, click Edit to configure the return type, precision, and scale.

13. Click Next.

14. In the Columns panel, select the columns you want to profile. The columns include any rules you applied to the profile. Optionally, select Name to select all columns.

The Analyst tool lists the name, datatype, precision, and scale for each column.

15. In the Sampling Options panel, configure the sampling options.

16. In the Drilldown Options panel, configure the drilldown options.

17. Click Save to create the rule or click Save & Run to create the rule and then run the profile.

CHAPTER 12

Scorecards in Informatica Analyst

This chapter includes the following topics:

• Scorecards in Informatica Analyst Overview

• Informatica Analyst Scorecard Process

• Metrics

• Scorecard Notifications

• Scorecard Integration with External Applications

Scorecards in Informatica Analyst Overview

A scorecard is the graphical representation of valid values for a column in a profile. You can create scorecards and drill down on live data or staged data.

Use scorecards to measure data quality progress. For example, you can create a scorecard to measure data quality before you apply data quality rules. After you apply data quality rules, you can create another scorecard to compare the effect of the rules on data quality.

Scorecards display the value frequency for columns as scores. The scores reflect the percentage of valid values in the columns. After you run a profile, you can add columns from the profile as metrics to a scorecard. You can create metric groups so that you can group related metrics to a single entity. You can define thresholds that specify the range of bad data acceptable for columns in a record and assign metric weights for each metric. When you run a scorecard, the Analyst tool generates weighted average values for each metric group. To identify valid data records and records that are not valid, you can drill down on each column. You can use trend charts in the Analyst tool to track how scores change over a period of time.
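Because a score reflects the percentage of valid values in a column, the calculation can be sketched in a few lines of Python. The function names and the sample data are hypothetical. The sketch assumes that the valid values are supplied as a set, in the same way that you choose valid values from the list of all values when you configure a metric.

```python
def metric_score(values, valid_values):
    """Return the percentage of column values that are in the valid set."""
    if not values:
        return 0.0
    valid_count = sum(1 for value in values if value in valid_values)
    return 100.0 * valid_count / len(values)


def split_rows(values, valid_values):
    """Separate valid values from values that are not valid, as a drilldown would."""
    valid = [value for value in values if value in valid_values]
    not_valid = [value for value in values if value not in valid_values]
    return valid, not_valid


answers = ["Y", "N", "?", "Y"]
score = metric_score(answers, {"Y", "N"})
valid_rows, invalid_rows = split_rows(answers, {"Y", "N"})
print(score, valid_rows, invalid_rows)
```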

Informatica Analyst Scorecard Process

You can run and edit the scorecard in the Analyst tool. You can create and view a scorecard in the Developer tool. You can run the scorecard on current data in the data object or on data stored in the staging database.

When you view a scorecard in the Contents view of the Analyst tool, it opens the scorecard in another tab. After you run the scorecard, you can view the scores on the Scorecard view. You can select the data object and navigate to the data object from a score within a scorecard. The Analyst tool opens the data object in another tab.

You can perform the following tasks when you work with scorecards:

1. Create a scorecard in the Developer tool and add columns from a profile.

2. Optionally, connect to the Analyst tool and open the scorecard in the Analyst tool.

3. After you run a profile, add profile columns as metrics to the scorecard.

4. Run the scorecard to generate the scores for columns.

5. View the scorecard to see the scores for each column in a record.

6. Drill down on the columns for a score.

7. Edit a scorecard.

8. Set thresholds for each metric in a scorecard.

9. Create a group to add or move related metrics in the scorecard.

10. Edit or delete a group, as required.

11. View trend charts for each score to monitor how the score changes over time.

Metrics

A metric is a column of a data source or output of a rule that is part of a scorecard. When you create a scorecard, you can assign a weight to each metric. Create a metric group to categorize related metrics in a scorecard into a set.

Metric Weights

When you create a scorecard, you can assign a weight to each metric. The default value for a weight is 1.

When you run a scorecard, the Analyst tool calculates the weighted average for each metric group based on the metric score and weight you assign to each metric.

For example, you assign a weight of W1 to metric M1, and you assign a weight of W2 to metric M2. The Analyst tool uses the following formula to calculate the weighted average:

(M1 × W1 + M2 × W2) / (W1 + W2)
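The formula extends to any number of metrics in a metric group. The following Python sketch applies it to a list of score and weight pairs. The function name weighted_average and the sample scores are hypothetical, and raising an error when all weights are zero is a choice made for the example.

```python
def weighted_average(metrics):
    """Compute sum(score * weight) / sum(weight) for (score, weight) pairs."""
    total_weight = sum(weight for _, weight in metrics)
    if total_weight == 0:
        raise ValueError("At least one metric must have a nonzero weight.")
    return sum(score * weight for score, weight in metrics) / total_weight


# M1 = 90 with weight 1, M2 = 60 with weight 2
group_score = weighted_average([(90.0, 1), (60.0, 2)])
print(group_score)
```

With these inputs the calculation is (90 × 1 + 60 × 2) / (1 + 2), which gives 70.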

Adding Columns to a Scorecard

After you run a profile, you can add profile columns to a scorecard. Use the Add to Scorecard Wizard to add columns from a profile to a scorecard and configure the valid values for the columns. If you add a profile column to a scorecard from a source profile that has a filter or a sampling option other than All Rows, profile results may not reflect the scorecard results.

1. In the Navigator, select the project or folder that contains the profile.

2. Click the profile to open it.

The profile appears in a tab.

3. Click Actions > Run Profile to run the profile.

4. Click Actions > Add to Scorecard.

The Add to Scorecard Wizard appears.

Note: Use the following rules and guidelines before you add columns to a scorecard:

• You cannot add a column to a scorecard if both the column name and scorecard name match.

• You cannot add a column twice to a scorecard even if you change the column name.

5. Select Existing Scorecard to add the columns to an existing scorecard.

The New Scorecard option is selected by default.

6. Click Next.

7. Select the scorecard that you want to add the columns to, and click Next.

8. Select the columns and rules that you want to add to the scorecard as metrics. Optionally, click the check box in the left column header to select all columns. Optionally, select Column Name to sort column names.

9. Select each metric in the Metrics panel and configure the valid values from the list of all values in the Score using: Values panel.

You can select multiple values in the Available Values panel and click the right arrow button to move them to the Selected Values panel.

10. Select each metric in the Metrics panel and configure metric thresholds in the Metric Thresholds panel.

You can set thresholds for Good, Acceptable, and Unacceptable scores.

11. Click Next.

12. In the Score using: Values panel, set up the metric weight for each metric. You can double-click the default metric weight of 1 to change the value.

13. In the Metric Group Thresholds panel, set up metric group thresholds.

14. Click Save to save the scorecard or click Save & Run to save and run the scorecard.

Running a Scorecard

Run a scorecard to generate scores for columns.

1. In the Navigator, select the project or folder that contains the scorecard.

2. Click the scorecard to open it.

The scorecard appears in a tab.

3. Click Actions > Run Scorecard.

4. Select a score from the Metrics panel and select the columns from the Columns panel to drill down on.

5. In the Drilldown option, choose to drill down on live data or staged data.

For optimal performance, drill down on live data.

6. Click Run.

Viewing a Scorecard

Run a scorecard to see the scores for each metric. A scorecard displays the score as a percentage and bar. View data that is valid or not valid. You can also view scorecard information, such as the metric weight, metric group score, score trend, and name of the data object.

1. Run a scorecard to view the scores.

2. Select a metric that contains the score you want to view.

3. Click Actions > Drilldown to view the rows of valid data or rows of data that is not valid for the column.

The Analyst tool displays the rows of valid data by default in the Drilldown panel.

Editing a Scorecard

Edit valid values for metrics in a scorecard. You must run a scorecard before you can edit it.

1. In the Navigator, select the project or folder that contains the scorecard.

2. Click the scorecard to open it.

The scorecard appears in a tab.

3. Click Actions > Edit.

The Edit Scorecard dialog box appears.

4. On the Metrics tab, select each score in the Metrics panel and configure the valid values from the list of all values in the Score using: Values panel.

5. Make changes to the score thresholds in the Metric Thresholds panel as necessary.

6. Click the Metric Groups tab.

7. Create, edit, or remove metric groups.

You can also edit the metric weights and metric thresholds on the Metric Groups tab.

8. Click the Notifications tab.

9. Make changes to the scorecard notification settings as necessary.

You can set up global and custom settings for metrics and metric groups.

10. Click Save to save changes to the scorecard, or click Save & Run to save the changes and run the scorecard.

Defining Thresholds

You can set thresholds for each score in a scorecard. A threshold specifies the range in percentage of bad data that is acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable ranges of data. You can define thresholds for each column when you add columns to a scorecard, or when you edit a scorecard.

Complete the following prerequisite tasks before you define thresholds for columns in a scorecard:

• In the Navigator, select the project or folder that contains the profile and add columns from the profile to the scorecard in the Add to Scorecard window.

• Optionally, in the Navigator, select the project or folder that contains the scorecard and click the scorecard to edit it in the Edit Scorecard window.

1. In the Add to Scorecard window, or the Edit Scorecard window, select each metric in the Metrics panel.

2. In the Metric Thresholds panel, enter the thresholds that represent the upper bound of the unacceptable range and the lower bound of the good range.

3. Click Next or Save.
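The two thresholds divide the score scale into three ranges, which the following Python sketch makes explicit. The function name classify_score is hypothetical. The guide does not state whether the boundary values belong to the lower or the upper range, so the comparisons below are an assumption, as is the premise that a higher score is better.

```python
def classify_score(score, unacceptable_upper, good_lower):
    """Place a score in the Good, Acceptable, or Unacceptable range."""
    if unacceptable_upper > good_lower:
        raise ValueError("The unacceptable bound cannot exceed the good bound.")
    if score >= good_lower:         # assumption: the bound itself counts as Good
        return "Good"
    if score <= unacceptable_upper:  # assumption: the bound counts as Unacceptable
        return "Unacceptable"
    return "Acceptable"


for sample_score in (95.0, 70.0, 40.0):
    print(sample_score, classify_score(sample_score, 50.0, 90.0))
```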

Metric Groups

Create a metric group to categorize related scores in a scorecard into a set. By default, the Analyst tool categorizes all the scores in a default metric group.

After you create a metric group, you can move scores out of the default metric group to another metric group. You can edit a metric group to change its name and description, including the default metric group. You can delete metric groups that you no longer use. You cannot delete the default metric group.

Creating a Metric Group

Create a metric group to add related scores in the scorecard to the group.

1. In the Navigator, select the project or folder that contains the scorecard.

2. Click the scorecard to open it.

The scorecard appears in a tab.

3. Click Actions > Edit.

The Edit Scorecard window appears.

4. Click the Metric Groups tab.

The default group appears in the Metric Groups panel and the scores in the default group appear in the Metrics panel.

5. Click the New Group icon to create a metric group.

The Metric Groups dialog box appears.

6. Enter a name and optional description.

7. Click OK.

8. Click Save to save the changes to the scorecard.

Moving Scores to a Metric Group

After you create a metric group, you can move related scores to the metric group.

1. In the Navigator, select the project or folder that contains the scorecard.

2. Click the scorecard to open it.

The scorecard appears in a tab.

3. Click Actions > Edit.

The Edit Scorecard window appears.

4. Click the Metric Groups tab.

The default group appears in the Metric Groups panel and the scores in the default group appear in the Metrics panel.

5. Select a metric from the Metrics panel and click the Move Metrics icon.

The Move Metrics dialog box appears.

Note: To select multiple scores, hold the Shift key.

6. Select the metric group to move the scores to.

7. Click OK.

Editing a Metric Group

Edit a metric group to change the name and description. You can change the name of the default metric group.

1. In the Navigator, select the project or folder that contains the scorecard.

2. Click the scorecard to open it.

The scorecard opens in a tab.

3. Click Actions > Edit.

The Edit Scorecard window appears.

4. Click the Metric Groups tab.

The default metric group appears in the Metric Groups panel and the metrics in the default metric group appear in the Metrics panel.

5. On the Metric Groups panel, click the Edit Group icon.

The Edit dialog box appears.

6. Enter a name and an optional description.

7. Click OK.

Deleting a Metric Group

You can delete a metric group that is no longer valid. When you delete a metric group, you can choose to move the scores in the metric group to the default metric group. You cannot delete the default metric group.

1. In the Navigator, select the project or folder that contains the scorecard.

2. Click the scorecard to open it.

The scorecard opens in a tab.

3. Click Actions > Edit.

The Edit Scorecard window appears.

4. Click the Metric Groups tab.

The default metric group appears in the Metric Groups panel and the metrics in the default metric group appear in the Metrics panel.

5. Select a metric group in the Metric Groups panel, and click the Delete Group icon.

The Delete Groups dialog box appears.

6. Choose the option to delete the metrics in the metric group or the option to move the metrics to the default metric group before deleting the metric group.

7. Click OK.

Drilling Down on Columns

Drill down on the columns for a score to select columns that appear when you view the valid data rows or data rows that are not valid. The columns you select to drill down on appear in the Drilldown panel.

1. Run a scorecard to view the scores.

2. Select a column that contains the score you want to view.

3. Click Actions > Drilldown to view the rows of valid or invalid data for the column.

4. Click Actions > Drilldown Columns.

The columns appear in the Drilldown panel for the selected score. The Analyst tool displays the rows of valid data for the columns by default. Optionally, click Invalid to view the rows of data that are not valid.

Viewing Trend Charts

You can view trend charts for each score to monitor how the score changes over time.

1. In the Navigator, select the project or folder that contains the scorecard.

2. Click the scorecard to open it.

The scorecard appears in a tab.

3. In the Scorecard view, select a score.

4. Click Actions > Show Trend Chart.

The Trend Chart Detail window appears. You can view score values that have changed over time. The Analyst tool uses historical scorecard run data for each date and the latest valid score values to calculate the score. The Analyst tool uses the latest threshold settings in the chart to depict the color of the score points.

Scorecard Notifications

You can configure scorecard notification settings so that the Analyst tool sends emails when specific metric scores or metric group scores move across thresholds or remain in specific score ranges, such as Unacceptable, Acceptable, and Good.

You can configure email notifications for individual metric scores and metric groups. If you use the global settings, the Analyst tool sends notification emails when the scores of selected metrics cross a threshold downward, from the Good to the Acceptable score range or from the Acceptable to the Unacceptable score range. You also get notification emails for each scorecard run if the score remains in the Unacceptable score range across consecutive scorecard runs.

You can customize the notification settings so that scorecard users get email notifications when the scores move from the Unacceptable to Acceptable and Acceptable to Good score ranges. You can also choose to send email notifications if a score remains within specific score ranges for every scorecard run.
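The default rules described above can be sketched as a small decision function. This is only an illustration of the logic, not how the Analyst tool implements it; the function name `should_notify_global` and the list `RANGES` are hypothetical.

```python
# Score ranges ordered from worst to best, as named in this guide.
RANGES = ["Unacceptable", "Acceptable", "Good"]

def should_notify_global(previous_range, current_range):
    """Return True if the global rules sketched here would trigger an email.

    previous_range may be None when there is no earlier scorecard run.
    """
    # A score that is in the Unacceptable range triggers a notification on every run.
    if current_range == "Unacceptable":
        return True
    if previous_range is None:
        return False
    # A downward move across a threshold (for example Good -> Acceptable) also triggers one.
    return RANGES.index(current_range) < RANGES.index(previous_range)

print(should_notify_global("Good", "Acceptable"))
print(should_notify_global("Acceptable", "Good"))
```

Custom settings would extend the same idea with additional conditions, such as upward moves or a score remaining in a chosen range.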

Notification Email Message Template

You can set up the message text and structure of email messages that the Analyst tool sends to recipients as part of scorecard notifications. The email template has an optional introductory text section, read-only message body section, and optional closing text section.

The following table describes the tags in the email template:

| Tag | Description |
| --- | --- |
| ScorecardName | Name of the scorecard. |
| ObjectURL | A hyperlink to the scorecard. You need to provide the username and password. |
| MetricGroupName | Name of the metric group that the metric belongs to. |
| CurrentWeightedAverage | Weighted average value for the metric group in the current scorecard run. |
| CurrentRange | The score range, such as Unacceptable, Acceptable, and Good, for the metric group in the current scorecard run. |
| PreviousWeightedAverage | Weighted average value for the metric group in the previous scorecard run. |
| PreviousRange | The score range, such as Unacceptable, Acceptable, and Good, for the metric group in the previous scorecard run. |
| ColumnName | Name of the source column that the metric is assigned to. |
| ColumnType | Type of the source column. |
| RuleName | Name of the rule. |
| RuleType | Type of the rule. |
| DataObjectName | Name of the source data object. |

Setting Up Scorecard Notifications

You can set up scorecard notifications at both metric and metric group levels. Global notification settings apply to those metrics and metric groups that do not have individual notification settings.

1. Run a scorecard in the Analyst tool.

2. Click Actions > Edit.

3. Click the Notifications tab.

4. Select Enable notifications to start configuring scorecard notifications.

5. Select a metric or metric group.

6. Click the Notifications check box to enable the global settings for the metric or metric group.

7. Select Use custom settings to change the settings for the metric or metric group.

You can choose to send a notification email when the score is in the Unacceptable, Acceptable, or Good range, or when the score moves across thresholds.

8. To edit the global settings for scorecard notifications, click the Edit Global Settings icon.

The Edit Global Settings dialog box appears where you can edit the settings including the email template.

Configuring Global Settings for Scorecard Notifications

If you choose the global scorecard notification settings, the Analyst tool sends emails to target users when the score is in the Unacceptable range or moves down across thresholds. As part of the global settings, you can configure the email template including the email addresses and message text for a scorecard.

1. Run a scorecard in the Analyst tool.

2. Click Actions > Edit to open the Edit Scorecard dialog box.

3. Click the Notifications tab.

4. Select Enable notifications to start configuring scorecard notifications.

5. Click the Edit Global Settings icon.

The Edit Global Settings dialog box appears where you can edit the settings, including the email template.

6. Choose when you want to send email notifications using the Score in and Score moves check boxes.

7. In the Email from field, change the email ID as necessary.

By default, the Analyst tool uses the Sender Email Address property of the Data Integration Service as the sender email ID.

8. In the Email to field, enter the email ID of the recipient.

Use a semicolon to separate multiple email IDs.

9. Enter the text for the email subject.

10. In the Body field, add the introductory and closing text of the email message.

11. To apply the global settings, select Apply settings to all metrics and metric groups.

12. Click OK.

Scorecard Integration with External Applications

You can create a scorecard in the Analyst tool and view its results in external applications or web portals. Specify the scorecard results URL in a format that includes the host name, port number, project ID, and scorecard ID to view the results in external applications.

Open a scorecard after you run it and copy its URL from the browser. The scorecard URL must be in the following format:

http://{HOST_NAME}:{PORT}/AnalystTool/com.informatica.at.AnalystTool/index.jsp?mode=scorecard&project={MRS_PROJECT_ID}&id={SCORECARD_ID}&parentpath={MRS_PARENT_PATH}&view={VIEW_MODE}&pcsfcred={CREDENTIAL}

The following table describes the scorecard URL attributes:

| Attribute | Description |
| --- | --- |
| HOST_NAME | Host name of the Analyst Service. |
| PORT | Port number for the Analyst Service. |
| MRS_PROJECT_ID | Project ID in the Model repository. |
| SCORECARD_ID | ID of the scorecard. |
| MRS_PARENT_PATH | Location of the scorecard in the Analyst tool. For example, /project1/folder1/sub_folder1. |
| VIEW_MODE | Determines whether a read-only or editable view of the scorecard gets integrated with the external application. |
| CREDENTIAL | Last part of the URL generated by the single sign-on feature that represents the object type such as scorecard. |

The VIEW_MODE attribute in the scorecard URL determines whether you can integrate a read-only or editable view of the scorecard with the external application:

view=objectonly

Displays a read-only view of the scorecard results.

view=objectrunonly

Displays scorecard results where you can run the scorecard and drill down on results.

view=full

Opens the scorecard results in the Analyst tool with full access.
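As an illustration of the URL format and attributes above, the following sketch assembles a scorecard URL from its parts. The function name `build_scorecard_url` and the sample values are hypothetical, and keeping the slashes in the parent path unencoded is an assumption based on the example path in the attribute table. In practice you copy the URL from the browser rather than construct it.

```python
from urllib.parse import urlencode

# View modes listed in this section.
VIEW_MODES = {"objectonly", "objectrunonly", "full"}

def build_scorecard_url(host, port, project_id, scorecard_id,
                        parent_path, view_mode, credential):
    """Assemble a scorecard URL in the documented format (illustrative helper)."""
    if view_mode not in VIEW_MODES:
        raise ValueError(f"unknown view mode: {view_mode}")
    query = urlencode(
        {
            "mode": "scorecard",
            "project": project_id,
            "id": scorecard_id,
            "parentpath": parent_path,
            "view": view_mode,
            "pcsfcred": credential,
        },
        safe="/",  # assumption: leave "/" in the parent path unencoded
    )
    return (f"http://{host}:{port}/AnalystTool/"
            f"com.informatica.at.AnalystTool/index.jsp?{query}")

# Sample values are placeholders, not real IDs or credentials.
print(build_scorecard_url("analyst.example.com", 8085, "P1", "S1",
                          "/project1/folder1", "objectonly", "CRED"))
```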

Viewing a Scorecard in External Applications

You can view a scorecard in external applications or web portals by using the scorecard URL. Copy the scorecard URL from the Analyst tool and add it to the source code of external applications or web portals.

1. Run a scorecard in the Analyst tool.

2. Copy the scorecard URL from the browser.

3. Verify that the URL matches the http://{HOST_NAME}:{PORT}/AnalystTool/com.informatica.at.AnalystTool/index.jsp?mode=scorecard&project={MRS_PROJECT_ID}&id={SCORECARD_ID}&parentpath={MRS_PARENT_PATH}&view={VIEW_MODE}&pcsfcred={CREDENTIAL} format.

4. Add the URL to the source code of the external application or web portal.
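The verification in step 3 can be approximated with a short check such as the one below. It is a sketch only: the helper name `check_scorecard_url` and the sample URL are hypothetical, and the check looks only at the URL structure, not at whether the IDs or credential are valid.

```python
from urllib.parse import urlparse, parse_qs

EXPECTED_PATH = "/AnalystTool/com.informatica.at.AnalystTool/index.jsp"
REQUIRED_PARAMS = ("mode", "project", "id", "parentpath", "view", "pcsfcred")
VIEW_MODES = {"objectonly", "objectrunonly", "full"}

def check_scorecard_url(url):
    """Return a list of problems; an empty list means the structure matches."""
    problems = []
    parts = urlparse(url)
    try:
        port = parts.port
    except ValueError:  # raised when the port is not a valid number
        port = None
    if not parts.hostname or port is None:
        problems.append("missing host name or port")
    if parts.path != EXPECTED_PATH:
        problems.append("unexpected path")
    params = parse_qs(parts.query)  # parameters with blank values are dropped
    for name in REQUIRED_PARAMS:
        if name not in params:
            problems.append(f"missing parameter: {name}")
    if "mode" in params and params["mode"] != ["scorecard"]:
        problems.append("mode is not scorecard")
    if "view" in params and params["view"][0] not in VIEW_MODES:
        problems.append("unknown view mode")
    return problems

sample = ("http://host1:8085/AnalystTool/com.informatica.at.AnalystTool/index.jsp"
          "?mode=scorecard&project=P1&id=S1&parentpath=/project1/folder1"
          "&view=full&pcsfcred=abc")
print(check_scorecard_url(sample))
```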

CHAPTER 13

Exception Record Management

This chapter includes the following topics:

- Exception Record Management Overview

- Exception Management Tasks

Exception Record Management Overview

An exception is a record that contains unresolved data quality issues. The record may contain errors, or it may be an unintended duplicate of another record. You can use the Analyst tool to review and edit exception records that are identified by a mapping that contains an Exception transformation.

You can review and edit the output from an Exception transformation in the Analyst tool or in the Informatica Data Director for Data Quality web application. You use Informatica Data Director for Data Quality when you are assigned a task as part of a workflow.

You can use the Analyst tool to review the following exception types:

Bad records

You can edit records, delete records, tag them to be reprocessed by a mapping, or profile them to analyze the quality of changes made to the records.

Duplicate records

You can consolidate clusters of similar records to a single master record. You can consolidate or remove duplicate records, extract records to form new clusters, and profile duplicate records.

The Exception transformation creates a database table to store the bad or duplicate records. The Model repository stores the data object associated with the table. The transformation also creates one or more tables for the metadata associated with the bad or duplicate records.

To review and update the bad or duplicate records, import the database table to the staging database in the Analyst tool. The Analyst tool uses the metadata tables in the database to identify the data quality issues in each record. You do not use the data object in the Model repository to update the record data.

Exception Management Process Flow

The Exception transformation analyzes the output of other data quality transformations and creates tables that contain records with different levels of data quality.

After the Exception transformation creates an exception table, you can use the Analyst tool or Informatica Data Director for Data Quality to review and update the records in the table.

You can configure data quality transformations in a single mapping, or you can create mappings for different stages in the process.

Use the Developer tool to perform the following tasks:

Create a mapping that generates score values for data quality issues

Use a Match transformation in cluster mode to generate score values for duplicate record exceptions.

Use a transformation that writes a business rule to generate score values for records that contain errors. For example, you can define an IF/THEN rule in a Decision transformation. Use the rule to evaluate the output of other data quality transformations.

Use an Exception transformation to analyze the record scores

Configure the Exception transformation to read the output of other transformations or to read a data object from another mapping. Configure the transformation to write records to database tables based on score values in the records.

Configure target data objects for good records or automatic consolidation records

Connect the Exception transformation output ports to the target data objects in the mapping.

Create the target data object for bad or duplicate records

Use the Generate bad records table or Generate duplicate record table option to create the database object and add it to the mapping canvas. The Developer tool auto-connects the bad or duplicate record ports to the data object.

Run the mapping

Run the mapping to process exceptions.
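The idea of writing records to different tables based on score values can be pictured with the sketch below. It is not the Exception transformation's implementation. The two thresholds, their values, the three category names, and the function `route_record` are all assumptions made for illustration.

```python
def route_record(score, lower_threshold=40, upper_threshold=90):
    """Classify a record by its score.

    The threshold values are illustrative only, not product defaults.
    """
    if score >= upper_threshold:
        return "good"        # would go to the target for good records
    if score < lower_threshold:
        return "rejected"    # hypothetical category for the lowest scores
    return "bad"             # would go to the bad records table for review

# Hypothetical scored records, for example from a Decision transformation rule.
records = [{"id": 1, "score": 95}, {"id": 2, "score": 60}, {"id": 3, "score": 10}]

tables = {"good": [], "bad": [], "rejected": []}
for rec in records:
    tables[route_record(rec["score"])].append(rec["id"])

print(tables)
```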

Use the Analyst tool or Informatica Data Director for Data Quality to perform the following tasks:

Review the exception table data

You can use the Analyst tool or Informatica Data Director for Data Quality to review the bad or duplicate record tables.

- Use the Analyst tool to import the exception records into a bad or duplicate record table. Open the imported table from the Model repository and work on the exception data.

- Use Informatica Data Director for Data Quality if you are assigned a task to review or correct exceptions as part of a Human task.

Note: The exception tables you create in the Exception transformation include columns that provide metadata to Informatica Data Director for Data Quality. The columns are not used in the Analyst tool. When you import the tables to the Analyst tool for exception data management, the Analyst tool hides the columns.

Reserved Column Names

When you create a bad record or consolidation table, the Analyst tool generates columns for use in its internal tables. Do not import tables that use these names. If an imported table contains a column with the same name as one of the generated columns, the Analyst tool will not process it.

Reserve the following column names for bad record or consolidation tables:

- checkStatus

- rowIdentifier

- acceptChanges

- recordGroup

- masterRecord

- matchScore

- any name beginning with DQA_
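Before you import a table, you could screen its column names against the reserved list with a check along these lines. The helper `find_reserved_columns` is hypothetical, and treating the comparison as case-sensitive is an assumption; the guide does not say how the Analyst tool compares names.

```python
# Reserved names taken from the list above.
RESERVED_NAMES = {
    "checkStatus", "rowIdentifier", "acceptChanges",
    "recordGroup", "masterRecord", "matchScore",
}
RESERVED_PREFIX = "DQA_"

def find_reserved_columns(column_names):
    """Return the column names that clash with the reserved names or prefix."""
    return [
        name for name in column_names
        if name in RESERVED_NAMES or name.startswith(RESERVED_PREFIX)
    ]

# Hypothetical column list for a table about to be imported.
print(find_reserved_columns(["id", "matchScore", "DQA_flag", "name"]))
```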

Exception Management Tasks

You can perform the following exception management tasks in the Analyst tool:

Manage bad records

Identify problem records and fix data quality issues.

Consolidate duplicate records

Merge groups of duplicate records into a single record.

View the audit trail

Review the changes made in the bad or duplicate record tables before writing the changes to the source database.

Viewing and Editing Bad Records

Complete these steps to view and edit bad records:

1. Log in to the Analyst tool.

2. Select a project.

3. Select a bad records table.

4. Optionally, use the menus to filter the table records. You can filter records by value in the following columns: Priority, Quality Issue, Column, and Status.

5. Click Show to view the records that match the filter criteria.

6. Double-click a cell to edit the cell value.

7. Click Save to save the rows you updated.

Saving changes to a record is the first step in processing the record in the Analyst tool. After you save changes to a record, you can update the record status to accept, reprocess, or reject the record.

Updating Bad Record Status

Select one or more records by clicking the check box next to each record. To select all the records in the table, click the check box at the top of the first column.

For each record that does not require further editing, perform one of the following actions:

- Click Accept.

Indicates that the record is acceptable for use.

- Click Reject.

Indicates that the record is not acceptable for use.

- Click Reprocess.

Selects the record for reprocessing by a data quality mapping. Select this option when you are unsure if the record is valid. Rerun the mapping with an updated business rule to recheck the record.

Note: The Analyst tool does not display records that you have taken action on.

Viewing and Filtering Duplicate Record Clusters

Complete these steps to view and filter duplicate clusters:

1. Log in to the Analyst tool.

2. Select a project.

3. Select a duplicate record table.

4. The first cluster in the table opens.

The Analyst tool also displays the number of clusters in the table. Click a number to move to a cluster.

5. Optionally, use the Filter option to filter the cluster list.

In the Filter Clusters dialog box, select a column and enter a filter string. The Analyst tool returns all clusters with one or more records that contain the string in the column you select.
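The filter behavior described in step 5 keeps a whole cluster when at least one of its records matches. A minimal sketch of that rule follows. The data layout and the function `filter_clusters` are hypothetical, and the case-sensitive substring match is an assumption.

```python
def filter_clusters(clusters, column, text):
    """Keep each cluster in which at least one record contains text in column.

    clusters maps a cluster ID to a list of record dictionaries.
    """
    return {
        cluster_id: records
        for cluster_id, records in clusters.items()
        if any(text in str(record.get(column, "")) for record in records)
    }

# Hypothetical duplicate record clusters.
clusters = {
    1: [{"name": "John Smith"}, {"name": "Jon Smith"}],
    2: [{"name": "Mary Jones"}, {"name": "Mary Johns"}],
}

# The cluster is returned in full, including records that do not match.
print(filter_clusters(clusters, "name", "Smith"))
```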

Editing Duplicate Record Clusters

Edit clusters to change how the Analyst tool consolidates potential duplicate records.

You can edit clusters in the following ways:

To remove a record from a cluster:

Clear the selection in the Cluster column to remove the record from the cluster. When you remove a record from a cluster, the record assumes a unique cluster ID.

To create a new cluster from records in the current cluster:

Select a subset of records and click the Extract Cluster button. This action creates a new cluster ID for the selected records.

To edit the record:

Select a record field to edit the data in that field.

To select the fields that populate the master record:

Click the selection arrow in a field to add its value to the corresponding field in the Final Record row. An arrow indicates that the field provides data for the master record.

To specify a master record:

Click a cell in the Master column for a row to select that row as the master record.

Consolidating Duplicate Record Clusters

When you have processed a cluster, complete this step to consolidate the cluster records to a single record in the staging database.

- In the cluster you processed, click the Consolidate Cluster button.

The Analyst tool performs the following updates on cluster records:

- In the staging database, the Analyst tool updates the master record with the contents of the Final record and sets the status to Updated.

- The Analyst tool sets the status of the other selected records to Consolidated.

- The Analyst tool sets the status of any cleared record to Reprocess.
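The three status updates above can be summarized in a short sketch. The record layout (the `selected`, `master`, `data`, and `status` keys) and the function `consolidate_cluster` are hypothetical. The sketch mirrors the rules as stated and does not touch a real staging database.

```python
def consolidate_cluster(records, final_record):
    """Apply the consolidation status rules to the records in one cluster."""
    for record in records:
        if not record["selected"]:
            # Cleared records are left out of the consolidation.
            record["status"] = "Reprocess"
        elif record["master"]:
            # The master record takes the contents of the Final record.
            record["data"] = dict(final_record)
            record["status"] = "Updated"
        else:
            record["status"] = "Consolidated"
    return records

# Hypothetical cluster with one master, one duplicate, and one cleared record.
cluster = [
    {"selected": True, "master": True, "data": {"name": "J Smith"}, "status": None},
    {"selected": True, "master": False, "data": {"name": "John Smith"}, "status": None},
    {"selected": False, "master": False, "data": {"name": "Joan Smyth"}, "status": None},
]
result = consolidate_cluster(cluster, {"name": "John Smith"})
print([record["status"] for record in result])
```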

Viewing the Audit Trail

The Analyst tool tracks changes to the exception record database in an audit trail.

Complete the following steps to view audit trail records:

1. Select the Audit Trail tab.

2. Set the filter options.

3. Click Show.

The following table describes record statuses for the audit trail:

| Record Status | Description |
| --- | --- |
| Updated | Edited during bad record processing, or selected as the Master record during consolidation. |
| Consolidated | Consolidated to a master record during consolidation. |
| Rejected | Rejected during bad record processing. |
| Accepted | Accepted during bad record processing. |
| Reprocess | Marked for reprocessing during bad record processing. |
| Rematch | Removed from a cluster during consolidation. |
| Extracted | Extracted from a cluster into a new cluster during consolidation. |

CHAPTER 14

Reference Tables

This chapter includes the following topics:

- Reference Tables Overview

- Reference Table Properties

- Create Reference Tables

- Create a Reference Table from Profile Data

- Create a Reference Table From a Flat File

- Create a Reference Table from a Database Table

- Copying a Reference Table in the Model Repository

- Reference Table Management

- Audit Trail Events

- Rules and Guidelines for Reference Tables

Reference Tables Overview

Informatica provides reference tables that you can import to the Model repository. You can also create reference tables and connect to database tables that contain reference data.

Use the Analyst tool to create and update reference tables.

Reference Table Properties

You can view and edit the properties of a reference table in the Analyst tool.

To view the properties, open the reference table and select the Properties view.

To edit the properties, open the reference table and select the Edit Table option.

A reference table displays general properties that describe the repository object and column properties that describe the column data.

General Reference Table Properties

The general properties include information about the users who created and updated the reference table. The general properties also identify the current valid column in the table.

The following table describes the general properties:

| Property | Description |
| --- | --- |
| Name | Name of the reference table. |
| Description | Optional description of the reference table. |
| Location | Project that contains the reference table in the Model repository. |
| Precision | Precision for the column. Precision is the maximum number of digits or the maximum number of characters that the column can accommodate. |
| Valid Column | Column that contains the valid reference data. |
| Created on | Creation date for the reference table. |
| Created By | User who created the reference table. |
| Last Modified | Date of the most recent update to the reference table. |
| Last Modified By | User who most recently edited the reference table. |
| Connection ID | Connection name of the database that stores the reference table data. |

Reference Table Column Properties

The column properties include information about the column metadata.

The following table describes the column properties:

| Property | Description |
| --- | --- |
| Name | Name of each column. |
| Data Type | The datatype for the data in each column. You can select one of the following datatypes: bigint, date/time, decimal, double, integer, string. You cannot select a double data type when you create an empty reference table or create a reference table from a flat file. |
| Precision | Precision for each column. Precision is the maximum number of digits or the maximum number of characters that the column can accommodate. The precision values you configure depend on the data type. |
| Scale | Scale for each column. Scale is the maximum number of digits that a column can accommodate to the right of the decimal point. Applies to decimal columns. The scale values you configure depend on the data type. |
| Description | Optional description for each column. |
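To make the precision and scale definitions concrete, the sketch below tests whether a value fits a decimal column. It follows the common SQL convention that a column with a given precision and scale leaves precision minus scale digits for the integer part. That convention is an assumption here, and the function `fits_decimal_column` is hypothetical. The actual limits depend on the database that stores the reference table.

```python
from decimal import Decimal

def fits_decimal_column(value, precision, scale):
    """Return True if value fits a decimal column with this precision and scale."""
    number = Decimal(value)
    if not number.is_finite():
        return False
    _sign, digits, exponent = number.as_tuple()
    fractional_digits = max(0, -exponent)            # digits right of the decimal point
    integer_digits = max(0, len(digits) + exponent)  # digits left of the decimal point
    return fractional_digits <= scale and integer_digits <= precision - scale

print(fits_decimal_column("123.45", precision=5, scale=2))
print(fits_decimal_column("1234.5", precision=5, scale=2))
```

With a precision of 5 and a scale of 2, the value 123.45 fits under this convention, while 1234.5 has too many integer digits.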

Create Reference Tables

Use the reference table editor, profile results, or a flat file to create reference tables. Create reference tables to share reference data with developers in the Developer tool.

Use the following methods to create a reference table:

- Create a reference table in the reference table editor.

- Create a reference table from profile column data or profile pattern data.

- Create a reference table from flat file data.

- Create a reference table from data in another database table.

Creating a Reference Table in the Reference Table Editor

Use the New Reference Table Wizard and the reference table editor view to create a reference table. You use the reference table editor to define the table structure and add data to the table.

1. In the Navigator, select the project or folder where you want to create the reference table.

2. Click Actions > New > Reference Table.

The New Reference Table Wizard appears.

3. Select the option to Use the reference table editor.

4. Click Next.

5. Enter the table name, and optionally enter a description and default value.

The Analyst tool uses the default value for any table record that does not contain a value.

6. For each column you want to include in the reference table, click the Add New Column icon and configure the properties for each column.

Note: You can reorder or delete columns.

7. Optionally, enter an audit note for the table.

The audit note appears in the audit trail log.

8. Click Finish.

Create a Reference Table from Profile Data

You can use profile data to create reference tables that relate to the source data in the profile. Use the reference tables to find different types of information in the source data.

You can use a profile to create or update a reference table in the following ways:

- Select a column in the profile and add it to a reference table.

- Browse a profile column and add a subset of the column data to a reference table.

- Select a column in the profile and add the pattern values for that column to a reference table.

Creating a Reference Table from Profile Columns

You can create a reference table from a profile column. You can add a profile column to an existing reference table. The New Reference Table Wizard adds the column to the reference table.

1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.

2. Click the profile name to open it in another tab.

3. In the Column Profiling view, select the column that you want to add to a reference table.

4. Click Actions > Add to Reference Table.

The New Reference Table Wizard appears.

5. Select the option to Create a new reference table.

Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project or folder, preview the reference table data, and click Next. Select the column to add and click Finish.

6. Click Next.

7. The column name appears by default as the table name. Optionally, enter another table name, a description, and default value.

The Analyst tool uses the default value for any table record that does not contain a value.

8. Click Next.

9. In the Column Attributes panel, configure the column properties for the column.

10. Optionally, choose to create a description column for rows in the reference table.

Enter the name and precision for the column.

11. Preview the column values in the Preview panel.

12. Click Next.

13. The column name appears as the table name by default. Optionally, enter another table name and a description.

14. In the Save in panel, select the location where you want to create the reference table.

The Reference Tables: panel lists the reference tables in the location you select.

15. Optionally, enter an audit note.

16. Click Finish.

Creating a Reference Table from Column Values

You can create a reference table from the column values in a profile column. Select a column in a profile and select the column values to add to a reference table or create a reference table to add the column values.

1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.

2. Click the profile name to open it in another tab.

3. In the Column Profiling view, select the column that you want to add to a reference table.

4. In the Values view, select the column values you want to add. Use the CONTROL or SHIFT keys to select multiple values.

5. Click Actions > Add to Reference Table.

The New Reference Table Wizard appears.

6. Select the option to Create a new reference table.

Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project or folder, preview the reference table data, and click Next. Select the column to add and click Finish.

7. Click Next.

8. The column name appears by default as the table name. Optionally, enter another table name, a description, and default value.

The Analyst tool uses the default value for any table record that does not contain a value.

9. Click Next.

10. In the Column Attributes panel, configure the column properties for the column.

11. Optionally, choose to create a description column for rows in the reference table.

Enter the name and precision for the column.

12. Preview the column values in the Preview panel.

13. Click Next.

14. The column name appears as the table name by default. Optionally, enter another table name and a description.

15. In the Save in panel, select the location where you want to create the reference table.

The Reference Tables: panel lists the reference tables in the location you select.

16. Optionally, enter an audit note.

17. Click Finish.

Creating a Reference Table from Column Patterns

You can create a reference table from the column patterns in a profile column. Select a column in the profile and select the pattern values to add to a reference table or create a reference table to add the pattern values.

1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.

2. Click the profile name to open it in another tab.

3. In the Column Profiling view, select the column that you want to add to a reference table.

4. In the Patterns view, select the column patterns you want to add. Use the CONTROL or SHIFT keys to select multiple values.

5. Click Actions > Add to Reference Table.

The New Reference Table Wizard appears.

6. Select the option to Create a new reference table.

Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project or folder, preview the reference table data, and click Next. Select the column to add and click Finish.

7. Click Next.

8. The column name appears by default as the table name. Optionally, enter another table name, a description, and default value.

The Analyst tool uses the default value for any table record that does not contain a value.

9. Click Next.

10. In the Column Attributes panel, configure the column properties for the column.

11. Optionally, choose to create a description column for rows in the reference table.

Enter the name and precision for the column.

12. Preview the column values in the Preview panel.

13. Click Next.

14. The column name appears as the table name by default. Optionally, enter another table name and a description.

15. In the Save in panel, select the location where you want to create the reference table.

The Reference Tables: panel lists the reference tables in the location you select.

16. Optionally, enter an audit note.

17. Click Finish.

Create a Reference Table From a Flat File

You can import reference data from a CSV file. Use the New Reference Table wizard to import the file data.

You must configure the properties for each flat file that you use to create a reference table.

Analyst Tool Flat File Properties

When you import a flat file as a reference table, you must configure the properties for each column in the file. The options that you configure determine how the Analyst tool reads the data from the file.

The following table describes the properties you can configure when you import file data for a reference table:

Properties Description

Delimiters Character used to separate columns of data. Use the Other field to enter a different delimiter. Delimiters must be printable characters and must be different from the escape character and the quote character if selected. You cannot select non-printing multibyte characters as delimiters.

Text Qualifier Quote character that defines the boundaries of text strings. Choose No Quote, Single Quote, or Double Quotes. If you select a quote character, the wizard ignores delimiters within pairs of quotes.

Column Names Imports column names from the first line. Select this option if column names appear in the first row. The wizard uses data in the first row in the preview for column names. Default is not enabled.

Values Option to start value import from a line. Indicates the row number in the preview at which the wizard starts reading when it imports the file.
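The effect of these properties can be illustrated outside the Analyst tool. The following Python sketch is not part of the product. It uses the standard csv module to show how a delimiter, a text qualifier, a header row, and a starting row interact when a file is read. The function name and parameter names are hypothetical, and treating the start row as a 1-based row number is an assumption based on the Values description above.

```python
import csv
import io


def read_flat_file(text, delimiter=",", text_qualifier='"',
                   column_names_in_first_row=False, start_row=1):
    """Parse delimited text in the spirit of the wizard properties.

    Hypothetical helper: the parameters mirror the Delimiters,
    Text Qualifier, Column Names, and Values properties.
    """
    source = io.StringIO(text, newline="")
    if text_qualifier is None:
        # "No Quote": quote characters are treated as ordinary data.
        reader = csv.reader(source, delimiter=delimiter,
                            quoting=csv.QUOTE_NONE)
    else:
        # Delimiters inside a pair of qualifiers do not split the field.
        reader = csv.reader(source, delimiter=delimiter,
                            quotechar=text_qualifier)
    rows = list(reader)
    header = rows[0] if column_names_in_first_row and rows else None
    # Assumption: start_row is a 1-based row number, as in the preview.
    data = rows[start_row - 1:]
    return header, data


sample = 'code;name\nIE;"Dublin; Ireland"\nFR;Paris\n'
header, data = read_flat_file(sample, delimiter=";",
                              column_names_in_first_row=True, start_row=2)
print(header)
print(data)
```

In this sketch the semicolon inside the quoted value stays part of the value, which matches the Text Qualifier description. With No Quote, the same line would split at every delimiter.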

Creating a Reference Table from a Flat File

When you create a reference table from a flat file, the table uses the column structure of the file and imports the file data.

1. In the Navigator, select the project or folder where you want to create the reference table.

2. Click Actions > New > Reference Table.

The New Reference Table Wizard appears.

3. Select the option to Import a flat file.

4. Click Next.

5. Click Browse to select the flat file.

6. Click Upload to upload the file to a directory in the Informatica services installation directory that the Analyst tool can access.

7. Enter the table name. Optionally, enter a description and default value.

The Analyst tool uses the default value for any table record that does not contain a value.

8. Select a code page that matches the data in the flat file.

9. Preview the data in the Preview of file panel.

10. Click Next.

11. Configure the flat file properties.

12. In the Preview panel, click Show to update the preview.

13. Click Next.

14. On the Column Attributes panel, verify or edit the column properties for each column.

15. Optionally, create a description column for rows in the reference table. Enter the name and precision for the column.

16. Optionally, enter an audit note for the table.

17. Click Finish.
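The default value entered in step 7 can be pictured with a short Python sketch. It is an illustration only, not the Analyst tool implementation. The function name is hypothetical, and the sketch assumes that an empty string or a missing value counts as a record that does not contain a value.

```python
def apply_default(rows, default_value):
    """Return a copy of rows with empty cells replaced by default_value.

    Assumption: None and the empty string both mean "no value".
    """
    return [
        [default_value if cell in (None, "") else cell for cell in row]
        for row in rows
    ]


rows = [["IE", "Ireland"], ["FR", ""], [None, "Spain"]]
print(apply_default(rows, "N/A"))
```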


Create a Reference Table from a Database Table

When you create a reference table from a database table, you connect to the database and import the table data.

Use the New Reference Table wizard to enter the database connection properties for the table. Then import the tables into a folder in the Model repository.

Creating a Database Connection

Before you import reference tables from a database, you create a database connection in the Analyst tool.

1. Select a project or folder in the Navigator.

2. Click Actions > New > Reference Table.

The New Reference Table Wizard appears.

3. Select the option to Connect to a relational table.

Optionally, select the option to create an unmanaged reference table. If you select this option, the Analyst tool does not store the reference table data in the reference data database.

4. Click Next.

5. Click New Connection.

The New Connection window appears.

6. Enter the properties for the database you want to connect to.

7. Select Grant everyone execute permission on this connection.

8. Click OK.

The Analyst tool tests the database connection. The database connection appears in the list of established connections.

Creating a Reference Table from a Database Table

To create the reference table, connect to a database and import the column data you need.

1. In the Navigator, select the project or folder where you want to create the reference table.

2. Click Actions > New > Reference Table.

The New Reference Table Wizard appears.

3. Select the option to Connect to a relational table.

4. Select Unmanaged Table if you want to create a table that does not store data in the reference data database. You cannot edit the values in an unmanaged reference table.

5. Click Next.

6. Select the database connection from the list of established connections.

7. Click Next.

8. On the Tables panel, select a table.

The table properties appear on the Properties panel.

9. Optionally, click Data Preview.

10. Click Next.

11. On the Column Attributes panel, configure the column properties for each column.

12. Optionally, include a column for row-level descriptions.


13. Optionally, add an audit note in the Audit Note field.

14. Click Next.

15. Enter a name and optionally a description for the reference table.

16. On the Folders panel, select the project or folder where you want to create the reference table.

The Reference Tables panel lists the reference tables in the folder you select.

17. Click Finish.

Copying a Reference Table in the Model Repository

You can copy a reference table between folders in a Model repository project.

The reference table and the copy you create are not linked in the Model repository or in the database. When you create a copy, you create a new database table.

1. Browse the Model repository, and find the reference table you want to copy.

2. Right-click the reference table, and select Duplicate from the context menu.

3. In the Duplicate dialog box, select a folder to store the copy of the reference table.

4. Optionally, enter a new name for the copy of the reference table.

5. Click OK.

Reference Table Management

You can perform tasks to manage reference tables. You can find and replace column values, add or remove columns and rows, edit column values, and export a reference table to a file.

You can perform the following tasks to manage reference tables:

• Manage columns. Use the Edit column properties window to add, edit, or delete columns in a reference table.

• Manage rows. Use the Add Rows window to add rows and the Edit Row window to edit rows in a reference table. Use the Delete icon to delete rows in a reference table.

• Find and replace values. You can find and replace values in individual reference table columns. You can find a value in a column and replace it with another value. You can replace all values in columns with another value.

• Export a reference table. Export a reference table to a comma-separated values (CSV) file, dictionary file, or Excel file.

Managing Columns

Use the Edit column properties window to add, edit, or delete columns in a reference table.

1. In the Navigator, select the project or folder that contains the reference table that you want to edit.

2. Click the reference table name to open it in a tab. The Reference Table tab appears.

3. Click Actions > Edit Table or click the Edit Table icon.


The Edit column properties window appears.

4. To add a column, click the Add New Column icon in the Column Attributes panel and edit the column properties. Or, to edit an existing column, click the property you want to edit.

You cannot edit the datatype, precision, and scale of the column. You can rename the column and change the column description.

5. To delete a column, click the column and click the Delete icon.

6. Optionally, enter an audit note on the Audit Note panel. The audit note appears in the audit log for any action you perform in the Edit column properties window.

7. Click OK.

Managing Rows

You can add, edit, or delete rows in a reference table.

1. In the Navigator, select the project or folder containing the reference table that you want to edit.

2. Click the reference table name to open it in a tab. The Reference Table tab appears.

3. To add a row, click Actions > Add Row or click the Add Row icon. In the Add Row window, enter the value for each column and enter an optional audit note. Click OK.

4. To edit rows, select the rows and click Actions > Edit or click the Edit icon. In the Edit Rows window, enter the value for each column, select the columns to apply the changes to, and enter an optional audit note. Optionally, click Previous to edit the previous row and click Next to edit the next row. Click Apply to apply the changes.

The new column values appear in the tab.

5. To delete rows, select the rows you want to delete and click Actions > Delete or click the Delete icon. In the Delete Rows window, enter an optional audit note and click OK.

Note: Use the Developer tool to edit larger reference tables. For example, if the reference table contains more than 500 rows or five columns, edit the reference table in the Developer tool.

Finding and Replacing Values

You can find and replace values in individual reference table columns.

1. In the Navigator, select the project or folder containing the reference table that you want to find and replace values in.

2. Click the reference table name to open it in a tab. The Reference Table tab appears.

3. Click Actions > Find and Replace or click the Find and Replace icon.

The Find and Replace toolbar appears.

4. Enter the search criteria in the Find box. In the list, select all columns or the column that you want to search. Enter the value you want to replace with, and click one of the following buttons:

Option Description

Next/Previous Scroll through the column values that match the search criteria.

Highlight All Highlight all the column values that match the search criteria.

Replace Replace the currently highlighted column value.


Replace All Replace all occurrences of the search criteria in column values.
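The difference between Highlight All and Replace All can be sketched in Python as follows. This is an illustration only and does not describe how the Analyst tool is implemented. The function names are hypothetical, rows are modeled as dictionaries keyed by column name, and the sketch assumes that a match means the whole column value equals the search criteria.

```python
def find_matches(rows, column, search):
    """Return the indexes of rows whose value in column equals search.

    Hypothetical helper that models Highlight All for one column.
    """
    return [i for i, row in enumerate(rows) if row[column] == search]


def replace_all(rows, column, search, replacement):
    """Replace every matching value in column and return the count.

    Assumption: matching is exact, not substring based.
    """
    count = 0
    for row in rows:
        if row[column] == search:
            row[column] = replacement
            count += 1
    return count


rows = [
    {"code": "IE", "country": "Eire"},
    {"code": "FR", "country": "France"},
    {"code": "IE", "country": "Eire"},
]
print(find_matches(rows, "country", "Eire"))
print(replace_all(rows, "country", "Eire", "Ireland"))
```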

Exporting a Reference Table

Export a reference table to a comma-separated values (CSV) file, dictionary file, or Microsoft Excel file.

1. In the Navigator, select the project or folder containing the reference table that you want to export.

2. Click the reference table name to open it in a tab. The Reference Table tab appears.

3. Click Actions > Export Data.

The Export data to a file window appears.

4. Configure the following options:

Option Description

File Name File name for the exported data.

File Format Format of the exported file. You can select the following formats:

• csv. Comma-separated values file.
• xls. Microsoft Excel file.
• dic. Dictionary file.

Optionally, select Export field names as first row to export the column names as a header row in the exported file.

Code Page Code page of the reference data.

5. Click OK.

The options to save or open the file depend on your browser.
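The export options above map onto familiar CSV concepts. The following Python sketch is not the Analyst tool export code. It shows, under stated assumptions, how a header row option and a code page affect the bytes of a CSV file. The function name and parameter names are hypothetical, and the code page is modeled as a Python codec name.

```python
import csv
import io


def export_csv(column_names, rows, export_field_names=True,
               code_page="utf-8"):
    """Serialize rows as CSV bytes.

    Hypothetical helper: export_field_names models the
    "Export field names as first row" option, and code_page
    models the Code Page option as a Python codec name.
    """
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    if export_field_names:
        writer.writerow(column_names)
    writer.writerows(rows)
    return buffer.getvalue().encode(code_page)


data = export_csv(["code", "name"], [["IE", "Dublin, Ireland"]])
print(data.decode("utf-8"))
```

A value that contains the delimiter is quoted on export, so the file can be read back with the same delimiter and text qualifier settings.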

Audit Trail Events

Use the Audit Trail view for a reference table to view audit trail log events.

The Analyst tool creates audit trail log events when you make a change to a reference table and enter an audit trail note. Audit trail log events provide information about the reference tables that you manage.


You can configure query options on the Audit Trail tab to filter the log events that you view. You can specify filters on the date range, type, user name, and status. The following table describes the options you configure when you view audit trail log events:

Option Description

Date Start and end dates for the log events to search for. Use the calendar to choose dates.

Type Type of audit trail events. You can filter and view the following event types:

- Data. Events related to data in the reference table. Events include creating, editing, deleting, and replacing all rows.
- Metadata. Events related to reference table metadata. Events include creating reference tables, adding, deleting, and editing columns, and updating valid columns.

User User who edited the reference table and entered the audit trail comment. The Analyst tool generates the list of users from the Analyst tool users configured in the Administrator tool.

Status Status of the audit trail log events. Status corresponds to the action performed in the reference table editor.

Audit trail log events also include the audit trail comments and the column values that were inserted, updated, or deleted.

Viewing Audit Trail Events

View audit trail log events to get more information about changes made to a reference table.

1. In the Navigator, select the project or folder that contains the reference table that you want to view the audit trail for.

2. Click the reference table name to open it in a tab. The Reference Table tab appears.

3. Click the Audit Trail view.

4. Configure the filter options.

5. Click Show.

The log events for the specified query options appear.
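The way the filter options combine can be sketched in Python. This is an illustration only. The AuditEvent class, the field names, and the status strings are hypothetical and are not taken from the product. The sketch assumes that every filter you set must match and that a filter you leave empty is ignored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AuditEvent:
    """Hypothetical model of one audit trail log event."""
    when: date
    event_type: str  # "Data" or "Metadata"
    user: str
    status: str      # hypothetical value, for example "Row edited"
    comment: str


def filter_events(events, start=None, end=None, event_type=None,
                  user=None, status=None):
    """Return the events that satisfy every filter that is set."""
    result = []
    for event in events:
        if start is not None and event.when < start:
            continue
        if end is not None and event.when > end:
            continue
        if event_type is not None and event.event_type != event_type:
            continue
        if user is not None and event.user != user:
            continue
        if status is not None and event.status != status:
            continue
        result.append(event)
    return result


events = [
    AuditEvent(date(2012, 12, 1), "Data", "alice", "Row edited", "fix typo"),
    AuditEvent(date(2012, 12, 5), "Metadata", "bob", "Column added", "new column"),
]
for event in filter_events(events, event_type="Data"):
    print(event.when, event.user, event.comment)
```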

Rules and Guidelines for Reference Tables

Use the following rules and guidelines while working with reference tables in the Analyst tool:

• When you import a reference table from an Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, or Microsoft SQL Server database, the Analyst tool cannot display the preview if the table, view, schema, synonym, or column names contain mixed case or lower case characters.

To preview data in tables that reside in case-sensitive databases, set the Support Mixed Case Identifiers attribute to true in the connections for Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, and Microsoft SQL Server databases in the Developer tool or Administrator tool.

• When you create a reference table from inferred column patterns in one format, the Analyst tool populates the reference table with column patterns in a different format.


For example, when you create a reference table for the column pattern X(5), the Analyst tool displays the following format for the column pattern in the reference table: XXXXX.

• When you import an Oracle database table, verify the length of any VARCHAR2 column in the table. The Analyst tool cannot import an Oracle database table that contains a VARCHAR2 column with a length greater than 1000.

• To read a reference table, you need execute permissions on the connection to the database that stores the table data values. For example, if the reference data database stores the data values, you need execute permissions on the connection to the reference data database. This applies whether you access the reference table in read or write mode. The database connection permissions apply to all reference data in the database.
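The column pattern guideline above describes a change from a compact notation, X(5), to an expanded notation, XXXXX. The following Python sketch reproduces that expansion for illustration. It is not the Analyst tool logic. The function name is hypothetical, and the handling of symbols other than X is an assumption, because the guide shows only the X(5) example.

```python
import re


def expand_pattern(pattern):
    """Expand run-length notation such as X(5) into XXXXX.

    Assumption: any single symbol followed by a count in
    parentheses is repeated that many times.
    """
    return re.sub(
        r"(.)\((\d+)\)",
        lambda match: match.group(1) * int(match.group(2)),
        pattern,
    )


print(expand_pattern("X(5)"))
```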

