iWay Data Quality Center User's Guide

  • 8/18/2019 iWay Data Quality Center User's Guide

    1/182

    iWay Data Quality Center User's Guide

    Version 6.0.1 Service Manager (SM)

    DN3501942.0709

    Cactus, EDA, EDA/SQL, FIDEL, FOCUS, Information Builders, the Information Builders logo, iWay, iWay Software, Parlay, PC/FOCUS, RStat, TableTalk, Web390, and WebFOCUS are registered trademarks, and Magnify is a trademark of Information Builders, Inc.

    Due to the nature of this material, this document refers to numerous hardware and software products by their trademarks. In most, if not all cases, these designations are claimed as trademarks or registered trademarks by their respective companies. It is not this publisher’s intent to use any of these names generically. The reader is therefore cautioned to investigate all claimed trademark rights before using any of these names other than to refer to the product described.

    Copyright © 2009, by Information Builders, Inc. and iWay Software. All rights reserved. Patent Pending. This manual, or parts thereof, may not be reproduced in any form without the written permission of Information Builders, Inc.

    Contents

    Preface
        Documentation Conventions
        Related Publications
        Customer Support
        Help Us to Serve You Better
        User Feedback
        iWay Software Training and Professional Services
    1. Introducing iWay Data Quality Center
        About iWay Data Quality Center
        Managing Data Quality
        Unifying Records
        Supplied Modules
        Summary of Other Product Features
    2. System Requirements and Installation
        System Requirements
        Installation Procedure
        Installing Database Connectivity Drivers
        License Key
    3. Getting Started
        Creating a New Project
        Plan File Basics
        Using Input Files
        Running and Debugging a Plan
        Connecting to a Database
    4. Configuring Services
        XDDQAgent
        XDDQCBatchExecAgent

            Date Functions
            String Functions
            Bitwise Functions
            MinMax Functions
            Aggregate Functions
            Conditional Expressions
            Conversion and Formatting Functions
            Word Set Operation Functions
            Unclassified Functions
        Regular Expressions
            @" Syntax (Single Escaping)
            Capturing Groups
    8. Unifying Records
        Candidate Groups
            Basic Method: SimpleKey
            Symmetric Merging Method: Union
            Hierarchical Merging Method: Hierarchical / ClassicHierarchical
            Hierarchical With Union Merging Method: HierarchicalUnion
        Creating Client Groups
        Unification Roles
        Manual Override
        Group ID Stability
    9. Running iWay DQC in Command Line Mode
        Scripts for Command Line Mode
        Return Codes
    10. Configuring Run-Time Variables
        Introduction
        Data Sources
        Folder Shortcuts
        Run-Time Components
    11. Using Online Services
        Online Server Configuration

        Server Configuration Components
            SecuredWebAccess Component
            HttpDispatcher Component
            OnlineServices Component
        OnlineServices Component Configuration
            ServiceReference Element
            Input and Output Methods
            HttpInputMethod/HttpOutputMethod
        Input and Output Formats
            CSV Format
            XML Format
            SOAP Format
            Multipart Format
        Logging Requests and Responses
        Example: serviceConfig Configuration
        Creating a Simple SOAP Web Service
            Preconditions
            Procedures for Creating the Service
            Sample Input Message
            Sample Output Message
    12. Monitoring
        What Is Monitoring?
        File Output Format
        Graphical User Interface
            Batch
            Online Server
            Connection
            Connection Options
            Filtering
            Filtering Options
            Refresh
            Snapshots
            Drill Down

    Preface

    This document is written for system integrators and application designers who need to ensure data quality control in transactional and analytical applications. It describes how to use iWay Data Quality Center (DQC) in software integration projects to create applications for data quality assurance.

    How This Manual Is Organized

    This manual includes the following chapters:

    Chapter/Appendix
        Contents

    1  Introducing iWay Data Quality Center
        Provides an overview of iWay Data Quality Center (DQC). It describes the product features used in the management of data quality, and the supplied modules that enable integration with the infrastructure at your site. It also summarizes deployment, operational, and performance features of the product.

    2  System Requirements and Installation
        Describes the requirements of the two major components of iWay DQC. It also describes how to install iWay DQC as part of iWay Integration Tools (iIT).

    3  Getting Started
        Describes iWay DQC Manager, which is a design tool for solving data quality problems.

    4  Configuring Services
        Describes how to configure the two predefined services that you can use as part of your iWay DQC projects.

    5  Working With Data Types
        Describes the supported data types in iWay DQC records, I/O operations, and step properties.

    6  Creating Dictionary Files
        Describes the dictionary files that are created and maintained in iWay DQC.

    7  Using Expressions
        Describes expressions used in iWay DQC steps.

    8  Unifying Records
        Describes unification, which is identifying groups of records that belong to one logical entity (usually called client), based on a certain set of criteria.

    9  Running iWay DQC in Command Line Mode
        Describes how to run iWay DQC in command line (batch) mode.

    10  Configuring Run-Time Variables
        Describes how to control certain run-time aspects of iWay DQC by setting variables in the configuration file.

    11  Using Online Services
        Describes online services, which provide Service-Oriented Architecture (SOA) functionality in iWay DQC.

    12  Monitoring
        Describes how to view the progress of an iWay DQC configuration that is running, or the state of the online server.

    A  Best Practices
        Describes best practices that are used in the implementation of iWay DQC. It includes project directory, naming, and scoring conventions.

    B  Glossary
        Provides the definition for various terms used in this guide.

    Documentation Conventions

    The following table lists and describes the conventions that apply in this manual.

    Convention
        Description

    THIS TYPEFACE or this typeface
        Denotes syntax that you must enter exactly as shown.

    this typeface
        Represents a placeholder (or variable), a cross-reference, or an important term. It may also indicate a button, menu item, or dialog box option that you can click or select.

    underscore
        Indicates a default setting.

    this typeface
        Highlights a file name or command.

    Key + Key
        Indicates keys that you must press simultaneously.

    { }
        Indicates two or three choices. Type one of them, not the braces.

    |
        Separates mutually exclusive choices in syntax. Type one of them, not the symbol.

    ...
        Indicates that you can enter a parameter multiple times. Type only the parameter, not the ellipsis points (...).

    .
    .
    .
        Indicates that there are (or could be) intervening or additional commands.

    Related Publications

    To view a current listing of our publications and to place an order, visit our World Wide Web site, http://www.iwaysoftware.com. You can also contact the Publications Order Department at (800) 969-4636.

    Customer Support

    Do you have questions about iWay Data Quality Center (DQC)?

    Join the Focal Point community. Focal Point is our online developer center and more than a message board. It is an interactive network of more than 3,000 developers from almost every profession and industry, collaborating on solutions and sharing tips and techniques.

    Access Focal Point at http://forums.informationbuilders.com/eve/forums.

    You can also access support services electronically, 24 hours a day, with InfoResponse Online. InfoResponse Online is accessible through our World Wide Web site, http://techsupport.iwaysoftware.com/. You can connect to the tracking system and known problem database at the Information Builders support center. Registered users can open, update, and view the status of cases in the tracking system and read descriptions of reported software issues. New users can register immediately for this service. The technical support section also provides usage techniques, diagnostic tips, and answers to frequently asked questions.

    Call Information Builders Customer Support Services (CSS) at (800) 736-6130 or (212) 736-6130. Customer Support Consultants are available Monday through Friday between 8:00 A.M. and 8:00 P.M. EST to address all your questions. Information Builders consultants can also give you general guidance regarding product capabilities and documentation. Be prepared to provide your six-digit site code (xxxx.xx) when you call.

    To learn about the full range of available support services, ask your Information Builders representative about InfoResponse Online, or call (800) 969-INFO.

    Help Us to Serve You Better

    To help our consultants answer your questions effectively, be prepared to provide specifications and sample files and to answer questions about errors and problems.

    The following table lists the environment information that our consultants require.

    Platform

    Operating System

    OS Version

    JVM Vendor

    JVM Version

    The following table lists the deployment information that our consultants require.

    Adapter Deployment: For example, JCA, Business Services Provider, iWay Service Manager

    Container: For example, WebSphere

    Version

    Enterprise Information System (EIS) - if any

    EIS Release Level

    EIS Service Pack

    EIS Platform

    The following table lists iWay-related information needed by our consultants.

    iWay Adapter

    iWay Release Level

    iWay Patch

    The following table lists the types of iWay Explorer. Specify the version (and platform, if different than listed previously) in the columns provided.

    iWay Explorer Type    Version    Platform

    Swing

    Servlet

    Eclipse™

    Embedded in iWay Designer

    The following table lists additional questions to help us serve you better.

    Request/Question    Error/Problem Details or Information

    Did the problem arise through a service or event?

    Provide usage scenarios or summarize the application that produces the problem.

    When did the problem start?

    Can you reproduce this problem consistently?

    Describe the problem.

    Describe the steps to reproduce the problem.

    Specify the error message(s).

    Any change in the application environment: for example, software configuration, EIS/database configuration, or application?

    Under what circumstance does the problem not occur?

    Following is a list of error/problem files that might be applicable.

    Input documents (XML instance, XML schema, non-XML documents)

    Transformation files

    Error screen shots

    Error output files

    Trace files

    Service Manager package to reproduce problem

    Custom functions and services in use

    Diagnostic Zip

    Transaction log

    For information on tracing, see the iWay Service Manager User's Guide.

    1. Introducing iWay Data Quality Center

    This section provides an overview of iWay Data Quality Center (DQC). It describes the product features used in the management of data quality, and the supplied modules that enable integration with the infrastructure at your site. This section also summarizes deployment, operational, and performance features of the product.

    Topics:
        About iWay Data Quality Center
        Managing Data Quality
        Unifying Records
        Supplied Modules
        Summary of Other Product Features

    Parsing and standardization. Parsing is the decomposition of a field into its component parts. Standardization applies consistent formats to field values, based on industry standards, local standards (for example, postal authority standards for address data), user-defined business rules, and knowledge bases that consist of values and patterns.

    Cleansing. Cleansing is the modification of data values to satisfy domain restrictions, integrity constraints, or other business rules that define data quality for your organization. With cleansing, inaccurate data from a data source is detected and corrected or removed. Cleansing ensures that a given set of data is complete, accurate, and valid, making the data meaningful and useful. Cleansing minimizes data errors and improves business performance.

    Matching. Matching is identifying, then linking or merging, related entries within or across sets of data.

    Enrichment. Enrichment is the enhancement of internally stored data by appending related attributes from external sources (for example, consumer demographic attributes or geographic descriptors).

    Monitoring. Monitoring is the deployment of controls to ensure ongoing conformity of data to the business rules that define data quality for your organization.
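
    The parsing, standardization, and cleansing operations described above can be illustrated with a minimal Java sketch. It is not iWay DQC code and does not use any iWay DQC API. The class name, the method names, and the rules themselves (a leading house number, a digits-only phone format, a two-letter code) are assumptions chosen only to show the three ideas side by side.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; none of these names are part of iWay DQC.
final class ParseStandardizeSketch {

    // Parsing: decompose a "number street" address line into its component parts.
    static Map<String, String> parseAddressLine(String line) {
        Map<String, String> parts = new LinkedHashMap<>();
        String trimmed = line.trim().replaceAll("\\s+", " ");
        int space = trimmed.indexOf(' ');
        boolean startsWithNumber = space > 0 && trimmed.substring(0, space).matches("\\d+");
        parts.put("houseNumber", startsWithNumber ? trimmed.substring(0, space) : "");
        parts.put("street", startsWithNumber ? trimmed.substring(space + 1) : trimmed);
        return parts;
    }

    // Standardization: apply one consistent format (digits only) to a phone value.
    static String standardizePhone(String raw) {
        return raw.replaceAll("\\D", "");
    }

    // Cleansing: enforce a simple domain restriction (two-letter upper-case code).
    // Returns null when the value cannot be corrected.
    static String cleanseStateCode(String raw) {
        String candidate = raw.trim().toUpperCase(Locale.ROOT);
        return candidate.matches("[A-Z]{2}") ? candidate : null;
    }

    public static void main(String[] args) {
        System.out.println(parseAddressLine("  12  Main   Street "));
        System.out.println(standardizePhone("(800) 736-6130"));
        System.out.println(cleanseStateCode(" ny "));
    }
}
```

    In a real deployment these rules would come from standards, business rules, and knowledge bases rather than from hard-coded patterns.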

    Unifying Records

    One of the main technological capabilities of a data quality management tool is unification of any number of records that contain the same content.

    iWay DQC enables data integration from different sources by analyzing the content, applying cleansing rules, and validating data against specified dictionaries. The processed data can then be unified using the iWay DQC hierarchical unification methods.

    The process also enables associative pairing, even when different identification key structures exist. Associative pairing includes partially complete records. A single identification key is not required.

    When poor data quality or insufficient information about the identification key affects unification results, iWay DQC explicitly marks records to allow for manual correction.
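
    As a rough illustration of the idea of unification, the following Java sketch groups records by a normalized key and sets aside records whose key is unusable so that they can be corrected manually. This is a deliberately simplified, single-key grouping and not the iWay DQC hierarchical unification algorithm. All names, including the MANUAL_REVIEW marker, are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; this is not the iWay DQC unification algorithm.
final class UnificationSketch {

    // Marker group for records that cannot be unified automatically.
    static final String MANUAL_REVIEW = "MANUAL_REVIEW";

    // Build a comparison key: trim, collapse whitespace, lower-case.
    // Returns null when the identifier is missing or empty.
    static String normalizeKey(String raw) {
        if (raw == null) {
            return null;
        }
        String key = raw.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        return key.isEmpty() ? null : key;
    }

    // Group record positions by normalized key. Records without a usable key
    // are collected under MANUAL_REVIEW instead of being forced into a group.
    static Map<String, List<Integer>> unify(List<String> rawKeys) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int id = 0; id < rawKeys.size(); id++) {
            String key = normalizeKey(rawKeys.get(id));
            String group = (key == null) ? MANUAL_REVIEW : key;
            groups.computeIfAbsent(group, k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("John  Smith");
        keys.add(" john smith ");
        keys.add("");
        keys.add("Jane Doe");
        System.out.println(unify(keys));
    }
}
```

    Because normalized keys are always lower-case, the upper-case marker cannot collide with a real group name in this sketch.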

    Supplied Modules

    iWay DQC architecture is customizable. The product is shipped with ready-to-use modules that allow for easy integration with an existing Information Technology (IT) infrastructure.

    Data Quality Modules

    iWay DQC Base. The core module used in data quality and data flow management. It includes the ability to define business rules.

    iWay DQC Profile. Module for advanced data profiling. It includes semantic analysis and the application of business rules.

    iWay DQC Reporting. Module for data quality monitoring and reporting.

    Business Task Modules

    iWay DQC Address. Module for parsing, cleansing, and identifying address records in any form, including unstructured text in a field.

    iWay DQC Party. Module for identification and unification of physical persons and legal entities.

    iWay DQC Contact. Module for contact information quality management.

    iWay DQC Household. Module for implementation of client identification, addresses, and additional information used to identify households.

    iWay DQC Car. Module for vehicle data identification.

    Technology Modules

    iWay DQC Batch. Data interface for batch processing mode.

    iWay DQC Online. Data interface for on-demand processing mode. It includes Web service methods and implementation of data quality firewall functionality.

    The technology behind iWay DQC is configurable through management applications or metadata. From templates supplied with the product, you can derive new configurations for specific information entities. For example, you can modify the iWay DQC Party configuration template to create new configurations for managing the quality of driver license data.

    Summary of Other Product Features

    iWay DQC provides the following deployment, operational, and performance features.

    Deployment. iWay DQC is compatible with other platforms in the industry. Compatibility is achieved by leveraging proven Java™ technologies. The product technology is easy to integrate with an existing Information System/Information and Communication Technologies (IS/ICT) infrastructure. It integrates with any Enterprise Service Bus (ESB), Service-Oriented Architecture (SOA), or extract, transform, load (ETL) tool, including iWay Service Manager, IBM WebSphere®, Oracle WebLogic®, and SAP NetWeaver®.

    Flexibility and open standards. The iWay DQC solution is easily configured using supplied administration applications. Operation does not require any external tools or other third-party applications. iWay DQC is platform independent. It is based on open standards (XML, Web services, and SOA). iWay DQC implements documented conceptual data models that are portable across many existing database platforms.

    Core functionality. The core system is composed of a set of algorithms capable of hierarchical unification by identification keys, regardless of internal data structure. By using the defined keys, iWay DQC can perform approximate matching in record unification.

    External reference data sources. iWay DQC taps into external data sources, such as national addresses or name registries, to retrieve reference data for parsing, cleansing, and validation. iWay DQC also uses names, organizations, academic titles, phone numbers, and other dictionaries of information to parse and validate input data. You can extend this feature with your own custom lists.

    Performance. iWay DQC uses parallel data processing methods to ensure scalability and enable incremental data processing, both in batch and on-demand online processing modes. Online mode can perform the data quality process in less than 0.1 second. Batch mode can process more than 5,000,000 records in an hour. You can embed iWay DQC into business-to-business (B2B), application-to-application (A2A), portal, and extract, transform, load (ETL) processes for both online and batch modes.
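
    To put the batch figure quoted above into perspective, the following Java sketch converts 5,000,000 records per hour into an average per-second rate and estimates how long a batch of a given size would take at that rate. The helper names are hypothetical, and the result is plain arithmetic on the quoted figure, not a measurement. Actual throughput depends on the configuration and the hardware.

```java
// Hypothetical helper; the rate is the figure quoted in the text, not a benchmark.
final class ThroughputSketch {

    static final long BATCH_RECORDS_PER_HOUR = 5_000_000L;

    // Average records per second implied by an hourly batch rate.
    static double recordsPerSecond(long recordsPerHour) {
        return recordsPerHour / 3600.0;
    }

    // Estimated batch duration in hours at the quoted rate. Because the text
    // says "more than" this rate, the estimate is an upper bound under that claim.
    static double estimatedBatchHours(long recordCount) {
        return (double) recordCount / BATCH_RECORDS_PER_HOUR;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f records/second%n", recordsPerSecond(BATCH_RECORDS_PER_HOUR));
        System.out.printf("%.1f hours for 12,000,000 records%n", estimatedBatchHours(12_000_000L));
    }
}
```

    At the quoted rate, the batch engine averages roughly 1,389 records per second.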

    2. System Requirements and Installation

    This section describes the system requirements of the two major components of iWay Data Quality Center (DQC). It also describes how to install iWay DQC as part of iWay Integration Tools (iIT).

    Topics:
        System Requirements
        Installation Procedure
        Installing Database Connectivity Drivers
        License Key

    System Requirements

    iWay DQC consists of two major components: the server engine and the graphical user interface. Each component has a different set of system requirements.

    Server Engine (Core)

    The code for the server engine is platform-independent. Therefore, you can run the server engine on almost any platform (combination of operating system and processor architecture) as long as there is a suitable Java Runtime Environment (JRE) for that platform.

    The server engine requires JRE 1.4 or later. However, JRE 1.5 or later is recommended. In particular, certain advanced features (namely, the Reporting step) are not available if iWay DQC is run on JRE 1.4.
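
    A quick way to see which JRE a machine would use is to read the standard java.specification.version system property. The following Java sketch is a hypothetical pre-flight check, not a tool shipped with iWay DQC. It maps both the older "1.x" and the newer single-number version formats to one feature number and compares it with the minimum (1.4) and recommended (1.5) levels stated above. The sketch itself uses modern Java syntax and is meant to be read, not run on a 1.4 runtime.

```java
// Hypothetical pre-flight check; it is not shipped with iWay DQC.
final class JreVersionCheck {

    // Convert a java.specification.version value ("1.4", "1.5", "1.8", "11", "17")
    // into a single feature number (4, 5, 8, 11, 17).
    static int featureVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    // JRE 1.4 is the stated minimum for the server engine.
    static boolean meetsMinimum(String specVersion) {
        return featureVersion(specVersion) >= 4;
    }

    // JRE 1.5 or later is the stated recommendation (required for the Reporting step).
    static boolean meetsRecommended(String specVersion) {
        return featureVersion(specVersion) >= 5;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("JRE " + spec + ": minimum=" + meetsMinimum(spec)
                + ", recommended=" + meetsRecommended(spec));
    }
}
```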

    iWay DQC requires a sufficient amount of memory (at least 256 MB). Large configurations may require up to 1 GB. Additional memory may improve performance of the engine.

    iWay DQC also requires enough disk space for temporary files and data. Two to three times the size of the input data is recommended.
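
    The disk space guideline above can be turned into a simple sizing calculation. The following Java sketch assumes that the guideline refers to the size of the input data on disk. The class and method names are hypothetical, and the use of the java.io.tmpdir directory in the example is also an assumption, because this guide does not state where iWay DQC writes its temporary files.

```java
import java.io.File;

// Hypothetical sizing helper based on the two-to-three-times guideline.
final class TempDiskEstimate {

    // Lower end of the guideline: twice the input size, in bytes.
    static long minimumTempBytes(long inputBytes) {
        return Math.multiplyExact(inputBytes, 2L);
    }

    // Upper end of the guideline: three times the input size, in bytes.
    static long generousTempBytes(long inputBytes) {
        return Math.multiplyExact(inputBytes, 3L);
    }

    public static void main(String[] args) {
        long inputBytes = 4L * 1024 * 1024 * 1024; // example: 4 GB of input data
        // Assumption: temporary files go to java.io.tmpdir; the guide does not say where.
        long usable = new File(System.getProperty("java.io.tmpdir")).getUsableSpace();
        System.out.println("Recommended temporary space: " + minimumTempBytes(inputBytes)
                + " to " + generousTempBytes(inputBytes) + " bytes");
        System.out.println("Usable space in temp directory: " + usable + " bytes");
    }
}
```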

    Graphical User Interface

    The iWay DQC Graphical User Interface (GUI) is available for Microsoft Windows®. The GUI is bundled with JRE 1.5. No additional pre-installed packages are required.

    For optimum performance, a 2 GHz Intel® Pentium-class processor (or equivalent) with 1 GB of memory, and a screen resolution of at least 1024x768, is recommended.

    The installed product requires approximately 400 MB of disk space.

    The following table summarizes the requirements.

    Component                    iWay DQC Core                                  iWay DQC GUI
    Processor                    Any.                                           Intel-compatible. 2 GHz is recommended.
    Operating system             Any.                                           Microsoft Windows, 32-bit version only.
    Software                     JRE 1.4 or later. JRE 1.5 is recommended.      None.
    Memory                       At least 256 MB. 1 GB or more is recommended.  At least 512 MB. 1 GB is recommended.
    Disk space for installation  80 MB.                                         400 MB.
    Screen resolution            Not applicable.                                At least 1024x768.

    Choosing the Correct JRE for the Server Engine

    For most platforms, multiple JREs from different vendors are available. Not all JREs are stable enough to allow processing of large amounts of data. As a best practice, it is recommended that you use the Sun JRE on Windows and Linux/UNIX® systems running Intel-compatible processors. Most vendors of commercial UNIX distributions provide JREs that are stable for their platforms.

    If available, a commercial JRE with support and regular updates is recommended for production deployments.

    Installation Procedure

    iWay Data Quality Center (DQC) is currently packaged with iWay Integration Tools (iIT). You must have a valid license key to use iWay DQC with iIT.

    iWay DQC is distributed in two bundles:

    dqc-core-version.zip: Platform-independent iWay DQC server engine (core).

    dqc-version-win32.zip: Graphical user interface with bundled JRE. A copy of iWay DQC core is located in the run-time subdirectory within the archive.

    Installation of the product consists of extracting the files to the chosen location (for example, c:\Program Files\DQC on Windows, /opt/DQC on Linux/UNIX), and copying the license file to the user home folder (this folder is usually c:\Documents and Settings\user_name on Windows and ~ on Linux/UNIX).

    When you install the GUI, it is recommended that you place a shortcut to dqc.exe in a Start menu folder or on the desktop for easy access.

    See License Key on page 26 for more information on the license file.


    Installing Database Connectivity Drivers

    iWay DQC uses the Java Database Connectivity (JDBC) API for connecting to databases.

    JDBC drivers are available for most database engines and are distributed as components of the database engine, or separately as connectivity components. The licensing terms do not always allow distribution of these drivers with iWay DQC. Therefore, iWay DQC ships with a basic set of drivers for the most common databases. You may install additional drivers.

    The following drivers, which are shipped with iWay DQC, are located in the lib/jdbc subfolder of the iWay DQC core installation.

    | Driver | Description |
    | --- | --- |
    | Oracle | A JDBC driver for Oracle databases. The distribution contains the 9i and 10g versions of the driver. |
    | jTDS | An open-source driver for connecting to both Microsoft SQL Server and Sybase server. |

    You must install each driver (including those shipped with the product) before you can use it. You can install a driver to the core by copying its .jar file to the lib subfolder of the core installation, and using the dialog Window > Preferences > iWay DQC > DB Drivers in the GUI.

    License Key

    By purchasing iWay DQC, you obtain the license key (a file with a .plf extension). When iWay DQC core starts, it looks for this file first in the installation folder, then in the home folder of the current user, and finally in the folder defined by the PURITY_HOME system variable.

    Each license file may contain several restrictions, such as the operating system, iWay DQC version, or date validity range. A license file is valid only if all its conditions for use are met. Additionally, a license file may contain a restriction on product functionality. Functionality not covered by the license file is reported as an error by both the GUI and core.

    If no matching license key is found, iWay DQC exits with an error.
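    The search order described above can be illustrated with a short sketch. This is not the product's implementation: the function name find_license_file and its arguments are hypothetical, and the sketch only locates the first .plf file. It does not evaluate the restrictions stored inside a license file.

```python
import os
from pathlib import Path


def find_license_file(install_dir, home_dir, env=os.environ):
    """Return the first .plf file found in the documented search order, or None.

    Order: installation folder, user home folder, then the folder named by
    the PURITY_HOME variable (if it is set). Hypothetical helper for illustration.
    """
    candidates = [Path(install_dir), Path(home_dir)]
    purity_home = env.get("PURITY_HOME")
    if purity_home:
        candidates.append(Path(purity_home))

    for folder in candidates:
        if folder.is_dir():
            matches = sorted(folder.glob("*.plf"))
            if matches:
                return matches[0]
    return None
```

    A caller would treat a None result the same way the product treats a missing license key, that is, as a reason to stop with an error.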


    3. Getting Started

    iWay DQC Manager is a design tool for solving data quality problems. An intuitive drag-and-drop graphical interface allows you to easily build complex data processing logic and quickly diagnose problems. The many included data processing engines allow you to address a wide variety of problems.

    iWay DQC Manager uses industry-standard formats, such as Microsoft Excel and JDBC. It is built on top of the Eclipse Integrated Development Environment (IDE) for proven stability and ease of use.

    You can also run iWay DQC in command line mode.

    Topics:

    Creating a New Project

    Plan File Basics

    Using Input Files

    Running and Debugging a Plan

    Connecting to a Database


    Creating a New Project

    To create a new project, select New > Empty Project, Simple Project, or DQ Project by right-clicking the DQ Projects node in the DQC Explorer (or use the File menu or toolbar).

    An Empty Project is a project that contains no files or folders by default.

    A Simple Project is a project that contains a default Plan file.

    A DQ Project is a project with a pre-defined folder structure and Plan file based on available templates.

    A Simple Project is automatically created when you first run iWay DQC Manager.

    Plan File Basics

    The core of any iWay DQC project is a Plan file. A Plan defines the logic and rules to be applied to the input data in order to produce the desired output. Plans are created by placing steps on a canvas and connecting them. Steps can be used to read, write, transform, and analyze data, among other actions.

    To create a new Plan file, select New > Plan by right-clicking a project or folder in the DQC Explorer (or use the File menu or toolbar). To start building a Plan, drag a step from the palette and drop it onto the canvas. Connect steps by dragging from the "out" endpoint of one step to the "in" endpoint of another.

    You can edit properties for each step by double-clicking the step, or by right-clicking the step and clicking Edit Properties. To easily align and arrange the steps in a Plan, use the auto-layout and alignment buttons above the canvas (or select those options by right-clicking one or more steps).

    You can embed Plans in other Plans in order to reuse a series of steps that have already been created. This is done by dragging the New Include object from the palette onto the canvas and selecting the Plan file to include. To connect the Included Plan to other steps in the Plan, right-click the Included Plan and click Add Step reference. Select the appropriate input or output steps from the displayed list of steps in the embedded Plan.

    To use the embedded Plan, connect the steps inside the Included Plan to the steps in the containing Plan. Double-clicking the include box opens the Included Plan for editing. To return to the containing Plan, use the tabs at the bottom of the canvas.

    Using Input Files

    You can add existing files to iWay DQC Manager for use as input data for a Plan. For example, you can add files by dragging and dropping them from the file system to the desired project in the DQC Explorer, or by copying them from the destination folder to the desired project folder inside the workspace folder in the file system.


    To use an input file in a Plan, you must first assign it metadata describing the format of the data. When a data file (for example, .txt or .csv file) is opened for the first time, the Metadata Editor is launched. It presents options on how to read the file, such as the type of delimiter used, the data types of each column, and whether the file contains header rows.

    You can preview the resulting data in the lower panel of the editor to assess the results of the metadata settings. Clicking OK in the Metadata Editor opens the data file for viewing. You can edit the file metadata later by right-clicking the file and clicking Edit Metadata.

    To use input files inside a Plan, add one of the input steps to the canvas (for example, Text File Reader or Excel File Reader), and type the input file name in the File Name property. For more information on the available steps in iWay DQC Manager, refer to the documentation for each step. Alternatively, you can drag text files from the DQC Explorer directly onto the canvas, where a Text File Reader is generated after the metadata is created.

    Running and Debugging a Plan

    To run a Plan, click the Run button on the toolbar, or right-click the canvas and click Run.

    Errors in the Plan are shown in the Properties panel as the Plan is constructed. Clicking an individual step shows only the warnings and errors for that step. Double-clicking an error in the Properties panel opens the step properties dialog to the field that contains the error.

    You can also debug individual steps by clicking the Debug button on the toolbar when a step is selected, or by right-clicking a step and clicking Debug.

    Connecting to a Database

    The following JDBC database drivers are included with iWay DQC Manager. You can add other drivers in the DB Drivers preferences.

    Oracle

    Sybase

    Microsoft SQL Server

    To connect to one of these database types, right-click the Databases node in the DQC Explorer, and click New > Database Connection. Clicking a driver name from the drop-down list populates the URL string field with a template for connecting to the specified database type.

    After the database connection has been made, the database is shown in the Databases node in the DQC Explorer. Clicking the table names shows metadata for each table in the Properties panel.


    To view the results of an SQL query on a table, right-click a table and click Open in SQL editor. A default query is shown, listing all table entries (grouped in batches if the number of rows is large). To change the query, edit the query text and click the Execute button. To retrieve more results from the query, click Next batch or Read rest (to show all results).


    4. Configuring Services

    iWay supplies two predefined services that you can use as part of your iWay Data Quality Center (DQC) projects.

    This topic describes how to configure the supplied services so that you can incorporate them in process flows.

    Topics:

    XDDQAgent

    XDDQCBatchExecAgent


    XDDQAgent

    The supplied iWay DQC service named com.ibi.agents.XDDQAgent is configured to pass information to the named Data Quality Provider and to retrieve the responses generated by the iWay DQC Plan. Using iWay Integration Tools, you must supply parameters (property values) that define this service.

    For details on the use of this service, see the iWay Data Quality Center Getting Started manual.

    XDDQCBatchExecAgent

    In this section:

    Supplying Parameters

    Generating a Run-Time Configuration File

    How Does the XDDQCBatchExecAgent Work?

    Sample Files

    Referring to a File Name

    The supplied iWay DQC service named com.ibi.agents.XDDQCBatchExecAgent invokes the iWay DQC run-time (batch) execution environment, through the runcif.bat file. This service enables dynamic allocation of external files and data sources. By running the runcif.bat file, the service executes a Plan with a dynamic run-time configuration file.

    For details on the runcif.bat file, see Running iWay DQC in Command Line Mode on page 101.

    For details on the run-time configuration file, see Configuring Run-Time Variables on page 105.

    Supplying Parameters

    You must supply parameters that define the XDDQCBatchExecAgent. An inbound document causes the iWay DQC run-time environment to execute, based on the supplied parameters.


    The following table describes the XDDQCBatchExecAgent parameters.

    | Parameter Name | Description |
    | --- | --- |
    | DQC Runtime Command File (required) | Location of the runcif.bat file. By default, the runcif.bat file is located in the DQC_BASE/runtime/bin directory. For example: C:\dqc\runtime\bin |
    | Plan File Location (required) | Fully qualified location of the Plan file that the runcif.bat file will execute. For example: C:\dqc\workspace\samples\01_Hello_World\bin\batch_Hello_World.plan |
    | Runtime Configuration File Location (required) | Fully qualified location of the default run-time configuration file. This file contains all the static default allocations. |
    | Additional Path Variable Name(s) | Comma-separated list of names of additional path variables, or a single name of an additional path variable. Use this parameter to add one or more path variables to the dynamic default run-time configuration file. Use this parameter with the Additional Path Variable Value(s) parameter. For each additional name, there must be a corresponding value. If you supply this parameter, the path variables will be added to the default configuration file. The file will then be used to execute the iWay DQC run-time environment. For example: MyPath. For a detailed example of a run-time configuration file with additional path variable names, see Sample Run-Time Configuration File With Additional Path Variable Names on page 35. You may leave this parameter blank. |
    | Additional Path Variable Value(s) | Comma-separated list of additional path variable values. Use this parameter to add path variable values (allocation values) to the preceding list of names. For example: C:/temp |
    | Timeout | Time, in seconds, for an iWay DQC timeout. The default value, 0, means no timeout. |

    The following guidelines apply.

    You may supply values that are discrete strings or Special Register (SREG) references in the format SREG(variableName).

    You must specify the iWay DQC base installation location. For example, if iWay DQC is installed in C:\DQC, the required parameter is C:\iway60\etc\dqc\bin.

    Generating a Run-Time Configuration File

    Example:

    Sample Default Run-Time Configuration File

    Sample Run-Time Configuration File With Additional Path Variable Names

    Other Examples

    In the iWay DQC Graphical User Interface (GUI), you can generate a run-time configuration file. Right-click your project, click New, and click iWay Runtime Configuration.

    At design time, you can create a path variable. Right-click your project, click New, and click Path Variable.


    Example: Sample Default Run-Time Configuration File

     

     

       

     

     

     

     

     

    Example: Sample Run-Time Configuration File With Additional Path Variable Names

    In the Additional Path Variable Name(s) field, specify the following:

    PathOne,PathTwo,PathThree

    In the Additional Path Variable Value(s) field, specify:

    C:/pathOne,c:/pathTwo,c:/pathThree
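    The relationship between the two fields can be sketched as follows. The sketch assumes that names and values are paired by position, which the requirement of one corresponding value per name suggests but the guide does not state outright. The function name pair_path_variables is hypothetical.

```python
def pair_path_variables(names_field, values_field):
    """Pair comma-separated path variable names with their values by position.

    Assumption for illustration: the first name belongs to the first value,
    the second name to the second value, and so on.
    """
    names = [name.strip() for name in names_field.split(",") if name.strip()]
    values = [value.strip() for value in values_field.split(",") if value.strip()]
    if len(names) != len(values):
        raise ValueError(
            f"{len(names)} name(s) supplied but {len(values)} value(s) supplied"
        )
    return dict(zip(names, values))


pairs = pair_path_variables(
    "PathOne,PathTwo,PathThree",
    "C:/pathOne,c:/pathTwo,c:/pathThree",
)
```

    Checking the counts up front mirrors the rule that each additional name must have a corresponding value.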

    The resulting run-time configuration file used by the service is shown here. It is based on the default run-time configuration file.

     

       

     

     

     

     

     

     

     

     

     


    Example: Other Examples

    The following table lists other examples of path variable names and their values.

    Additional Path Variable ValueAdditional Path Variable Name

    APathSREG(DQC.pathnames)

    C:\apathSREG(DQC.PathValues)

    How Does the XDDQCBatchExecAgent Work?

    The XDDQCBatchExecAgent accepts an XML document and executes the configured Plan.

    The resulting XML document is the original document with the addition of the attribute DQCResult="0" on the root element.

    The following table describes the possible return codes.

    | Return Code | Description |
    | --- | --- |
    | 0 | iWay DQC execution completed successfully. |
    | 16 | iWay DQC execution completed with warnings. |
    | 17 | iWay DQC execution completed with errors. |
    | 18 | Abnormal iWay DQC execution termination. |
    | 19 | No valid license file was found. |
    | 20 | Plug-in version check failed. This usually means that the iWay DQC installation is corrupted. Reinstallation is recommended. |
    | 21 | Incorrect arguments were given to the runcif script. |

    Assume that you have the following XML input file:

     

     

     


    After successful execution of the XDDQCBatchExecAgent, the resulting XML file is:

     

     

     

    With the XDDQCBatchExecAgent, the structure of the original XML file is preserved.
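    A downstream consumer could inspect the result as in the following sketch. The sample document and the function name read_dqc_result are hypothetical. The sketch also assumes that the DQCResult attribute carries one of the return codes from the table above when execution does not succeed.

```python
import xml.etree.ElementTree as ET

# Return codes as listed in the return code table of this guide.
RETURN_CODES = {
    0: "iWay DQC execution completed successfully.",
    16: "iWay DQC execution completed with warnings.",
    17: "iWay DQC execution completed with errors.",
    18: "Abnormal iWay DQC execution termination.",
    19: "No valid license file was found.",
    20: "Plug-in version check failed.",
    21: "Incorrect arguments were given to the runcif script.",
}


def read_dqc_result(xml_text):
    """Return (code, description) from the DQCResult attribute on the root element."""
    root = ET.fromstring(xml_text)
    raw = root.get("DQCResult")
    if raw is None:
        raise ValueError("root element has no DQCResult attribute")
    code = int(raw)
    return code, RETURN_CODES.get(code, "Unknown return code.")


# Hypothetical output document; only the DQCResult attribute comes from the guide.
sample = '<order DQCResult="0"><item>example</item></order>'
code, description = read_dqc_result(sample)
```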

    Sample Files

    runcif.bat File

    @echo off

    rem Start script for DQC - batch mode

    rem $Id: runcif.bat 11177 2009-02-06 15:50:18Z pavel.nejedly $

    set PURITY_HOME=D:\DQC-5.3.1\runtime

    rem preparing classpath

    set CLASSPATH=

    for %%I in (%PURITY_HOME%\lib\*.jar) do @call %PURITY_HOME%\bin\appendcp.bat %%I

    rem echo Using CLASSPATH=%CLASSPATH%

    :okJava

    "D:\DQC-5.3.1\jre\bin\java" cz.adastra.cif.processor.bin.CifProcessor %*

    :end

    Run-Time Configuration File

     

     

     

     

     

     

     


    Referring to a File Name

    In the iWay DQC Plan, the Text File Reader refers to the location using:

    purity://MyVariable/filename

    In the iWay DQC Graphical User Interface (GUI), use the path variable as follows. The first image shows the file name in the File Name field for the Text File Reader.

    The next image shows the DQC Explorer tree.

    To directly refer to a file name, instead of using folder navigation, use the following syntax:

    purity://MyFileVariable/
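    One way to picture how such a reference maps to a physical location is the sketch below. It is an assumption about the resolution logic, not a description of the product's internals: the function name resolve_purity_url and the dictionary of path variables are hypothetical, and the variable value is simply joined with the remainder of the reference.

```python
PREFIX = "purity://"


def resolve_purity_url(url, path_variables):
    """Resolve purity://Variable/rest against a name-to-location mapping.

    Assumed behavior for illustration: the variable's value is the base
    location, and anything after the variable name is appended to it.
    """
    if not url.startswith(PREFIX):
        raise ValueError(f"not a purity:// reference: {url}")
    variable, _, relative = url[len(PREFIX):].partition("/")
    base = path_variables[variable]
    if not relative:
        # Form purity://MyFileVariable/ : the variable itself names the file.
        return base
    return base.rstrip("/") + "/" + relative


variables = {"MyVariable": "C:/temp", "MyFileVariable": "C:/temp/input.txt"}
folder_based = resolve_purity_url("purity://MyVariable/filename", variables)
file_based = resolve_purity_url("purity://MyFileVariable/", variables)
```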


    5. Working With Data Types

    This section provides information on the supported data types in iWay DQC records, input/output (I/O) operations, and step properties.

    Topics:

    Supported Data Types

    Formatting Data Types

    Parsing Errors

    Data Types in Step Properties

    JDBC Data Type Conversions


    Supported Data Types

    iWay DQC supports the following data types in records:

    Integer. Whole number ranging from -2^31 to 2^31-1.

    Long. Arbitrary-precision signed decimal number.

    Float. Arbitrary-precision signed decimal number. You can control the output precision and the precision of the division operation by the double.scale run-time parameter, which has a value of 10 by default.

    String. Sequence of characters that is treated as text.

    Day. Calendar date without time fields. For more information, see Parsing Errors on page 40.

    Datetime. Calendar date with time fields. For more information, see Parsing Errors on page 40.

    Boolean. Logical value that can be true or false.
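    The Integer range and the effect of a scale of 10 on division can be illustrated with a short sketch. The names fits_integer and divide are hypothetical, and the rounding mode (half up) is an assumption, because the guide does not say how the last digit is rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

INTEGER_MIN = -2**31       # lower bound of the Integer type
INTEGER_MAX = 2**31 - 1    # upper bound of the Integer type
DOUBLE_SCALE = 10          # documented default of the double.scale parameter


def fits_integer(value):
    """Return True if value lies within the documented Integer range."""
    return INTEGER_MIN <= value <= INTEGER_MAX


def divide(dividend, divisor, scale=DOUBLE_SCALE):
    """Divide two decimal numbers and keep `scale` digits after the decimal point.

    The rounding mode is an assumption made for this illustration.
    """
    quantum = Decimal(1).scaleb(-scale)  # for scale=10 this is 1E-10
    result = Decimal(dividend) / Decimal(divisor)
    return result.quantize(quantum, rounding=ROUND_HALF_UP)
```

    With the default scale, divide(1, 3) keeps ten digits after the decimal point.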

    Formatting Data Types

    Formatting rules for parsing input and output data into iWay DQC data types are defined by the data format parameters of the respective input/output processing steps. See the documentation on steps for details.

    Parsing Errors

    In all cases, if null exists in the input field, then null is written to the related output field without generating an error.

    The following errors may occur for each data type:

    STRING. Does not generate any errors.

    BOOLEAN. When there is a non-null value in the input that cannot be parsed, an UNPARSABLE_FIELD error is generated.

    INTEGER. When there is a non-null value in the input that cannot be parsed, an UNPARSABLE_FIELD error is generated.

    FLOAT. When there is a non-null value in the input that cannot be parsed, an UNPARSABLE_FIELD error is generated.

    LONG. When there is a non-null value in the input that cannot be parsed, an UNPARSABLE_FIELD error is generated.

    DAY. If the data parsing ends with an error, an INVALID_DATE error is generated. If the READ_POSSIBLE option is set, the step parses the data again, this time with added leniency towards nonsensical numeric parts of the date. For example, the string 32-13-2000 represents a valid date value that is parsed as 1.2.2001. If even lenient parsing fails, an UNPARSABLE_FIELD error is generated.

    DATETIME. Processing is the same as for the DAY data type.

    Each step that handles I/O parsing of iWay DQC data types must implement a specific strategy that manages error handling.
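    The DAY handling described above can be sketched as follows. The day-month-year input format is inferred from the 32-13-2000 example, the function name parse_day is hypothetical, and the sketch assumes that INVALID_DATE is still reported when lenient parsing recovers a value. The guide leaves that last point open.

```python
from datetime import date, timedelta


def parse_day(text, read_possible=False):
    """Return (value, error) for a DD-MM-YYYY string.

    Strict parsing is tried first. With read_possible set, out-of-range day
    and month numbers roll over into the following month or year.
    """
    if text is None:
        return None, None  # null in, null out, no error
    try:
        day, month, year = (int(part) for part in text.split("-"))
    except ValueError:
        return None, "UNPARSABLE_FIELD"

    try:
        return date(year, month, day), None
    except ValueError:
        if not read_possible:
            return None, "INVALID_DATE"

    try:
        # Lenient pass: fold excess months into years, then add the days.
        months = year * 12 + (month - 1)
        first_of_month = date(months // 12, months % 12 + 1, 1)
        # Assumption: the INVALID_DATE error is reported alongside the value.
        return first_of_month + timedelta(days=day - 1), "INVALID_DATE"
    except (ValueError, OverflowError):
        return None, "UNPARSABLE_FIELD"
```

    For the string 32-13-2000, month 13 rolls over to January 2001 and day 32 of January rolls over to 1 February 2001, which matches the example in the DAY description.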

    Data Types in Step Properties

    You can use the following data types in the definition of step properties:

    string

    integer

    long

    date

    float

    boolean

    double

    JDBC Data Type Conversions

    When data is read from a database type to an internal data type, or when data is written from an internal data type to a database type, a set of predefined conversions is used. The following table shows how data is converted between a database type and an internal data type.

    | Internal Data Type | SQL Data Type | JDBC get Method | JDBC set Method |
    | --- | --- | --- | --- |
    | boolean | BIT | getBoolean | setBoolean |
    | integer | INTEGER | getInt | setInt |
    | long | BIGINT | getBigDecimal | setBigDecimal |
    | date | TIMESTAMP | getTimestamp | setTimestamp |
    | day | DATE | getDate | setDate |
    | float | DECIMAL | getBigDecimal | setBigDecimal |
    | string | VARCHAR | getString | setString |

    To read data from a database or write data to a database, the JDBC get or set method is used. For example, to read/write a date internal data type from/to a database, the JDBC functions getTimestamp()/setTimestamp() are used. These conversions are used by all JDBC-related steps (such as Jdbc Reader, Jdbc Writer, SQL Execute, and SQL Select).

    JDBC Internal Conversions

    The JDBC specifications define the JDBC capability for inner type conversions (the difference between which JDBC method you use to read/write data and the real database column data type). These specifications are available here. The conversion abilities of certain drivers depend on the JDBC specification version they implement. Base conversions are defined in API 1.0 and extended in 3.0.

    Most of the drivers support JDBC 3.0. However, some drivers may not implement these conversions fully, or a database may use its own extra data types. Real conversion abilities are JDBC driver dependent. The previously mentioned JDBC methods used to read/write data from/to a database were chosen taking into consideration maximum compatibility with major databases and their JDBC connectors.


    6. Creating Dictionary Files

    It is often necessary to use reference data with certain steps (for example, to look up values for matching purposes). The reference data must be placed in dictionary files, which are created and maintained in iWay Data Quality Center (DQC).

    The process for creating dictionary files involves:

    Reading the reference data from a supported input type (text file, DBF file, or JDBC).

    Preparing the data (for example, creating a matching value with the Create Matching Value step).

    Generating the dictionary file using the appropriate generator.

    Topics:

    Dictionary File Types

    Dictionary File Type Summary

    Information for Specific Steps


    Dictionary File Types

    In this section:

    StringLookup

    IndexedTableLookup

    MatchingLookup

    SelectiveMatchingLookup

    iWay DQC uses four types of dictionary files:

    StringLookup, which is an indexed list of strings.

    IndexedTableLookup, which is an indexed table.

    MatchingLookup, which is a lookup file indexed by a matching value that contains real values.

    SelectiveMatchingLookup, which is an extension of the MatchingLookup file type, used for selective lookup matching.
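    As a rough mental model only, the first three file types behave like familiar in-memory structures. The sketch below is an analogy and says nothing about the actual file format. All sample data is invented, and the matching_value function is a hypothetical stand-in for whatever normalization the Create Matching Value step performs.

```python
# StringLookup: answers "is this string present?", like a set of strings.
string_lookup = {"COM", "ORG", "NET"}

# IndexedTableLookup: full records retrievable by key, like a dict of rows.
indexed_table_lookup = {
    "LTD": {"original": "LTD", "replacement": "Limited"},
}


def matching_value(text):
    """Hypothetical normalization: uppercase with all whitespace removed."""
    return "".join(text.upper().split())


# MatchingLookup: indexed by matching value, holding the real values.
matching_lookup = {}
for real_value in ["Anna Maria", "AnnaMaria"]:
    matching_lookup.setdefault(matching_value(real_value), []).append(real_value)

is_known_tld = "COM" in string_lookup
replacement = indexed_table_lookup["LTD"]["replacement"]
variants = matching_lookup[matching_value("anna maria")]
```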

    StringLookup

    This dictionary file is an indexed list of strings, used for getting information about the presence of a string in a dictionary file. This file consists of a single column of strings. Data types other than string are not valid. Other data types must first be converted to string if they are to be used.

    Used by: String Lookup step, Validate Email step, Validate Phone Number step, Guess Name Surname step, Experimental Exclude Spaces step

    Generator: String Lookup Builder step

    IndexedTableLookup

    This dictionary file is an indexed table with defined index values, used for looking up records by their corresponding keys. The full record data is contained in the file, as it was defined during the generation of the file.

    Used by: Apply Replacement step, Convert Phone Numbers step, Strip Titles step, Transform Legal Forms step, Validate In Res step, Validate SKRZ step, Validate Vat Id step, Validate Vin step, Table Matching step, Value Replacer step

    Generator: Indexed Table Builder step


    MatchingLookup

    This dictionary file is used for looking up a matching value from a real value. The file is indexed by the matching value.

    Used by: Guess Name Surname step, Intelligent Swap Name Surname step, Swap Name Surname step, Validate Vin step

    Generator: Matching Lookup Builder step

    SelectiveMatchingLookup

    This dictionary file is an extension and modification of the MatchingLookup file. Other parameters (in addition to the real and matching values) can be used in the lookup. The other parameters provide a lookup of the best variant from the set of variants that fit the pair of matching and real values.

    Used by: Selective Res Lookup step

    Generator: Selective Matching Lookup step

    Dictionary File Type Summary

    The following table contains a list of the steps that require dictionary files and details on their use.

    | Step | Filename Property | Dictionary File Type | Description |
    | --- | --- | --- | --- |
    | Update Gender | firstNameRatioLookupFileName | IndexedTableLookup | File contains numbers only. Indexed by names. For further information, see below. |
    | Update Gender | surnameRatioLookupFileName | IndexedTableLookup | File contains numbers only. Indexed by surnames. |
    | Validate Email | tldLookupFileName | StringLookup | File contains all top-level domains in uppercase without dots. |
    | Validate Phone Number | idcLookupFileName | StringLookup | File contains all known IDCs. |
    | Validate Phone Number | provLookupFileName | StringLookup | File contains prefixes of known Telcos. |
    | Transform Legal Forms | legalFormsLookupFileName | IndexedTableLookup | File contains original values with their replacements. Indexed by the original values. |
    | Validate In Res | databaseFile | IndexedTableLookup | File contains reference data of companies. Indexed by company registration number. |
    | Convert Phone Numbers | conversionTableFileName | IndexedTableLookup | File contains original prefixes with patterns to form a number in the new format. For further information, see below. |
    | Guess Name Surname | firstNameLookupFileName | MatchingLookup | File contains known names. |
    | Guess Name Surname | lastNameLookupFileName | MatchingLookup | File contains known surnames. |
    | Guess Name Surname | multiFirstNameLookupFileName | MatchingLookup | File contains known multi-word names. |
    | Guess Name Surname | multiLastNameLookupFileName | MatchingLookup | File contains known multi-word surnames. |
    | Intelligent Swap Name Surname | firstNameLookupFileName | MatchingLookup | File contains known names. |
    | Intelligent Swap Name Surname | lastNameLookupFileName | MatchingLookup | File contains known surnames. |
    | Strip Titles | titleLookupFileName | IndexedTableLookup | File contains matching values with their replacements for known titles. Indexed by matching value. |
    | Swap Name Surname | firstNameLookupFileName | MatchingLookup | File contains known names. This step is deprecated. Use the Intelligent Swap Name Surname step instead. |
    | Swap Name Surname | lastNameLookupFileName | MatchingLookup | File contains known surnames. This step is deprecated. Use the Intelligent Swap Name Surname step instead. |
    | Validate Vat Id | foLookupFileName | IndexedTableLookup | File contains numbers and names of known tax offices. Indexed by numbers. |
    | Validate Vat Id | cnLookupFileName | IndexedTableLookup | File contains known company registration numbers and company names. Indexed by numbers. |
    | Validate Vin | wmiFileName | IndexedTableLookup | File contains known WMI codes as keys and patterns to match VINs in a second dictionary file. For further information, see below. |
    | Validate Vin | vinInfoFileName | IndexedTableLookup | File contains the following columns: patterns for matching input VIN, manufacturer, car model, year that VIN was issued, position of CRC number, and position of year number. Indexed by matching pattern. |
    | Validate SKRZ | districtLookupFileName | IndexedTableLookup | File contains Slovak district codes and names. Indexed by district codes. |
    | Apply Replacements | replacementsFileName | IndexedTableLookup | File contains original values with their replacements. Indexed by original values. |
    | String Lookup | lookupFileName | StringLookup | File contains a list of strings from which to look up. |
    | Selective Res Lookup | fileName | SelectiveMatchingLookup | File contains reference data of companies. This includes real and matching values of company names, company registration numbers, and an additional optional field. |
    | Table Matching | indexTableFileName | IndexedTableLookup | File contains table from which to look up data. Indexed by keys used for looking up data. |
    | Experimental Exclude Spaces | databaseFile | StringLookup | File contains list of known words. |
    | Anonymizer | nameLookupFileName | IndexedTableLookup | File contains replacement names (first names and surnames) written only in uppercase. Indexed by original values in uppercase. |


    Information for Specific Steps

    In this section:

    ValidateVINAlgorithm Dictionary Files

    Convert Phone Numbers Step Dictionary Files

    Update Gender Step Dictionary Files

    This topic provides details on steps that require additional explanation or have more complex configuration requirements.

    ValidateVINAlgorithm Dictionary Files

    Background information about WMI (World Manufacturer Identifier) and VIN (Vehicle Identification Number) codes is not provided here. For information about those codes, refer to the VIN article on Wikipedia at http://www.wikipedia.org.

    The Validate VIN step needs two dictionary files in order to execute successfully.

    WMI Dictionary File

    The first dictionary file, referred to by the wmiFileName property, is of the MatchingLookup file type. It must contain a WMI code as a matching value and a key name for lookup in the VIN dictionary file. The key name is a string that consists of a WMI code and a mask (optional), followed by the underscore character (_) and a unified manufacturer name (in uppercase and without accents).

    The mask starts at the fourth position of the VIN (the first three characters are for the WMI code) and can consist of up to 11 characters. If no mask is defined, a default mask of *********** (11 asterisks) is used. An asterisk is a wild card that represents any character, as opposed to a specific character.

    If a character other than an asterisk is placed in any of the mask fields, the specified character will be used at that position. For example, the mask ***6Y defines characters 6Y at the 7th and 8th positions. The whole key name will then look like, for example, TMB***6Y_SKODA (SKODA is the manufacturer name). It will match VIN TMB1236Y234567890 but not TMB12345234567890.
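    The matching rule described above can be sketched in Python as follows. This is only an illustration, not the step's implementation: the function names are hypothetical, and padding a short mask with asterisks up to 11 characters is an assumption based on the default mask.

```python
def split_key_name(key_name):
    """Split a key name such as 'TMB***6Y_SKODA' into (wmi, mask, manufacturer)."""
    code, _, manufacturer = key_name.partition("_")
    wmi, mask = code[:3], code[3:]
    # Assumption: a missing or short mask is padded with asterisks to 11 characters.
    return wmi, mask.ljust(11, "*"), manufacturer


def key_matches_vin(key_name, vin):
    """Return True if the WMI code and the mask of key_name match the VIN."""
    wmi, mask, _ = split_key_name(key_name)
    if len(vin) < 3 + len(mask) or vin[:3] != wmi:
        return False
    # The mask is compared with the VIN from its fourth position; '*' matches any character.
    return all(m in ("*", c) for m, c in zip(mask, vin[3:]))
```

    With the key name TMB***6Y_SKODA, key_matches_vin accepts TMB1236Y234567890 and rejects TMB12345234567890, because the latter has 45 instead of 6Y at the 7th and 8th positions.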

    VIN Dictionary File


    The second dictionary file, referred to by the vinInfoFileName property, is of the IndexedTable file type. It is indexed by the key names (the same values that are in the WMI dictionary file). It contains, in order, these columns: key name, real name of manufacturer, car model, year that the VIN was issued (in four-digit format), position of CRC number (if the VIN code contains any), and position of year number (if any).

    Convert Phone Numbers Step Dictionary Files

    The only dictionary file for this step, referred to by conversionTableFileName, is of the IndexedTable file type. The table is indexed by the source prefix, which consists of the old prefix and the beginning of the original number that is going to be replaced by the step. The table contains the source prefix (the value that was indexed from), the length of the number that will not be replaced, and the new prefix.

    Example: You need to convert all numbers with the old prefix 02 that start at number 2 (02 22 93 44 23, 02 23 48 79 67) to a 9-digit national format. The table must have a line indexed with 022 (02 as the original prefix, 2 as the start number) and must contain 022 (source prefix), 7 (number length), and 22 (new prefix). The step then replaces 022 from the beginning of a number with 22 from the new prefix and copies 7 digits from the original phone number.
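    The conversion in this example can be sketched in Python as shown below. The dictionary CONVERSION_TABLE and the function name are hypothetical stand-ins for the dictionary file and the step. The sketch assumes that the copied digits are the ones that follow the source prefix and that a row applies only when exactly that many digits remain.

```python
# Hypothetical in-memory form of the conversion table:
# source prefix -> (number of digits that are kept, new prefix)
CONVERSION_TABLE = {"022": (7, "22")}


def convert_phone_number(number, table=CONVERSION_TABLE):
    """Replace the source prefix of a phone number with the new prefix."""
    digits = "".join(ch for ch in number if ch.isdigit())
    # Try longer source prefixes first so that more specific rows win.
    for prefix in sorted(table, key=len, reverse=True):
        if digits.startswith(prefix):
            keep, new_prefix = table[prefix]
            rest = digits[len(prefix):]
            if len(rest) == keep:
                return new_prefix + rest
    # No row applies: return the digits unchanged.
    return digits
```

    For the number 02 22 93 44 23, the sketch removes 022, keeps the 7 digits 2934423, and returns the 9-digit number 222934423.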

    Update Gender Step Dictionary Files

    Numbers written in the dictionary files are the ratios of males to females with the corresponding name (names are the indexed value). They are INTEGER values calculated as

    (male_count*1000)/(male_count+female_count)

    This corresponds to 0 and small numbers for most female names, and 1000 and large numbers primarily for male names.
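    As a worked illustration of this formula, the following Python sketch computes the dictionary value from two counts. The function name is hypothetical, and truncating the result to an integer (rather than rounding) is an assumption.

```python
def gender_ratio(male_count, female_count):
    """Return the per-mille share of males among all bearers of a name."""
    # Integer division keeps the result an integer in the range 0 to 1000.
    # A name with no occurrences at all raises ZeroDivisionError.
    return (male_count * 1000) // (male_count + female_count)
```

    A name carried by 1 male and 3 females yields 250, a name carried only by females yields 0, and a name carried only by males yields 1000.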


    7. Using Expressions

    This section describes expressions used in iWay Data Quality Center (DQC) steps. Places where the expressions may be used are described in the description sections of the appropriate steps.

    Topics:

    Operands

    Handling Null Values

    Variables

    Operations and Functions

    Regular Expressions


    Operands

    Expression operands may be of a defined column type, such as INTEGER, FLOAT, LONG, STRING, DATETIME, DAY, and BOOLEAN. If a number assigned to either an INTEGER or LONG variable overflows or underflows the interval of permitted values for that type (that is, -2147483648 to +2147483647 for INTEGER, and -9223372036854775808 to +9223372036854775807 for LONG), then the number wraps around the interval. For example, the value 2147483649 assigned to an INTEGER variable is interpreted as -2147483647.
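    The wrap-around behavior can be reproduced with ordinary two's complement arithmetic. The Python sketch below is only an illustration (the function name is hypothetical); it assumes 32 bits for INTEGER and 64 bits for LONG, which matches the intervals given above.

```python
def wrap(value, bits):
    """Wrap value into the signed two's complement range of the given width."""
    modulus = 1 << bits       # number of distinct values, for example 2**32
    half = 1 << (bits - 1)    # distance from zero to the lower bound
    return (value + half) % modulus - half
```

    Here wrap(2147483649, 32) returns -2147483647, which agrees with the example in the text, and values inside the interval are returned unchanged.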

    Operands are automatically converted to a wider type if needed. This feature is relevant for numeric data types INTEGER, LONG, and FLOAT (widening INTEGER -> LONG -> FLOAT) and datetime types DAY and DATETIME (DAY -> DATETIME). In case of comparisons, and set and conditional operations, all operands are converted to the most general type before the operation is performed.

    An operand is any expression with a type corresponding to a valid type of a given operation. Operands can be divided into four categories:

    Literals. Numeric constants, string constants, or logical constants (TRUE, FALSE, UNKNOWN - deprecated; all the keywords are case-insensitive). Can also be the NULL literal (case-insensitive).

    Columns. Columns are defined by their names and represent their values. If there is a space character in the column name, the name must be enclosed in square brackets ([]). If the step retrieves data from multiple inputs, the column names are specified using dot notation, that is, input_name.column_name. If the step uses just one input, you can omit the dot notation.

    Set. Can be used only in combination with the IN operation, in which the set represents a constant expression. A set can occur only on the right side of the IN operation.

    Complex expressions.

    Handling Null Values

    Operations and functions handle arguments with a NULL value according to SQL rules, with one exception for the STRING data type: a NULL string and an empty string are considered equal. As a result, NULL string arguments are handled as empty (zero-length) strings.

    Example:

    The following are legal comparisons that give a non-null Boolean result:

    "abc" == NULL

    "abc" > NULL


    Respectively, they are analogous to the following comparisons:

    "abc" == ""

    "abc" > ""

    However, in SQL, both of these expressions result in a NULL (UNKNOWN) value.
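    The string exception can be pictured with a small Python sketch in which None plays the role of NULL. The helper names are hypothetical, and Python's code point ordering is used for the > comparison, which may differ from the ordering that iWay DQC applies.

```python
from typing import Optional


def as_string(value: Optional[str]) -> str:
    """Treat a NULL (None) string as an empty string."""
    return "" if value is None else value


def str_equal(a: Optional[str], b: Optional[str]) -> bool:
    """Model of a == b for STRING operands; never returns NULL."""
    return as_string(a) == as_string(b)


def str_greater(a: Optional[str], b: Optional[str]) -> bool:
    """Model of a > b for STRING operands; never returns NULL."""
    return as_string(a) > as_string(b)
```

    In this model, str_equal("abc", None) is False and str_greater("abc", None) is True, the same results as for the comparisons with an empty string.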

    Variables

    The expression can be formed as a sequence of assignment expressions followed by one resulting expression. Multiple expressions are delimited by a semicolon (;). An assignment expression has the following syntax:

    variable := expression

    The first occurrence of a variable on the left-hand side defines this variable and its type. A reference to a variable in an expression is valid only after its definition. Each following occurrence of a variable, including an occurrence on the left-hand side of the assignment expression, must conform to the variable type.

    Example:

    a := 2;

    b := 4 - a;

    3 * b


    Operations and Functions

    In this section:

    Arithmetic Operations

    Logical Operations

    Comparison (Relational) Operators

    Set Operations

    Other Operations

    Date Functions

    String Functions

    Bitwise Functions

    MinMax Functions

    Aggregate Functions

    Conditional Expressions

    Conversion and Formatting Functions

    Word Set Operation Functions

    Unclassified Functions

    iWay DQC provides the following operation and function categories:

    Arithmetic operations

    Logical operations

    Comparison operations

    Set operations

    Other operations

    Date functions

    String functions

    Bitwise functions

    MinMax functions

    Aggregate functions


    Conditional expressions

    Conversion and formatting functions

    Word set operation functions

    Caution: All operations and functions that do not have the locale parameter set or defined use the default iWay DQC locale. The step locale setting does not influence this behavior.

    Arithmetic Operations

    This category includes common arithmetic operations: addition, subtraction, multiplication, and division. The result of an arithmetic operation applied to the type INTEGER or LONG is always INTEGER or LONG. The result is type LONG if at least one operand is type LONG.

    Note: Type NUMBER stands for data types INTEGER, LONG, or FLOAT in the description of input (operand) and output (result) types.

    Name: -
    Usage: a - b
    Description: Subtraction of numeric operands a and b.
    Operand Type: NUMBER, NUMBER
    Result Type: NUMBER

    Name: -
    Usage: -a
    Description: Negation of numeric operand a. For example:

        -(a*c)

    Note: The unary expression operator cannot immediately follow another arithmetical operator unless parenthesized. The following expression is invalid:

        a*-b

    Instead use either

        -b*a

    or:

        a*(-b)

    Operand Type: NUMBER
    Result Type: NUMBER

    Name: /
    Usage: a / b
    Description: Division of numeric operands a and b.
    Operand Type: NUMBER, NUMBER
    Result Type: FLOAT

    Name: *
    Usage: a * b
    Description: Multiplication of numeric operands a and b.
    Operand Type: NUMBER, NUMBER
    Result Type: NUMBER

    Name: %
    Usage: a % b
    Description: Modulo, the remainder after numerical division of a by b.
    Operand Type: INTEGER, INTEGER (Result Type: INTEGER) or LONG, LONG (Result Type: LONG)

    Name: +
    Usage: a + b
    Description: Addition of numeric operands a and b, or string concatenation.
    Operand Type: NUMBER, NUMBER (Result Type: NUMBER) or STRING, STRING (Result Type: STRING)

    Name: div
    Usage: a div b
    Description: Division of integer operands without a remainder.
    Operand Type: INTEGER, INTEGER (Result Type: INTEGER) or LONG, LONG (Result Type: LONG)
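    The result types listed for the numeric operators can be summarized in a short Python sketch. The function name and the string labels are hypothetical, and the sketch makes one assumption that the table does not state explicitly: for % and div, a mixed INTEGER and LONG pair is widened to LONG in the same way as for the other operators. String concatenation with + is not modeled.

```python
# Widening order taken from the Operands section: INTEGER -> LONG -> FLOAT.
WIDENING_ORDER = ["INTEGER", "LONG", "FLOAT"]


def result_type(operator, left, right):
    """Return the result type of a numeric operation on two operand types."""
    if operator == "/":
        # Division always produces FLOAT, whatever the operand types are.
        return "FLOAT"
    if operator in ("%", "div") and "FLOAT" in (left, right):
        # Modulo and integer division are listed only for INTEGER and LONG.
        raise TypeError(operator + " is not defined for FLOAT operands")
    # Otherwise the result takes the wider of the two operand types.
    return max(left, right, key=WIDENING_ORDER.index)
```

    For example, 7 / 2 has the type FLOAT, whereas 7 div 2 and 7 % 2 stay INTEGER, and adding an INTEGER to a LONG gives a LONG.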

    Logical Operations

    Common logical operations are AND, NOT, OR, and XOR (all keywords are case-insensitive).

    Name: AND
    Usage: a AND b
    Description: Logical conjunction
    Operand Type: BOOLEAN, BOOLEAN
    Result Type: BOOLEAN

    Name: NOT
    Usage: NOT a
    Description: Logical negation
    Operand Type: BOOLEAN
    Result Type: BOOLEAN

    Name: OR
    Usage: a OR b
    Description: Logical sum
    Operand Type: BOOLEAN, BOOLEAN
    Result Type: BOOLEAN

    Name: XOR
    Usage: a XOR b
    Description: Exclusive OR
    Operand Type: BOOLEAN, BOOLEAN
    Result Type: BOOLEAN

    Comparison (Relational) Operators

    Name: <
    Usage: a < b
    Description: Tests if the value of a is less than b.
    Operand Type: Any two compatible types
    Result Type: BOOLEAN

    Name: <=
    Usage: a <= b
    Description: Tests if the value of a is less than or equal to b.
    Operand Type: Any two compatible types
    Result Type: BOOLEAN

    Name: >=
    Usage: a >= b
    Description: Tests if the value of a is greater than or equal to b.
    Operand Type: Any two compatible types
    Result Type: BOOLEAN

    Set Operations

    For sets, a few basic operations are implemented. Set members are literals of types defined for columns or column names themselves.

    Name: in
    Usage: a in {elem[, elem]...}
    Description: Tests whether operand a is a member of the specified set. As opposed to the "is in" operation, if operand a is not a member of the set and a null value is a member of the set, then the result is null.
    Operand Type: Any type, set
    Result Type: BOOLEAN

    Name: is in
    Usage: a is in {elem[, elem]...}
    Description: Tests whether operand a is a member of the specified set. Always returns TRUE or FALSE.
    Operand Type: Any type, set
    Result Type: BOOLEAN

    Name: is not in
    Usage: a is not in {elem[, elem]...}
    Description: Tests whether operand a is not a member of the specified set.
    Operand Type: Any type, set
    Result Type: BOOLEAN

    Name: not in
    Usage: a not in {elem[, elem]...}
    Description: Tests whether operand a is not a member of the specified set. As opposed to the "is not in" operation, if operand a is not a member of the set and a null value is a member of the set, then the result is null.
    Operand Type: Any type, set
    Result Type: BOOLEAN

    Example:

    company IN {"Smith inc.", "Smith Moving inc.",

      "Speedmover inc.", [candidate column], clear_column}

    a IN {1, 2, 5, 10}

    b IN {TRUE, FALSE}
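    The difference between the null-aware operations (in, not in) and the two-valued operations (is in, is not in) can be modeled in Python, with None standing for null. The function names are hypothetical. Returning null when operand a itself is null is an assumption derived from the SQL rules mentioned under Handling Null Values, and the special treatment of NULL strings is not modeled.

```python
from typing import Any, Iterable, Optional


def op_in(a: Any, elements: Iterable[Any]) -> Optional[bool]:
    """Model of 'a in {...}': the result may be null (None)."""
    members = list(elements)
    if a is None:
        return None  # assumption: SQL-like result for a null operand
    if any(e is not None and e == a for e in members):
        return True
    if any(e is None for e in members):
        return None  # a was not found, but the set contains a null value
    return False


def op_is_in(a: Any, elements: Iterable[Any]) -> bool:
    """Model of 'a is in {...}': always True or False."""
    return any(e == a for e in elements)


def op_not_in(a: Any, elements: Iterable[Any]) -> Optional[bool]:
    """Model of 'a not in {...}': the negation of op_in, null stays null."""
    result = op_in(a, elements)
    return None if result is None else not result


def op_is_not_in(a: Any, elements: Iterable[Any]) -> bool:
    """Model of 'a is not in {...}': always True or False."""
    return not op_is_in(a, elements)
```

    For the value 3 and the set {1, 2, null}, the model gives null for in and not in, FALSE for is in, and TRUE for is not in.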


    Other Operations

    Name: is
    Usage: a is b
    Description: Tests if a is equal to b. Null values are allowed as operands. A typical use is:

        a is null

    Operand Type: Any two compatible types or null
    Result Type: BOOLEAN

    Name: is not
    Usage: a is not b
    Description: Tests if a is not equal to b. Null values are allowed as operands. A typical use is:

        a is not null

    Operand Type: Any two compatible types or null
    Result Type: BOOLEAN

    Date Functions

    In iWay DQC, a date is represented by DAY and DATETIME types. The DAY type represents a date to the detail level of days. DATETIME represents a date to the detail level of milliseconds. The time values that are compatible with each format are described in the following table.

    Date Part Name    Range                  Included in Date Type
    YEAR              Any positive number    DATETIME, DAY
    MONTH             1 - 12                 DATETIME, DAY
    DAY               1 - max.month          DATETIME, DAY
    HOUR              0 - 23                 DATETIME
    MINUTE            0 - 59                 DATETIME
    SECOND            0 - 59                 DATETIME

    A day starts at 00:00:00 and ends at 23:59:59. If a given function requires identification of a date part as a parameter, the identifier is written in the expression in the form of a string literal, for example, "MONTH". Otherwise, the expression is evaluated as incorrect. Identifiers are case-sensitive and must be written in uppercase.


    Example:

    expression='dateAdd(inDate,10,"DAY")'

    All the listed date parts are represented by positive integers. The date functions do not support milliseconds.

    Note: Data type DATE-TYPE represents the date type DAY or DATETIME in the description of input (operand) and output (result) types.

    Date Function: dateAdd(srcDate, srcValue, fieldName)
    Description: Adds the specified srcValue of the type specified by fieldName (YEAR, MONTH, or DAY) to the srcDate. This function allows subtraction, so the srcValue can be negative. The return value is the result of the add (subtract) operation. If any of the operands are invalid or if an attempt is made to add an unsupported fieldName to the date type DAY (HOUR, MINUTE, or SECOND), then the expression reports an error.
    Operand Type: DATE-TYPE, INTEGER, STRING
    Result Type: DATE-TYPE

    Date Function: dateDiff(startDate, endDate, fieldName)
    Description: Returns the difference between endDate and startDate expressed in fieldName units. If the result exceeds the maximum range of INTEGER, then the value null is returned. If any of the parameters are invalid, the expression reports an error. A combination of date type DAY and fieldName HOUR, MINUTE, SECOND can be used. The value of these fields is considered to be 0.
    Operand Type: DATE-TYPE, DATE-TYPE, STRING
    Result Type: INTEGER

    Date Function: datePart(srcDate, fieldName)
    Description: Returns the value of the field fieldName of srcDate. If any of the parameters are invalid, the expression reports an error. For the fields HOUR, MINUTE, and SECOND set for the date type DAY, the function returns 0.
    Operand Type: DATE-TYPE, STRING
    Result Type: INTEGER

    Date Function: dateTrunc(srcDate, fieldName)
    Description: Truncates less important parts of the srcDate up to the level specified by fieldName. Truncation changes values of the fields by the following rules: MONTH and DAY to 1, HOUR, MINUTE, and SECOND to 0. The function may be used even for the DAY type with the fieldName HOUR, MINUTE, and SECOND. The function does not have an effect on the data. Result and input values are the same. If any of the parameters are invalid, the expression reports an error. Example: For srcDate 5.5.1980 12:35:10 and fieldName HOUR, the function returns 5.5.1980 12:00:00.
    Operand Type: DATE-TYPE, STRING
    Result Type: DATE-TYPE

    Date Function: getDate(srcExpression)
    Description: Returns the date in the format defined by the specified srcExpression (type DAY or DATETIME), with the time set to zero (HH:mm:ss:sss).
    Operand Type: DATE-TYPE
    Result Type: DAY

    Date Function: getRequestTime()
    Description: Returns the time at which processing of the current request started. This is the iWay DQC application start time in batch mode, and the Web service request time in online mode.
    Result Type: DATETIME

    Date Function: now()
    Description: Returns the current time with the type DATETIME. This function always returns the time when it is evaluated, that is, the current time.
    Result Type: DATETIME

    Date Function: today()
    Description: Returns the current date in type DAY. This function returns the same value for all records (iWay DQC application start date), even if iWay DQC runs past midnight.
    Result Type: DAY
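    The truncation rules of dateTrunc and the field lookup of datePart can be illustrated with Python's datetime type, which stands in for DATETIME here (the DAY type is not modeled). The function names are hypothetical, a ValueError takes the place of the error that the expression reports, and clearing the sub-second part is an assumption.

```python
from datetime import datetime

# Date part identifiers, ordered from the most to the least significant.
FIELDS = ["YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"]
# Value that each field is reset to when it is truncated.
RESET = {"MONTH": 1, "DAY": 1, "HOUR": 0, "MINUTE": 0, "SECOND": 0}


def date_trunc(src, field_name):
    """Reset every field that is less significant than field_name."""
    if field_name not in FIELDS:
        # Identifiers are case-sensitive, so "hour" is rejected as well.
        raise ValueError("invalid date part: " + repr(field_name))
    lower = FIELDS[FIELDS.index(field_name) + 1:]
    changes = {name.lower(): RESET[name] for name in lower}
    # Assumption: the sub-second part is always cleared.
    return src.replace(microsecond=0, **changes)


def date_part(src, field_name):
    """Return the value of one field of the date."""
    if field_name not in FIELDS:
        raise ValueError("invalid date part: " + repr(field_name))
    return getattr(src, field_name.lower())
```

    For 5.5.1980 12:35:10, date_trunc with "HOUR" returns 5.5.1980 12:00:00 and date_part with "MINUTE" returns 35.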

    String Functions

    The following are common functions used for string processing.


    String Function: capitalize(srcStr)
    Description: Transforms all words in the string srcStr in the following manner: the first character of each word to uppercase and all following characters to lowercase. A word consists of alphabetic characters (letters). All other characters are considered separators.
    Operand Type: STRING
    Result Type: STRING

    String Function: capitalizeWithException(srcStr, exc[, exc]...)
    Description: Transforms all words in the string srcStr (with the exception of the words given as the parameters exc) in the following manner: the first character of each word to uppercase and all following characters to lowercase. A word consists of alphabetic characters (letters). All other characters are considered separators.
    Operand Type: STRING, STRING[, STRING]...
    Result Type: STRING

    String Function: containsWord(srcStr, srcWord)
    Description: Searches for the occurrence of the word srcWord in the string srcStr. Word is a sequence of letters with no whitespaces. Words in the string are defined as sequences of letters separated by a space (' '). Beginning, ending, and multiple spaces are ignored. This function is case-sensitive.
    Operand Type: STRING, STRING
    Result Type: BOOLEAN

    String Function: countNonAsciiLetters(srcStr)
    Description: Returns the number of characters in the string srcStr that include diacritical marks.
    Operand Type: STRING
    Result Type: INTEGER

    String Function: cpConvert(str, actualCp, correctCp)
    Description: Takes a string as an input wrongly read using the actualCp charset and transforms it into a correct correctCp charset. An example is a file that is all in windows-1250 charset except for one column, a, which is in the latin2 charset. This file will be read using the windows-1250 charset. For the column named a, the following expression can be used:

        cpConvert(a, 'windows-1250', 'latin2')

    Operand Type: STRING, STRING, STRING
    Result Type: INTEGER
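    The word rules of capitalize, capitalizeWithException, and containsWord can be sketched in Python as follows. The function names are hypothetical, Python's str.isalpha is used as the definition of a letter, and the assumption for capitalizeWithException is that exception words are compared exactly and left unchanged.

```python
from itertools import groupby


def capitalize(src, exceptions=()):
    """Uppercase the first letter of each word and lowercase the rest."""
    parts = []
    # Split the string into runs of letters (words) and runs of separators.
    for is_word, chars in groupby(src, key=str.isalpha):
        run = "".join(chars)
        if is_word and run not in exceptions:
            run = run[0].upper() + run[1:].lower()
        parts.append(run)
    return "".join(parts)


def capitalize_with_exception(src, *exceptions):
    """Like capitalize, but words listed as exceptions are left as they are."""
    return capitalize(src, exceptions)


def contains_word(src, word):
    """Case-sensitive search for a whole word in a space-separated string."""
    # Empty items produced by leading, trailing, or repeated spaces are dropped.
    return word in [w for w in src.split(" ") if w]
```

    For example, capitalize("john o'REILLY-smith") returns "John O'Reilly-Smith", because the apostrophe and the hyphen act as separators, and contains_word("  New  York City ", "york") is False because the search is case-sensitive.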


    String Function: distinct(srcStr[, srcSeparator[, srcItem[, srcItem]...]])
    Description: Returns a string that contains concatenated parts of the original string srcStr. Repeated parts, or parts not listed as srcItem, are omitted. The parameter srcSeparator specifies the separator of the string parts. If srcSeparator is missing or set to NULL, the space character is the separator. The listing of parameters in srcItem restricts the output string parts to the listed items only. If the string srcStr is NULL or empty, the function returns NULL.
    Operand Type: STRING or STRING, STRING or STRING, STRING, STRING[, STRING]...
    Result Type: STRING

    String Function: doubleMetaphone(srcStr)
    Description: Encodes srcStr to a double metaphone primary string. It removes accents from the srcStr before evaluating the double metaphone value. See the Metaphone article on Wikipedia, at http://www.wikipedia.org.
    Operand Type: STRING
    Result Type: STRING

    String Function: doubleMetaphone(srcStr, isAlternate)
    Description: Encodes srcStr to a double metaphone secondary string if the parameter isAlternate is true. It removes accents from the srcStr before evaluating the double metaphone value. Otherwise, it returns the primary string. See the Metaphone article on Wikipedia, at http://www.wikipedia.org.
    Operand Type: STRING, TRUE
    Result Type: STRING
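    The behavior described for distinct can be approximated in Python as shown below. This is a sketch under assumptions that the description does not confirm: parts keep the order of their first occurrence, and the result is joined with the same separator that was used for splitting. None stands for NULL.

```python
def distinct(src, separator=None, *items):
    """Return the unique parts of src, optionally restricted to items."""
    if not src:
        return None  # a NULL or empty input gives NULL
    sep = " " if separator is None else separator
    kept = []
    for part in src.split(sep):
        if part in kept:
            continue  # repeated part
        if items and part not in items:
            continue  # part that is not listed as srcItem
        kept.append(part)
    return sep.join(kept)
```

    With these assumptions, distinct("a b a c b") returns "a b c", and distinct("x;y;x;z", ";", "x", "z") returns "x;z".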


    String Function: editDistance(srcStr1, srcStr2[, caseInsensitive])
    Description: Returns the edit distance between strings srcStr1 and srcStr2. The parameter caseInsensitive determines whether case-sensitivity should be considered or not. By default, the function is case-insensitive. The difference between Levenshtein and Edit distance lies in the definition of distance of two switched adjacent characters. Levenshtein considers the switch as two changes, whereas Edit distance considers the switch to be one change. If both of the strings are NULL, then the result is 0. If just one of the strings is NULL, then the result is the length of the other string.
    Operand Type: STRING, STRING or STRING, STRING, BOOLEAN
    Result Type: INTEGER

    Operand Type: STRING, INTEGER, BOOLEAN
    Result Type: STRING

    Removes spaces between separate characters(words of length 1) in string srcStr . The pa