
Leader in Data Quality and Data Integration

www.dataflux.com 877-846-FLUX

International +44 (0) 1753 272 020

DataFlux Integration Server


DataFlux Integration Server

User’s Guide

Version 8.2.1

January 20, 2010


DataFlux - Contact and Legal Information

Contact DataFlux

Corporate Headquarters
DataFlux Corporation
940 NW Cary Parkway, Suite 201
Cary, NC 27513-2792
Toll Free Phone: 1-877-846-FLUX (3589)
Toll Free Fax: 1-877-769-FLUX (3589)
Local Telephone: 1-919-447-3000
Local Fax: 1-919-447-3100
Web: www.dataflux.com

European Headquarters
DataFlux UK Limited
59-60 Thames Street
WINDSOR
Berkshire
SL4 1TX
United Kingdom
UK (EMEA): +44 (0) 1753 272 020

Contact Technical Support

Phone: 919-531-9000
Email: [email protected]
Web: http://www.dataflux.com/Resources/DataFlux-Resources/Customer-Care-Portal/Technical-Support.aspx

Legal Information

Copyright © 1997-2009 DataFlux Corporation LLC, Cary, NC, USA. All Rights Reserved.

DataFlux and all other DataFlux Corporation LLC product or service names are registered trademarks or trademarks of, or licensed to, DataFlux Corporation LLC in the USA and other countries. ® indicates USA registration.

Apache Portable Runtime License Disclosure

Copyright © 2008 DataFlux Corporation LLC, Cary, NC USA.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Apache/Xerces Copyright Disclosure

The Apache Software License, Version 1.1

Copyright © 1999-2003 The Apache Software Foundation. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.


2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The end-user documentation included with the redistribution, if any, must include the following acknowledgment: "This product includes software developed by the Apache Software Foundation (http://www.apache.org/)." Alternately, this acknowledgment may appear in the software itself, if and wherever such third-party acknowledgments normally appear.

4. The names "Xerces" and "Apache Software Foundation" must not be used to endorse or promote products derived from this software without prior written permission. For written permission, please contact [email protected].

5. Products derived from this software may not be called "Apache", nor may "Apache" appear in their name, without prior written permission of the Apache Software Foundation.

THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This software consists of voluntary contributions made by many individuals on behalf of the Apache Software Foundation and was originally based on software copyright © 1999, International Business Machines, Inc., http://www.ibm.com. For more information on the Apache Software Foundation, please see http://www.apache.org/.

DataDirect Copyright Disclosure

Portions of this software are copyrighted by DataDirect Technologies Corp., 1991 - 2008.

Expat Copyright Disclosure

Part of the software embedded in this product is Expat software.

Copyright © 1998, 1999, 2000 Thai Open Source Software Center Ltd.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

gSOAP Copyright Disclosure

Part of the software embedded in this product is gSOAP software.

Portions created by gSOAP are Copyright © 2001-2004 Robert A. van Engelen, Genivia inc. All Rights Reserved.

THE SOFTWARE IN THIS PRODUCT WAS IN PART PROVIDED BY GENIVIA INC AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Microsoft Copyright Disclosure

Microsoft®, Windows, NT, SQL Server, and Access are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Oracle Copyright Disclosure

Oracle, JD Edwards, PeopleSoft, and Siebel are registered trademarks of Oracle Corporation and/or its affiliates.

PCRE Copyright Disclosure

A modified version of the open source software PCRE library package, written by Philip Hazel and copyrighted by the University of Cambridge, England, has been used by DataFlux for regular expression support. More information on this library can be found at: ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/.

Copyright © 1997-2005 University of Cambridge. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

• Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

• Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

• Neither the name of the University of Cambridge nor the name of Google Inc. nor the names of their contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Red Hat Copyright Disclosure

Red Hat® Enterprise Linux® and Red Hat Fedora™ are registered trademarks of Red Hat, Inc. in the United States and other countries.

SQLite Copyright Disclosure

The original author of SQLite has dedicated the code to the public domain. Anyone is free to copy, modify, publish, use, compile, sell, or distribute the original SQLite code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

Sun Microsystems Copyright Disclosure

Java™ is a trademark of Sun Microsystems, Inc. in the U.S. and other countries.

Tele Atlas North America Copyright Disclosure

Portions © 2006 Tele Atlas North America, Inc. All rights reserved. This material is proprietary and the subject of copyright protection and other intellectual property rights owned by or licensed to Tele Atlas North America, Inc. The use of this material is subject to the terms of a license agreement. You will be held liable for any unauthorized copying or disclosure of this material.

USPS Copyright Disclosure

National ZIP®, ZIP+4®, Delivery Point Barcode Information, DPV, RDI. © United States Postal Service 2005. ZIP Code® and ZIP+4® are registered trademarks of the U.S. Postal Service.

DataFlux holds a non-exclusive license from the United States Postal Service to publish and sell USPS CASS, DPV, and RDI information. This information is confidential and proprietary to the United States Postal Service. The price of these products is neither established, controlled, nor approved by the United States Postal Service.


Table of Contents

Overview
    What DIS Does
    Where and How DIS Runs
DataFlux Integration Server 8.2.1 - What's New in This Release
    Installation Notes
    Microsoft Win64 Considerations
    Conventions Used in This Document
    New Features
    Problems Resolved
    DataFlux Integration Server Usage
    DIS Configuration Options
System Requirements
    Supported Operating Systems
    Supported Databases
    Bundled UNIX Drivers
DataFlux Standard Integration Server
    Key Benefits of Standard Integration Server
    Architecture of Standard Integration Server
DataFlux Enterprise Integration Server
    Key Benefits of Enterprise Integration Server
    Architecture of Enterprise Integration Server
    Understanding Enterprise Integration Server Processes
Installing DataFlux Integration Server
    Installing DIS for Windows
    Installing DIS for UNIX/Linux
    Existing ACL Files
Configuring DataFlux Integration Server
    Configuring a Data Source
    Setting up ODBC Connections
    Changes to How SAS Data Sets are Accessed Between Versions 8.1.x and 8.2.1
    Configuring Saved Connections
Configuring Licensing
    Windows
    UNIX
    Annual Licensing Notification
Installing Enrichment Data
    Downloading and Installing Data Packs
    Configuring Enrichment Data
Installing Other DataFlux Products
    Installing dfIntelliServer
    Installing dfPower Studio
    Installing Quality Knowledge Bases
    Installing Accelerators
Changing Configuration Settings
    Windows
    UNIX
    Windows and UNIX
Configuring DataFlux Integration Server to Use the Java Plugin
    Java Runtime Environment
    Java Classpath
    Environment Variables
    Optional Settings
Pre-loading Services
    Pre-loading all services
    Pre-loading one or more specific services
    Complex configurations
Multi-threaded Operation
DataFlux Integration Server Connection Manager
    Using Connection Manager on Windows
    Using Connection Manager on UNIX
    Sharing Connection Information
    Connection Manager User Interface
DataFlux Integration Server Manager
    DataFlux Integration Server Manager User Interface
Using DataFlux Integration Server Manager
    Uploading Batch Jobs and Real-Time Services
    Downloading Batch Jobs and Real-Time Services
    Running and Stopping Jobs
    Testing Real-Time Services
    Deleting Jobs and Services
    Monitoring Job Status
    Using Log Files
Command Line Options
DIS Security Manager Concepts
    Windows and UNIX
    Security Administration
    Security Commands for UNIX
    Using Strong Passwords in UNIX
Security Policy Planning
    DIS Security Tools
Using Security Manager
Security Manager User Interface
    Toolbar
    IP-based Security
DIS with LDAP Integration
    Configuration File
    LDAP Directives
DIS Security Examples
Frequently Asked Questions
    General
    Installation
    Security
    Troubleshooting
Error Messages
    Installation and Configuration
    Security
    Running Jobs and Real-Time Services
Appendix A: Best Practices
Appendix B: Code Examples
    Java
    C++
    C#
Appendix C: Saving Profile Reports to a Repository
Appendix D: SOAP Commands
    SOAP Commands
    Enumeration Values
Appendix E: DIS Service
    Windows
    UNIX
Appendix F: Configuration Settings
    General DIS Configuration Directives
    DIS Security Related Configuration Directives
    Architect Configuration Directives
    Data Access Component Directives
Glossary


Overview

DataFlux® Integration Server (DIS) addresses the challenges of storing consistent, accurate, and reliable data across a network by integrating data quality and data integration business rules throughout your IT environment. Using DIS, you can replicate your business rules for acceptable data across applications and systems, enabling you to build a single, unified view of the enterprise.

What DIS Does

DIS is available in two editions: Standard and Enterprise. DataFlux Standard Integration Server supports running batch dfPower® Studio jobs in a client/server environment. DIS also supports calling discrete DataFlux data quality algorithms from numerous native programmatic interfaces, including C, COM, Java™, and Perl. Standard Integration Server allows any dfPower Studio client user to offload batch dfPower Profile and Architect jobs to a more scalable server environment, freeing up resources on the user's local system.

DIS Enterprise edition adds the capability to call business services designed in the dfPower Studio client environment, or to invoke batch jobs, using a service-oriented architecture (SOA¹).

Where and How DIS Runs

DIS can be deployed on Microsoft® Windows®, UNIX®, and Linux® platforms with client/server communication using HTTP. dfPower Studio users can select the Run Job Remotely option to have a dfPower client send a job to the standard server.

Also included with DIS is the ability to make API² calls to the same core data quality engine by using the dfIntelliServer® interface. Discrete API calls are available through native programmatic interfaces for data parsing, standardization, match key generation, address verification, geocoding, and other processes. Standard Integration Server requires a developer to code programmatic calls to the engine.

1. Service Oriented Architecture (SOA) enables systems to communicate with the master customer reference database to request or update information.
2. An application programming interface (API) is a set of routines, data structures, object classes, and/or protocols provided by libraries and/or operating system services in order to support the building of applications.


DataFlux Integration Server 8.2.1 - What's New in This Release

Review the following release notes for DataFlux® Integration Server (DIS) 8.2.1 for information about installation, new features, usage, and more. For additional information about this release, refer to DataFlux dfPower Studio Online Help, What's New in This Release. DataFlux Integration Server supports all features available in the corresponding dfPower release.

• Installation Notes
• Microsoft Win64 Considerations
• Conventions Used in This Document
• New Features
• DataFlux Integration Server Usage
• DIS Configuration Options

Installation Notes

Once the installation process has been completed, modify the dfexec.cfg file to set directory paths for any relevant reference data (this includes USPS³, Canada Post, Geocoding, and QKB⁴). Set the default port the server listens on, if needed. A valid license file must be copied into the license directory. If you are upgrading from DIS v7.0.x, those files must be reconfigured in the 8.2.1 directory structure; configurations are not carried forward from previous installations.

Important: Users currently employing DIS Security will have Access Control List (ACL⁵) files that control access to objects in DIS. These ACL files were located under the security directory in the DIS installation. In DIS v8.1, the location of ACL files changed, and it is necessary to move previous ACL files to a new location. All ACL files must be placed in the .acl subdirectory of the directory that corresponds with the object type, for example:

• ACL files with the suffix _archsvc.acl must be moved to the .acl subdirectory under the directory where service files reside.

• ACL files with the suffix _archjob.acl must be moved to the .acl subdirectory under the directory where Architect job files reside.

• ACL files with the suffix _profjob.acl must be moved to the .acl subdirectory under the directory where Profile job files reside.

For more information on installation, see Installing DataFlux Integration Server.

3. The United States Postal Service (USPS) provides postal services in the United States. The USPS offers address verification and standardization tools.
4. The Quality Knowledge Base (QKB) is a collection of files and configuration settings that contain all DataFlux data management algorithms. The QKB is directly editable using dfPower Studio.
5. Access control lists (ACLs) are used to secure access to individual DIS objects.
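On UNIX, for example, the moves might be performed as follows. Every path shown is hypothetical; substitute the security directory and the job and service directories configured for your installation:

# All paths are illustrative; substitute your configured DIS directories.
mkdir -p /opt/dis/svc_job/.acl /opt/dis/arch_job/.acl /opt/dis/prof_job/.acl
mv /opt/dis/security/*_archsvc.acl /opt/dis/svc_job/.acl/
mv /opt/dis/security/*_archjob.acl /opt/dis/arch_job/.acl/
mv /opt/dis/security/*_profjob.acl /opt/dis/prof_job/.acl/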

Microsoft Win64 Considerations

Starting with v8.1.1, DIS supports 64-bit Microsoft® Windows Vista®, Windows® XP Professional x64 Edition, and 64-bit Windows Server® 2003 operating systems, with the following exceptions and notations:

• Only 64-bit AMD® and Intel® chip sets are supported. Itanium® is not supported.

• International address verification using QAS is not supported.

• US and Canadian address verification (including Geocoding) is supported only through distributed enrichment nodes. This means dfIntelliServer installation is required. This could be a Microsoft Win32 installation on the same platform that is running Win64.

• The only Enrichment node supported is Address Verification (World). All other enrichment activities should be accomplished using dfIntelliServer.

Conventions Used in This Document

This reference uses several conventions for special terms and actions. The following variables are used throughout this documentation:

• hostname - Represents the name used to identify a particular host, and is annotated as [hostname]. For example: [hostname]:port.

• servername - Represents the name of the server on which DIS is installed, and is annotated as [servername]. For example: http://[servername]:port/?wsdl

• username - Represents the name of the user, and is annotated as [username]. For example: [username]::permissions

• version - Represents the version of DIS. It appears in file names and directory paths, and is annotated as [version]. For example: \Program Files\DataFlux\DIS\[version]\etc.
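For example, for a hypothetical server named dis01.example.com listening on port 21036 (both values are placeholders; use the server listen port value from your own dfexec.cfg), the WSDL URL would read:

http://dis01.example.com:21036/?wsdl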


New Features

The following new features have been added to DIS for this release:

• Service Name versioned (for Windows) - Starting with this release, the name of the DIS service includes the product version number. The new name of the DIS service for version 8.2.1 is "DFIntegrationService_v8.2.1".

• Service Queuing - New functionality has been added that allows service requests to be queued by DIS. In earlier versions of DIS, when the maximum number of processes handling real-time service requests (DFWSVCs) was reached and all processes were processing data, any new service request received an error message stating that the request could not be handled because the server had reached the maximum number of service processes allowed. There is now an option to enable service requests to be queued in the order they are received; when a DFWSVC becomes available, the request is processed. This configuration parameter, svc requests queue, can be configured in the dfexec.cfg file.

• Licensing enhancement - DIS now authenticates the following values from the SAS® license file: OS, GRACE, RELEASE, and WARN.

• Logging - The ability to log time in milliseconds has been added to DIS.

• Macro handling - The DIS server now allows clients to pass macros to a service and to get the final values of those macros.

Problems Resolved

The following problems have been resolved for this release:

• Unable to run services via WLP on UNIX servers when SHM is used for child connection

• UNIX install breaks if PERL cannot be found

• When DIS is killed, wlpslave does not exit

• Cryptic error messages received when SAS license file cannot be found

• DIS installer does not detect existing install directory

DataFlux Integration Server Usage DIS runs as a Microsoft Windows service (called DataFlux Integration Server). You can start and stop the service using the Microsoft Management Console. DIS uses the dfPower Studio execution environment for Windows (dfexec) to execute real-time services, as well as Architect and Profile jobs.

The following sections summarize some of the more common configuration settings and how they are used. For a complete list of available configuration settings, see Appendix F: Configuration Settings.
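For example, using the versioned service name noted under New Features, the service can be stopped and restarted from an administrator command prompt:

net stop DFIntegrationService_v8.2.1
net start DFIntegrationService_v8.2.1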


Note that in the following sections, DFEXEC_HOME is used to represent the root directory of the dfexec installation.

dfexec Configuration Options

The standard configuration options are as follows:

• plugin dir - This is where dfexec looks for plugin libraries.

• license dir - This is where dfexec looks for license files. You should place the license file (studio.lic) you have received from DataFlux in this directory.

• qkb root - The location of the Quality Knowledge Base (QKB) files. This must be set if you are using steps that depend on the algorithms and reference data in the QKB (such as the matching or parsing steps).

• datalib path - The location of the Verify data libraries. This must be set if your jobs use US address verification.

• usps db; canada post db; geo db; world address db - The locations of the different Verify address databases. The USPS database is required for US address verification; the Canada Post database for Canadian address verification; the Geo/Phone database for geocoding and coding telephone information; the World Address database for address verification outside the US and Canada.

• verify cache - The cache level (0-100) for Verify. The greater the value, the more aggressively the Verify steps cache address data.

• enable rdi; enable dpv - Enable or disable RDI/DPV processing during US address verification. This key should be set to yes or no. The default value is no.

• world address license - License code to unlock the World Address database.

• sort chunk - The amount of memory to use when sorting.

• cluster memory - The amount of memory to use when clustering.

• checkpoint - The amount of time between log checkpoints. After this amount of time elapses, messages containing the current status of each step in the job currently being executed are printed to the log, and the checkpoint timer is reset. Values can end in s, min, or h; for example, 30min.

• mail command - The command used to mail alerts. The command may contain the substitutions %T (To) and %B (Body). %T is replaced with the destination email address, and %B with the path of a temporary file containing the message body. The default command is mail %T < %B.

• arch config - Location of the configuration file containing optional macro values for Architect jobs.

For details on stopping and starting this service, see Appendix E: DIS Service.
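As a minimal sketch, the execution-related entries in dfexec.cfg might look like the following. All paths and values are illustrative only, and the leading "#" line assumes comment lines are permitted; the option names themselves are as documented above:

# Illustrative dfexec.cfg execution settings; substitute your own paths
license dir = /opt/dataflux/dis/etc/license
qkb root = /opt/dataflux/qkb
datalib path = /opt/dataflux/verify/data
usps db = /opt/dataflux/verify/usps
verify cache = 50
enable dpv = no
checkpoint = 30min
mail command = mail %T < %B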


DIS Configuration Options

DIS reads configuration options from the configuration file. The installer creates a configuration file (DFEXEC_HOME/etc/dfexec.cfg) with default values for the essential options. This file is in a "key = value" format and can be edited with any text editor.

The standard configuration options are as follows:

• server listen port - The TCP port number where the server listens for connections.

• server read timeout - The amount of time the server waits to complete read/write operations.

• dfsvc max num - The maximum number of simultaneous dfsvc processes.

• dfexec max num - The maximum number of simultaneous dfexec processes.

• working path - The directory where the server creates its working files and subdirectories.

• restrict general access - The server can restrict access to functions by IP address. The value of this option should be the word allow or deny followed by a list of IP addresses or ranges, with each range or individual address separated by a space. Ranges must be in low-high format, for example: 192.168.1.1-192.168.1.255.

• restrict post/delete access - If this is set, the server restricts posting and deleting of jobs to connections originating from the listed IP addresses. It has the same format as restrict general access.

LDAP Requirements for UNIX/Linux Platforms

AIX® - You must have the ldap.client.rte package installed. Run lslpp -l ldap.client.rte to see if it is installed. You can find this package on the installation media for AIX.

HP-UX® - You must have the LDAP-UX client installed. Run /usr/sbin/swlist -l product LdapUxClient to see if it is installed. If you do not have LDAP-UX, you can get it from the Hewlett-Packard Web site at http://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=J4269AA.

Linux - You must have the OpenLDAP client installed. On an RPM-based system (such as Red Hat® or SuSE™) you can run rpm -q openldap to see if it is installed. For other Linux systems, consult your documentation for how to test the availability of software packages. Red Hat Enterprise Linux 4 or later also requires the compat-openldap package. Run rpm -q compat-openldap to see if it is installed. You can find this package on the installation media or the Red Hat Network.

Solaris® - No additional requirements; the LDAP client library is part of the Solaris core libraries.

For details on configuration options, see Configuring DataFlux Integration Server.
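A corresponding server-side sketch, again with illustrative values only (the svc requests queue entry reflects the queuing option described under New Features; its exact value syntax is an assumption):

# Illustrative dfexec.cfg server settings; adjust for your environment
server listen port = 21036
dfsvc max num = 10
dfexec max num = 5
working path = /opt/dataflux/dis/work
restrict general access = allow 192.168.1.1-192.168.1.255
svc requests queue = yes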


System Requirements

Supported Operating Systems

The following is a list of the minimum requirements for supported platforms for a DataFlux® Integration Server (DIS) installation. The minimum operating system requirements may be different if you are accessing SAS data sets. In some instances, you may be required to run a more recent version of the operating system, as noted in parentheses:

• Platforms: minimum [see platform list below]¹; recommended N/A
• Processor: minimum [see platform list below]; recommended N/A
• Memory (RAM): minimum 1 GB; recommended 2 GB per CPU core²
• Disk Space: minimum 1 GB for installation plus 1 GB for temp space; recommended 10 GB for installation³ plus 20 GB for temp space

Notes:
1. Other platforms are available; contact DataFlux for a complete list.
2. Actual requirements depend on configuration.
3. Verification processes rely on reference databases to verify and correct address data. The size of these reference databases varies. Check with DataFlux for exact size requirements for this component.

Supported platforms (platform; bits; operating system; hardware architecture):

• AIX®: 64-bit; IBM® AIX 5.2 (SAS: 5.3 Technology Level 6 or later; 64-bit environment); POWER/PowerPC®
• HP-UX (PA-RISC): 64-bit; HP-UX 11i Version 1.0 (11.11) (SAS: HP-UX 11.23 or later); PA-RISC 2.0
• HP-UX (Itanium): 64-bit; HP-UX 11i Version 2.0 (11.23) (SAS: June 2007 patch bundle); Itanium® (IA64)
• Linux®: 32-bit; Linux 2.4 (glibc 2.3) (SAS: Red Hat Enterprise Linux 4 or later; SuSE Linux Enterprise Server 9 or later); Intel® Pentium® Pro (i686)
• Linux: 64-bit; Linux 2.4 (glibc 2.3) (SAS: Red Hat Enterprise Linux 4 or later; SuSE Linux Enterprise Server 9 or later); AMD AMD64 or Intel EM64T
• Solaris™ (SPARC): 64-bit; Sun™ Solaris 8 (SAS: Solaris 9 or later with 9/05 update); sparcv9 (UltraSPARC)
• Solaris x86: 64-bit; Sun Solaris 10 (SunOS 5.10) (SAS: Solaris 10 1/06 or later; if using Solaris 10 and LDAP for authentication, apply patch 118833-27 or later); AMD AMD64 or Intel EM64T
• Win32: 32-bit; Microsoft Windows 2003 (NT 5.2); Intel Pentium Pro (i686)
• Win64: 64-bit; Microsoft Windows 2003 (NT 5.2); AMD AMD64 or Intel EM64T

Linux Notes

DataFlux supports any distribution of Linux that meets the minimum requirements for kernel and glibc versions mentioned above; a specific distribution such as Red Hat® or SuSE is not required. The following is a list of some of the more popular distributions and the minimum version of each that meets these requirements and is still supported by the vendor:

• Red Hat® Fedora™: 7.0

• Red Hat Enterprise Linux®: 3.0

• Novell® SuSe® Linux Enterprise Server: 9.0

• Canonical® Ubuntu®: 6.06

Supported Databases

The following databases are supported by DIS (database: driver):

• ASCII Text Files: TextFile
• Pervasive® Btrieve® 6.15: Btrieve
• Clipper: dBASE File
• DB2 Universal Database (UDB) v7.x, v8.1, and v8.2 for Linux, UNIX, and Windows: DB2 Wire Protocol
• DB2 UDB v7.x and v8.1 for z/OS: DB2 Wire Protocol
• DB2 UDB V5R1, V5R2, and V5R3 for iSeries: DB2 Wire Protocol
• dBASE® IV, V: dBASE
• Microsoft Excel® Workbook 5.1, 7.0: Excel
• FoxPro 2.5, 2.6, 3.0: dBase
• FoxPro 6.0 (with 3.0 functionality only): dBase
• FoxPro 3.0 Database Container: dBase
• IBM Informix® Dynamic Server 9.2x, 9.3x, and 9.4x: Informix
• IBM Informix Dynamic Server 9.2x, 9.3x, and 9.4x: Informix Wire Protocol
• Microsoft SQL Server 6.5: SQL Server
• Microsoft SQL Server 7.0: SQL Server Wire Protocol
• Microsoft SQL Server 2000 (including SP 1, 2, 3, and 3a): SQL Server Wire Protocol
• Microsoft SQL Server 2000 Desktop Engine (MSDE 2000): SQL Server Wire Protocol
• Microsoft SQL Server 2000 Enterprise (64-bit): SQL Server Wire Protocol
• Oracle® 8.0.5+: Oracle
• Oracle 8i R1, R2, R3 (8.1.5, 8.1.6, 8.1.7): Oracle
• Oracle 9i R1, R2 (9.0.1, 9.2): Oracle
• Oracle 10g R1 (10.1): Oracle
• Oracle 8i R2, R3 (8.1.6, 8.1.7): Oracle Wire Protocol
• Oracle 9i R1 and R2 (9.0.1 and 9.2): Oracle Wire Protocol
• Oracle 10g R1 (10.1): Oracle Wire Protocol
• Corel® Paradox® 4, 5, 7, 8, 9, and 10: ParadoxFile
• Pervasive PSQL® 7.0, 2000: Btrieve
• Progress® OpenEdge® Release 10.0B: Progress OpenEdge
• Progress 9.1D, 9.1E: Progress SQL92
• Sybase® Adaptive Server® 11.5 and higher: Sybase Wire Protocol
• Sybase Adaptive Server Enterprise 12.0, 12.5, 12.5.1, 12.5.2, and 12.5.3: Sybase Wire Protocol
• XML: XML

This is a consolidated list of the drivers available for Windows, Linux, and various UNIX platforms. Please consult with DataFlux for a complete and updated database version and platform support list.

Bundled UNIX Drivers

The following lists show the bundled drivers supplied for the specified databases on each UNIX platform. Please consult with DataFlux for a complete database version and platform support list.

AIX
• DB2 Wire Protocol
• Informix Wire Protocol
• Oracle
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol

HP-UX (Itanium)
• DB2 Wire Protocol
• Informix Wire Protocol
• Oracle
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol
• Teradata

HP-UX (PA-RISC)³
• DB2 Wire Protocol
• Informix Wire Protocol
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol

Linux
• DB2 Wire Protocol
• dBase³
• FoxPro²
• Informix Wire Protocol²
• Oracle
• Oracle Wire Protocol
• Progress OpenEdge¹,²
• Progress SQL92
• SQL Server Wire Protocol²
• Sybase Wire Protocol
• Teradata
• Text¹,²

Solaris (SPARC)²
• DB2 Wire Protocol
• Informix Wire Protocol
• Oracle
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol

Solaris (x86)
• DB2 Wire Protocol
• Oracle Wire Protocol
• SQL Server Wire Protocol
• Sybase Wire Protocol

Notes:
1. Requires 5.1 (or newer) drivers.
2. 32-bit only.
3. Requires 5.2 (or newer) drivers.


DataFlux Standard Integration Server

DataFlux Standard Integration Server supports native programmatic interfaces for C, C++, COM, Java™, Perl, Microsoft® .NET, and Web services. The API engine runs in its own process as a Microsoft Windows® service or UNIX®/Linux® daemon. The API engine includes both a client installation and a server installation, with communication across the network using Transmission Control Protocol (TCP). If the client and server are installed on the same machine, they may be configured to communicate through inter-process communication (IPC). The Standard Integration Server includes client-side failover support for all API calls.

Key Benefits of Standard Integration Server

• Supports the ability to run DataFlux® dfPower® Studio jobs in a client/server mode by allowing users to offload dfPower Studio jobs onto a higher-performance server.

• Exposes core data quality algorithms through programmatic interfaces.

Architecture of Standard Integration Server

The following figure depicts the integration architecture for the Standard Integration Server.


DataFlux Enterprise Integration Server

DataFlux® Enterprise Integration Server offers an innovative approach to data quality that drastically reduces the time required to develop and deploy real-time data quality and data integration services. Through tight integration with the dfPower® Studio design environment, the Enterprise Integration Server operates as a data quality and data integration hub. Both batch and real-time services, which may include database access, data quality, data integration, data enrichment, and other integration processes, can then be called through a service-oriented architecture (SOA⁶). This eliminates the requirement to replicate data quality logic in native programming languages such as Java™ or C. Instead of writing and testing hundreds of lines of code, you can design the integration logic visually and then call it from a single Web service interface.

The Enterprise Integration Server supports real-time deployment using SOA, as well as the ability to run batch dfPower Studio jobs. The batch server capability is the same as in the Standard Integration Server: dfPower Studio clients communicate with the server through HTTP, and the clients can process dfPower Profile and Architect jobs in the server environment. Batch jobs may also be instantiated using a Web service call.

Key Benefits of Enterprise Integration Server

• Supports the ability to run dfPower Studio jobs in a client/server mode by allowing users to offload dfPower Studio jobs onto a higher-performance server.

• Supports the ability to create data quality and integration processes visually instead of locking the logic into native code.

• Supports a SOA framework, enabling complete reuse of data quality and integration business logic.

Architecture of Enterprise Integration Server

The following figure depicts the integration architecture for the Enterprise Integration Server.

6. Service Oriented Architecture (SOA) enables systems to communicate with the master customer reference database to request or update information.


Understanding Enterprise Integration Server Processes

Activity on the Enterprise Integration Server is split into three general processes:

• Receive SOAP requests

• Monitor registered data quality and data integration services

• Send SOAP responses

The Enterprise Integration Server runs as a daemon on UNIX and Linux or as a service on Microsoft® Windows® platforms. The server is responsible not only for sending and receiving SOAP requests, but also for monitoring the progress of all registered data integration services. Once the server receives a request, the server sends the data to the invoked Web service. If the service has not been invoked before, the server will load the service into memory and send the data to the in-memory processes. If the service invoked from the client application is busy, the server will instantiate a new service into memory and pass the data off to the new service. Each service runs in its own process, which allows for robust error recovery, as well as the ability to spread the processing load across multiple CPUs.


More specifically, the server handles the following processes:

• Query server to return the names of memory services

• Return input/output parameters for a specified service

• Pass data to a service and execute the service

Query Server to Return the Names of Memory Services

If the server receives a query request, the server simply queries the service configuration directory and returns the name of each service. The service names are packaged up into a single SOAP packet and sent back to the client.

Return Input/Output Parameters for a Specified Service

If the client queries the server for the input and output names of a given service, the server will return to the client the names of the expected input fields, as well as the names of the expected output fields.

Pass Data to and Execute a Service

When the server receives a request to process data from a client call, it identifies an idle service, sends the data to the idle service, and listens for additional client requests. If no idle service is identified, the server will load a new service into memory and pass the data to the new service. Since each service runs in its own process, processing multiple services can be spread across multiple CPUs, and the server is always available and listening for additional client requests. The server monitors the service progress; as soon as the service returns output, the server sends the output back to the client application. If the service fails for any reason, the server will terminate the service process and return an error message to the calling application.
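To make the request/response flow concrete, the following minimal Java sketch posts a SOAP request to a DIS server over HTTP. The host, port, and envelope body are placeholders only; the actual operation and field names for a given service are defined by the server's WSDL (http://[servername]:port/?wsdl), from which you would normally generate client stubs. See Appendix B: Code Examples for the full client samples.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class DisSoapSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder host and port -- substitute your DIS server and its
        // configured "server listen port" value.
        URL server = new URL("http://dis01.example.com:21036/");

        // Placeholder envelope: the real element names come from the WSDL
        // of the deployed real-time service.
        String envelope =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
          + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">"
          + "<soap:Body><!-- service-specific request body goes here --></soap:Body>"
          + "</soap:Envelope>";

        HttpURLConnection conn = (HttpURLConnection) server.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        // A SOAPAction header, if the service requires one, is also
        // specified by the WSDL.

        // Send the request envelope.
        try (OutputStream out = conn.getOutputStream()) {
            out.write(envelope.getBytes("UTF-8"));
        }

        // Read and print the SOAP response returned by the server.
        try (InputStream in = conn.getInputStream()) {
            System.out.println(new String(in.readAllBytes(), "UTF-8"));
        }
    }
}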


Installing DataFlux Integration Server

These sections explain the steps required to install DataFlux® Integration Server (DIS) on Microsoft® Windows® and UNIX®.

Installing DIS for Windows

Download the latest version of DIS for Microsoft Windows from the download section of the DataFlux Customer Care portal at http://www.dataflux.com/Customer-Care/.

Installation on a Windows platform requires running the DIS setup program; the setup wizard guides you through the installation process. During installation, you will be asked to select additional components to install (ODBC Drivers are automatically selected for you) and prompted for your licensing method. You may set up licensing now, or later by using the License Manager after the installation is complete.

Directory Layout for Windows-based DIS Installations

Directory                   Description
DIS\[version]               Top-level installation directory
DIS\[version]\arch_job      Default directory to store Architect jobs
DIS\[version]\bin           Executable files for this platform. The wscode executable stored in this directory provides the product and machine codes needed to unlock the full functionality of the product.
DIS\[version]\data          Data specific to this installation
DIS\[version]\etc           Configuration and license files
DIS\[version]\help          Help files
DIS\[version]\log           Default directory for log files
DIS\[version]\prof_job      Default directory to store Profile jobs
DIS\[version]\sample        DataFlux sample database
DIS\[version]\svc_job       Default directory to store real-time services
DIS\[version]\temp          Default temporary directory for input/output files
DIS\[version]\webclient     Default directory for the web client
DIS\[version]\work          Default location of working data for running processes

Once you complete the installation, you need to configure the server for your environment. See Configuring DIS for more information.


Note: dfIntelliServer is a separate component that provides a simple, scalable, customizable architecture which allows an organization to integrate DataFlux's powerful data quality technology into its own applications. For information on configuring and using dfIntelliServer, see the DataFlux dfIntelliServer Reference Guide.

Installing DIS for UNIX/Linux

Download the latest version of DIS for UNIX/Linux® from the download section of the DataFlux Customer Care portal at http://www.dataflux.com/Customer-Care/.

This installation includes the dfPower® Architect and Profile execution environment for UNIX systems. It also includes the server components of the DIS. Other components of dfPower are not included.

If you have previously installed DIS in this directory, your dfexec.cfg configuration file and odbc.ini file will be overwritten. If you have made changes to these files and would like to preserve them, save the files to another location before installing DIS.

Follow these instructions to install DIS for UNIX/Linux:

1. Copy the DIS installation file and the README.txt that correspond to your operating system (AIX®, HP-UX, Linux, or Solaris™) to an accessible directory.

2. At the command prompt, connect to the location where you are loading DIS.

3. Specify the directory where you will be loading DIS, and navigate to that directory.

4. Enter the following command to unzip the installation file. Replace PATH_TO in the command with the directory where you copied the installation file:

gzip -c -d PATH_TO/dfpower-exec-[version].tar.gz | tar xvf -

Note: All files will be installed in a subdirectory called dfpower.



5. Execute the installation program by typing: perl dfpower/install.pl

The installation wizard will now take control of the installation process. Follow the onscreen instructions to complete the installation.



Directory Layout for UNIX/Linux-based DIS Installations

Directory Description

dfpower Top-level installation directory

dfpower/bin Executable files for this platform

dfpower/data Data specific to this installation

dfpower/doc Documentation; the file dfpower/doc/usage has some basic instructions on the use of the dfexec.cfg file

dfpower/etc Configuration, log, and license files

dfpower/lib Library files for this platform; this is the default location of the Architect plug-in libraries

dfpower/locale Localization files for this platform

dfpower/share Shared (not platform specific) data

dfpower/var Default location of working data for running processes

dfpower/install.pl The installer; see Installing DIS for UNIX/Linux for more information

dfpower/README The README.txt file

Existing ACL Files

Users of earlier versions of DIS who currently employ DIS security will have Access Control List (ACL) files that control access to the various objects in DIS. These files were formerly located under the security directory in the DIS installation. In version 8.1, the locations for the ACL files changed, and it is necessary to move your ACL files to a new location. All ACL files should be placed in an .acl subdirectory within the directory for the corresponding object type, as follows:

• All ACL files with the suffix _archsvc.acl must be moved to the .acl subdirectory under the directory where service files reside.

• All ACL files with the suffix _archjob.acl must be moved to the .acl subdirectory under the directory where Architect job files reside.

• All ACL files with the suffix _profjob.acl must be moved to the .acl subdirectory under the directory where Profile job files reside.
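A minimal sketch of the move on a UNIX server, run from the installation root; it assumes the old files live in the former security directory and uses the default svc_job, arch_job, and prof_job directory names from the Windows layout table above (substitute your configured directories):

# create the .acl subdirectories and move each class of ACL file
mkdir -p svc_job/.acl arch_job/.acl prof_job/.acl
mv security/*_archsvc.acl svc_job/.acl/
mv security/*_archjob.acl arch_job/.acl/
mv security/*_profjob.acl prof_job/.acl/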

Once you complete the installation, you must configure the server for your environment. See Configuring DIS for more information.


Configuring DataFlux Integration Server

This section covers configuring DataFlux® Integration Server (DIS) for Microsoft® Windows® and UNIX® operating systems. See Configuration Settings for a list of options.

To configure server software:

1. Set up database connections

2. Configure licensing

3. Install enrichment data

4. Install other DataFlux products

5. Change configuration settings

6. Configure DIS to use the Java Plugin

7. Start the DIS service


Configuring a Data Source

To process a database with DataFlux® Integration Server (DIS), an ODBC7 driver for the specified database must be installed, and the database must be configured as an ODBC data source. You can also access flat files and text files outside of the ODBC configuration method if your dfPower® Architect or Profile job has specific nodes for those data sources.

Setting up ODBC Connections

Best Practice: Refer to Appendix A: Best Practices - Use a System data source rather than a User data source for additional information about Configuration Settings.

Windows

To process a database in Architect, an ODBC driver for the specified database management system (DBMS) must be installed, and the database must be configured as an ODBC data source. To add a data source, use the ODBC Data Source Administrator provided with Microsoft® Windows®.

ODBC Data Source Administrator Dialog

7Open Database Connectivity (ODBC) is an open standard application programming interface (API) for accessing databases.


To set up a new ODBC connection:

1. Click Start > Settings > Control Panel.

2. Double-click Administrative Tools > Data Sources (ODBC).

3. In the ODBC Data Source Administrator dialog, select the driver that is appropriate for your data source.

4. Click Add.

5. In the ODBC Driver Setup dialog, enter the Data Source Name, Description, and Database Directory. These values are required, and can be obtained from your database administrator.

6. Select the Database Type.

If these steps have been completed successfully, the database name will display in the database list found on the Connection Manager main screen in Windows.

UNIX

Use the interactive ODBC Configuration Tool (dfdbconf) to add new data sources to the ODBC configuration.

1. From the root directory of the DIS installation, run: ./bin/dfdbconf

2. Select A to add a data source. You can also use dfdbconf to delete a data source if it is no longer needed.

3. Select a template for the new data source by choosing a number from the list of available drivers.

4. You are prompted to set the appropriate parameters for that driver. The new data source is then added to your odbc.ini file.

Once you have added all of your data sources, the interactive ODBC Viewer (dfdbview) application can be used to test your connection. For example, if you added a data source called my_oracle, run: ./bin/dfdbview my_oracle (from the installation root) to test the connection. You may be prompted for a user name and password. If the connection succeeds, you will see a prompt from which you can enter SQL commands and query the database. If the connection fails, DIS displays error messages describing one or more reasons for the failure.

Note: When configuring the new data source, it is critical that the parameters (such as DSN, host, port, and sid) match exactly those used to create the job on the client machine.
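For orientation, the entry dfdbconf writes to odbc.ini follows the standard INI layout. The entry below is illustrative only; the actual key names and driver path depend on the driver template you selected:

[my_oracle]
Driver=/opt/dataflux/lib/odbc_oracle.so
Description=Oracle data source used by DIS jobs
HostName=dbhost.example.com
PortNumber=1521
SID=ORCL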


Changes to How SAS Data Sets are Accessed Between Versions 8.1.x and 8.2.1

In v8.1.1, if you wanted to run a job using a SAS data set on an Integration Server you would modify the primarypath parameter in the connect string (DSN in the Advanced Properties) for the SAS Data Set input node or the SAS Data Set Target (Insert) output node. The primarypath could either be hard-coded to point to the appropriate directory where the data sets are located on the DIS, or it could be set as a macro and read from the architect.cfg file. For example, a common connect string in v8.1.1 would look like this:

DRIVER=BASE;CATALOG=BASE;SCHEMA=(name='SAS';primarypath='C:\dfHome\demodata\sasdata')

To modify the job to run on a UNIX host, the connect string had to be changed to look something like this:

DRIVER=BASE;CATALOG=BASE;SCHEMA=(name='SAS';primarypath='/dfhome/demodata/sasdata')

Or this:

DRIVER=BASE;CATALOG=BASE;SCHEMA=(name='SAS';primarypath='%%path_to_datasets%%')

where the macro variable %%path_to_datasets%% is defined as /dfhome/demodata/sasdata in the architect.cfg file on a UNIX Integration Server.

This has changed for version 8.2.1. The connect string is no longer stored in the node. Instead, it is stored in a configuration file that is referenced by the node. For example, the DSN property of the input and output nodes now looks something like this:

DSN=SAS tables;DFXTYPE=TKTS

At this point, the job references the connection configuration file located in the dftkdsn directory in the etc directory of the DIS installation. In this example the file name is SAS tables.dftk. The configuration file should look something like this:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<datafluxdocument class="dftkdsn" version="1.0">
  <name>SAS tables</name>
  <description>SAS data sets</description>
  <attributes>
    <attribute name="DRIVER">BASE</attribute>
    <attribute name="CATALOG">BASE</attribute>
    <attribute name="SCHEMA">(name='SAS';primarypath='C:\dfhome\demodata\sasdata';LOCKTABLE=SHARE)</attribute>
  </attributes>
</datafluxdocument>


To modify the job to run on a UNIX host, complete these two steps:

1. Copy the SAS tables.dftk file from the dfPower client where the connection was created to the Integration Server where the job will be run, placing it in the location mentioned above.

2. Modify the primarypath in the file so that it points to the correct location for the SAS data sets, like this:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<datafluxdocument class="dftkdsn" version="1.0">
  <name>SAS tables</name>
  <description>SAS data sets</description>
  <attributes>
    <attribute name="DRIVER">BASE</attribute>
    <attribute name="CATALOG">BASE</attribute>
    <attribute name="SCHEMA">(name='SAS';primarypath='/dfhome/demodata/sasdata';LOCKTABLE=SHARE)</attribute>
  </attributes>
</datafluxdocument>

Configuring Saved Connections

After you configure the data sources and test the connection, you should store a saved connection for that data source. Saved connections provide a mechanism for storing encrypted authentication information (user name/password combinations) for a data source. When a saved connection is used, only the DSN8 is stored in the job file, not the entire connection string. When the job is executed it refers to the connection file for the authentication information for that DSN. In order to use a saved connection, the same connection must also be saved on the client machine where the job was created.

Note: When configuring the new data source, ensure the parameters (such as DSN, host, port, and sid) match exactly those used to create the job on the client machine.

Windows

When configuring a Microsoft SQL Server® connection, it is recommended that you use SQL Server authentication rather than Windows authentication.

8A data source name (DSN) contains connection information, such as user name and password, used to connect to a database through an ODBC driver.


The Connection Manager used to administer the saved connections comes installed in the DIS program group. When you run Connection Manager you will see a list of all available connections that have been configured using ODBC. When you select a connection and click Save, you are prompted for the user name and password. If the connection is successful, the logon information is saved in a file. The user name and password are encrypted. To enable users to share the same connection information, modify one or both of the following registry keys and enter a valid directory path:

• HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\savedconnectiondir

• HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\savedconnectiondir

where [version] indicates the version of DIS that you have installed.

If this key does not contain an entry, the connection file is stored in a dfdac subdirectory under the user's home directory. For example, if the user qatest stored a connection for a data source named mydatasource, it would reside in this file:

c:\Documents and Settings\qatest\dfdac\mydatasource

Best Practice: Refer to Appendix A: Best Practices - Use Connection Manager to Configure Data Sources for additional information about Configuration Settings.

UNIX

To create a saved connection, from the root directory of the DIS installation run:

./bin/dfdbview -s -t

The information is saved in the user's home directory, within the .dfpower directory. The user ID and password are encrypted.

For more information regarding dfdbview, see the usage.ODBC file in the /doc directory of the DIS installation. For more information on configuring ODBC data sources, see the ODBC Reference document that accompanies the dfPower Studio installation (click Start > Programs > DataFlux dfPower Studio > Help > ODBC Reference).


Configuring Licensing

Windows

DataFlux® Integration Server (DIS) uses a file-based licensing model that takes the form of a machine-specific license file. The license pool for executing jobs and services using DIS has uncounted licenses (an infinite number of licenses) for each type of license purchased. If DIS is packaged as part of SAS, you have the option of selecting SAS license file as your licensing method.

Note: The license dir parameter in the dfexec.cfg file is no longer supported on DIS for Microsoft® Windows®. In order to set or change the license location, you must use the License Manager application.

To configure your license for DIS, do the following:

1. Run the DataFlux Host ID application to generate a Host ID for your Integration Server. From the dfPower® Studio main menu, click Help > DataFlux Host ID.

DataFlux Host ID application

2. Contact your DataFlux representative and provide the DataFlux Host ID to obtain your license file.

3. Save the license file to [installation drive]:\Program Files\DataFlux\DIS\[version]\etc\[license file].

4. Make note of the full path to the licensing location, including the file name. To specify the licensing location by using the License Manager, click Start > Programs > DataFlux Integration Server > License Manager. In the License Manager dialog, select DataFlux license file, and enter the Location.


DataFlux License Manager Dialog

UNIX DIS uses a file-based licensing model that takes the form of a machine-specific license file. The license pool for executing jobs and services using DIS has uncounted licenses (an infinite number of licenses) for each type of license purchased.

To configure your license file for DIS, do the following:

1. To generate a Host ID, run ./bin/lmhostid. Write down the FLEXnet host ID that is returned.

2. Log onto the Customer Care Portal at http://www.dataflux.com/Customer-Care and click Request License Unlock Codes. This opens the License Request Form page.

3. Enter the requested information, including the Host ID generated in Step 1.

4. When you receive your new license file, save it on the UNIX® server in the etc/license directory. License files must have a .lic file name extension in order to be considered.

With file-based licensing, you should not change the license location setting in the dfexec.cfg configuration file.

If you need to change the licensing method, run ./bin/dflm. The optional -m switch allows you to change licensing methods. If you use this switch, you must restart DIS.

SAS License

If you have obtained a license from SAS, complete these steps:

1. Set the license location setting in the dfexec.cfg configuration file to point to your license file.

2. Run ./bin/dflm -m.

3. Set the license type to SAS license file.


Annual Licensing Notification

DIS uses an annual license process to allow users to access services and run jobs. The system alerts the user when each feature's license is nearing expiration, using the following process:

1. Sixty days before a license is due to expire, a dialog will begin appearing daily in the Integration Server Manager. It contains a list of the licensed features that are expiring as well as the number of days remaining for each feature's license. To stop the dialog from reappearing, click Do not display this warning again.

2. When a license reaches its expiration date, another dialog begins displaying daily, alerting the user that one or more features have expired, and that these features are now operating within a thirty-day grace period. The dialog lists the number of days left within the grace period for each feature, or indicates that a feature has already expired and can no longer be accessed. This dialog cannot be disabled; it will continue to appear daily.

3. After the thirty-day grace period, services or jobs that are requested through DIS, but have expired, no longer run.

The server log files keep records of all notification warnings generated.

Contact your DataFlux sales executive to renew your DataFlux product licenses.


Installing Enrichment Data

If you are using external data, install USPS9, Software Evaluation and Recognition Program (SERP10), Geocode/Phone, QuickAddress Software (QAS11), World, or other enrichment data. Make a note of the path to each data source. You will need this information to update the dfexec.cfg configuration file.

If your DataFlux® dfPower® Studio installation includes a Verify license, you need to install the proper USPS, Canada Post, and Geocode databases to do address verification. If you are licensed to use QAS, you must acquire the postal reference databases directly from QAS for the countries they support. For more information, contact your DataFlux representative.

Downloading and Installing Data Packs

Data Packs for data enrichment are available for download on the DataFlux Customer Care portal at http://www.dataflux.com/Customer-Care. To download data packs, follow these steps:

1. Obtain a user name and password from your DataFlux representative.

2. Log in to the DataFlux Customer Portal.

3. Click Downloads > Data Updates.

Note: You may also retrieve the data pack installation files through FTP. Please contact DataFlux Technical Support at 919-531-9000 for more information regarding downloading through FTP.

9The United States Postal Service (USPS) provides postal services in the United States. The USPS offers address verification and standardization tools.

10The Software Evaluation and Recognition Program (SERP) is a program the Canadian Post administers to certify address verification software.

11QuickAddress Software (QAS) is used to verify and standardize US addresses at the point of entry. Verification is based on the latest USPS address data file.


Data Updates Page

4. Select the installation file corresponding to your data pack and operating system to download.

Close all other applications and follow the procedure that is appropriate for your operating system.

Windows

Browse to and double-click the installation file to begin the installation wizard. If you are installing QAS data, you must enter a license key. When the wizard prompts you for a license key, enter your key for the locale you are installing.

UNIX

Installation notes accompany the download for each of the UNIX® data packs from DataFlux. For Platon and USPS data, check with the vendor for more information.

Note: Be sure to select a location to which you have write access and which has at least 430 MB of available space.

Note: Download links are also available from the dfPower Navigator Customer Portal link in dfPower Studio.


Configuring Enrichment Data

If you are using external data, install USPS, SERP, Geocode/Phone, QAS, World, or other enrichment data. You will need to specify the path to each data source in your configuration file.

Configuring USPS

Windows

Download Windows Verify Data Setup from the DataFlux Customer Portal, and run the installation file.

UNIX

Download UNIX Verify Data Setup from the DataFlux Customer Portal and install the file on your DIS machine.

Setting Description

usps db This is the path to the USPS database, which is required for US address verification (Architect batch jobs and real-time services).

# Windows Example
usps db = C:\Program Files\DataFlux\verify\uspsdata
# UNIX Example
usps db = /opt/dataflux/verify/uspsdata

Configuring DPV

Windows

Download Windows Verify DPV Data Setup from the DataFlux Customer Portal, and run the installation file. Enable DPV by changing the enable dpv setting in the dfexec.cfg file.

UNIX

Download UNIX Verify DPV Data Setup, under USPS in the Data Updates section of the customer portal. Enable DPV by changing the enable dpv setting in the dfexec.cfg file.


Setting Description

enable dpv To enable Delivery Point Validation (DPV12) processing (for US Address Verification), set to yes. It is disabled by default (Architect batch jobs and real-time services).

# Windows or UNIX Example
enable dpv = yes

Configuring USPS eLOT

Windows

Download Windows Verify eLOT Data Setup from the DataFlux Customer Portal, and run the installation file. Enable eLOT by changing the enable elot setting in the dfexec.cfg file.

UNIX

Download UNIX Verify eLOT Data Setup, under USPS in the Data Updates section of the customer portal. Enable eLOT by changing the enable elot setting in the dfexec.cfg file.

Setting Description

enable elot To enable USPS eLOT processing (for US Address Verification), set to yes. It is disabled by default (Architect batch jobs and real-time services).

# Windows or UNIX Example
enable elot = yes

Configuring Canada Post (SERP)

Windows

Download the Microsoft® Windows® SERP data update from the DataFlux Customer Portal and install the file on your DIS machine.

UNIX

Download the SERP data update that corresponds to your operating system from the DataFlux Customer Portal and install the file on your DIS machine.

12Delivery Point Validation (DPV) is a USPS database that checks the validity of residential and commercial addresses.


Setting Description

canada post db This setting indicates the path to the Canada Post database for Canadian address verification (Architect batch jobs and real-time services).

# Windows Example
canada post db = C:\Program Files\DataFlux\dfPower Studio\[version]\mgmtrsrc\RefSrc\SERPData
# UNIX Example
canada post db = /opt/dataflux/aix/dfpower/[version]/mgmtrsrc/refsrc/serpdata

Configuring Geocode/Phone

Windows

Download the Windows Geocode Data Pack from the DataFlux Customer Portal and install the file on your DIS machine.

UNIX

Download the UNIX Geocode Data Pack from the DataFlux Customer Portal and install the file on your DIS machine.

Setting Description

geo db This sets the path to the database for geocoding and coding telephone information (Architect batch jobs and real-time services).

# Windows Example
geo db = C:\Program Files\DataFlux\dfPower Studio\[version]\mgmtrsrc\RefSrc\GeoPhoneData
# UNIX Example
geo db = /opt/dataflux/hpux/dfpower/[version]/mgmtrsrc/refsrc/geophonedata

Configuring QAS Data

Windows

Contact QAS to download the latest data files for the countries you are interested in. Once you have downloaded the data sets, run the installation file and follow the instructions provided by the installation wizard.

UNIX

Run the installation file on a Windows machine to get the .dts, .tpx, and .zls files, then transfer all of these to your UNIX environment.


Configure the following QAS files located in the /etc subdirectory of your DIS directory:

• In the qalicn.ini file, copy your license key for the specific country. Each license key must be entered on a separate line.

• In the qaworld.ini file, you must specify the following information:

1. Set the value of the CountryBase parameter equal to one or more country prefixes for the countries you have installed. For example, to search using Australian mappings, add the following line to your qaworld.ini file:

CountryBase=AUS

Additional country prefixes can be added to the CountryBase parameter. Separate each prefix by a space. For a complete list of supported countries, see the International Address Data lists at the QAS website.

2. Set the value of the InputLineCount parameter. Add the country prefix to the parameter name and set the count equal to the number of lines your input addresses contain. For example, to define four lines for Australia:

AUSInputLineCount=4

3. Set the value of the AddressLineCount parameter. Add the country prefix to the parameter name and set the count equal to the total number of lines. Then, specify which address element will appear on which line in the input address by setting the value of the AddressLine parameter equal to a comma-separated list of element codes. For example:

AUSAddressLineCount=4

AUSAddressLine1=W60
AUSAddressLine2=W60
AUSAddressLine3=W60
AUSAddressLine4=W60,L21

For more information on address elements and configuring the qaworld.ini file, see QuickAddress Batch API Guide and the country-specific data guides.
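Putting these settings together, a minimal qaworld.ini for the Australian example would contain the lines below (assembled from the examples in steps 1-3; a production file may require additional parameters):

CountryBase=AUS
AUSInputLineCount=4
AUSAddressLineCount=4
AUSAddressLine1=W60
AUSAddressLine2=W60
AUSAddressLine3=W60
AUSAddressLine4=W60,L21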

• In the qawserve.ini file, you must specify the following information for each parameter. If more than one country prefix is added to the parameter, each subsequent country prefix should be typed on a new line and preceded by a + (plus sign). For a complete list of supported countries, see the International Address Data lists at the QAS website.

1. Set the value of the DataMappings parameter equal to the country prefix, country name, and country prefix. Separate each value by a comma. For example:

DataMappings=AUS,Australia,AUS

2. Set the value of the InstalledData parameter equal to the country prefix and installation path. Separate each value by a comma. For example:

InstalledData=AUS,C:\Program Files\QAS\Aus\


For more information on configuring the qawserve.ini file, see QuickAddress Batch API Guide and the country-specific data guides.

Note: If you have existing Architect jobs that include the Address Verification (QAS) node, your jobs will not work. You must reconfigure your existing jobs to work with the new QAS 6.x engine.

Configuring AddressDoctor Data

Windows and UNIX

If you are using AddressDoctor data for address verification, download the address files for the countries you are interested in from the DataFlux Customer Care portal. You will also need the addressformat.cfg file included with the data files. The addressformat.cfg file must be installed in the directory where the address data files reside.

Change the world address license and world address database settings in the dfexec.cfg file:

Setting Description

world address license This is the license key provided by DataFlux that is used to unlock the AddressDoctor country data. The value must be enclosed in single quotes (Architect batch jobs and real-time services).

# Example (same for Windows and UNIX)
world address license = 'abcdefghijklmnop123456789'

world address db This sets the path to where the AddressDoctor data is stored.

# Windows Example
world address db = 'C:\world_data\'
# UNIX Example
world address db = '/opt/dataflux/linux/worlddata'

Configuring LACS and RDI Data

Windows and UNIX

Residential Delivery Indicator (RDI) and Locatable Address Conversion System (LACS) are provided by the United States Postal Service®. If you are using these products, simply download the data with your USPS data, and set the applicable settings in the dfexec.cfg file:


Setting Description

enable lacs To enable LACS processing, set to yes. It is disabled by default (Architect batch jobs and real-time services).

# Windows or UNIX Example
enable lacs = yes

enable rdi This option enables or disables RDI13 processing (for US Address Verification). By default, it is set to no (Architect batch jobs and real-time services).

# Windows or UNIX Example
enable rdi = yes

13Residential Delivery Indicator (RDI) identifies addresses as residential or commercial.


Installing Other DataFlux Products

DataFlux® is a leader in data quality and data integration. The data cleansing and data quality suite of applications encompassed by dfPower® Studio can be integrated into the service-oriented architecture of DataFlux Integration Server (DIS). This architecture can be customized to your own environment using applications like dfIntelliServer, Quality Knowledge Bases, and Accelerators. DataFlux Accelerators provide data and workflows to put common data quality initiatives to work in your organization.

Installing dfIntelliServer

dfIntelliServer is a separate component that provides a simple, scalable, customizable architecture that allows an organization to integrate the powerful DataFlux data quality technology into its own applications. To install dfIntelliServer:

1. From the DataFlux Customer Care portal (http://www.dataflux.com/Customer-Care/), click Downloads.

2. Scroll down to dfIntelliServer, select the version corresponding to your operating system, and download to your computer.

3. Install dfIntelliServer.

Windows

Double-click the installation file and follow the on-screen instructions.

UNIX

Download the tar.gz file and follow the associated Installation Notes.

For information on configuring and using dfIntelliServer, see DataFlux dfIntelliServer Reference Guide.

Installing dfPower Studio

dfPower Studio is a powerful suite of data cleansing and data integration applications. With dfPower Studio, you have access to various applications that can help eliminate data quality problems. dfPower Studio connects to virtually any ODBC database and can be run from an intuitive graphical user interface, from the command line, or in batch operation mode. This gives you flexibility in how your enterprise handles your data quality problems.


Windows

dfPower Studio is supported on the Microsoft® Windows® platform. To install dfPower Studio, navigate to the DataFlux Customer Portal to download the software.

For information on installing, configuring, and using dfPower Studio, see DataFlux dfPower Studio Getting Started Guide and DataFlux dfPower Studio online Help.

Installing Quality Knowledge Bases

A Quality Knowledge Base (QKB14) is a collection of files that define rules, criteria, and data by which data cleansing can be performed. To install the latest version of the Contact Information QKB:

1. From the DataFlux Customer Care portal at http://www.dataflux.com/Customer-Care/, click QKBs under Downloads.

2. Select the version corresponding to your operating system, and download to your computer.

3. Install the QKB according to the operating system you are using.

Windows

Double-click the installation file and follow the on-screen instructions.

UNIX

Download the tar.gz file and follow the associated Installation Notes.

For information on configuring and using Quality Knowledge Bases, see DataFlux Quality Knowledge Base Reference Guide.

Installing Accelerators

DataFlux Accelerators provide a wide range of pre-built workflows that encompass typical data quality processes. You also get the tools necessary to effectively diagnose and manage data quality over time. There are a number of DataFlux Accelerators available. Contact your DataFlux representative for more information.

For more on configuring and using Accelerators, see DataFlux Accelerator Installation Guide.

14The Quality Knowledge Base (QKB) is a collection of files and configuration settings that contain all DataFlux data management algorithms. The QKB is directly editable using dfPower Studio.


Changing Configuration Settings

Once you have completed the installation process, modify the dfexec.cfg file to set the directory paths for any relevant reference data, for example, United States Postal Service (USPS15), Canada Post, Geocoding, and Quality Knowledge Base (QKB16). You can also change the default port on which the server is listening. Other settings in the dfexec.cfg file control memory allocation and enhance clustering performance.

Windows

Modifying Default Configuration Settings

After installing DataFlux® Integration Server (DIS), you must modify some default configuration settings in order for the jobs and services to run correctly. The dfexec.cfg file contains configuration settings for real-time services, as well as dfPower® Architect and Profile jobs. This file is stored in the \etc directory of the DIS installation.

Refer to DIS Configuration Settings for a list of common settings that may need to be modified before running the server. After making changes to the configuration file, you must restart the server. For more information, see the DIS Server.

Note: There is an order of precedence for configuration settings. In general, a setting is first determined by the Advanced Properties of a node in the job or real-time service. In the absence of a setting, the value is set by the corresponding entry in the Architect configuration file. If there is no specific setting, DIS then obtains the setting from the dfexec.cfg file. If the value has not been set, DIS will use the default value.

Using the Architect Configuration File to Define Macros in Windows

The Architect configuration file (architect.cfg) defines macro values for substitution into Architect jobs, and overrides predefined values. This file is located in the \etc directory of the DIS installation. Each line represents a macro value in the form key = VALUE, where key is the macro name and VALUE is its value. For example:

INPUT_FILE_PATH = C:\files\inputfile.txt

This entry sets the macro value INPUT_FILE_PATH to the specified path. This macro is useful when you are porting jobs from one machine to another, because the paths to an input file on different platforms may not be the same. By using a macro to define the input file name you do not need to change the path to the file in the Architect job after you port the job to UNIX®. Add the macro in both the Windows and UNIX versions of the Architect configuration file, and set the path appropriately in each.
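As a brief illustration of that porting pattern, using the example paths from the Windows and UNIX sections of this guide:

# architect.cfg on the Windows client
INPUT_FILE_PATH = C:\files\inputfile.txt

# architect.cfg on the UNIX server
INPUT_FILE_PATH = /home/dfuser/files/inputfile.txt

The job itself then refers to the file as %%INPUT_FILE_PATH%%, so the same job runs unchanged on both platforms.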

For more information on macros, refer to the dfPower Studio online Help topic, dfPower Architect - Using Macros.

15The United States Postal Service (USPS) provides postal services in the United States. The USPS offers address verification and standardization tools.

16The Quality Knowledge Base (QKB) is a collection of files and configuration settings that contain all DataFlux data management algorithms. The QKB is directly editable using dfPower Studio.


Installing Supplemental Language Support

If you plan to use DIS for data that includes East Asian languages or right-to-left languages, you must install additional language support. To install these packages:

1. Click Start > Settings > Control Panel.

2. Double-click Regional and Language Options.

3. In the Regional and Language Options dialog, select the Languages tab.

4. Check the boxes marked Install files for complex script and right-to-left languages (including Thai) and Install files for East Asian languages, found under Supplemental Language Support.

5. The Microsoft® Windows® installer will guide you through the installation of these language packages.

UNIX

Modifying Default Configuration Settings in UNIX/Linux

After installing DIS, you need to modify some default configuration settings in order for the jobs and services to run correctly. The dfexec.cfg file contains configuration settings for real-time services as well as Architect and Profile jobs. This file is stored in the /etc directory of the DIS installation.

Refer to DIS Configuration Settings for a list of common settings that may need to be modified before running the server. After making changes to the configuration file, you must restart the server (see the DIS Server).

Note: There is an order of precedence for configuration settings. In general, a setting will be determined by (1) the Advanced Properties of a node in the job or real-time service. In the absence of a setting, the value will be set by the corresponding entry in (2) the Architect configuration file. If there is no specific setting, DIS will obtain the setting from (3) the dfexec.cfg file. If the value has not been set, DIS will use (4) the default value.

Using the Architect Configuration File to Define Macros in UNIX/Linux

The Architect configuration file (architect.cfg) defines macro values for substitution into Architect jobs, and overrides predefined values. This file is located in the /etc directory of the DIS installation. Each line represents a macro value in the form key = VALUE, where key is the macro name and VALUE is its value. For example:

INPUT_FILE_PATH = /home/dfuser/files/inputfile.txt


This entry sets the macro value INPUT_FILE_PATH to the specified path. This macro is useful when you are porting jobs from Windows to UNIX®, because the paths to an input file in those two environments would probably not be the same. By using a macro to define the input file name you do not need to change the path to the file in the Architect job after you port the job to UNIX. Simply add the macro in both the Windows and UNIX versions of the Architect configuration file and set the path in each.

For more information on macros, refer to the dfPower Studio online Help topic, dfPower Architect - Using Macros.

Windows and UNIX

Processing Power and Memory Allocation

There are several configuration settings in the dfexec.cfg file that affect system performance. Specifically, the following settings relate to processing power and memory allocation:

Memory Allocation

Setting Description

sort chunk

This setting allows you to specify the amount of memory to use while performing sorting operations. Memory may be specified in KB or MB, but not GB (Architect batch jobs and real-time services).

# Windows or UNIX Example
sort chunk = 128MB

working path

This is the path where the server creates its working files and subdirectories. The default directory is the Integration Server /var directory. The value must be enclosed in single quotes. The location of the working path can affect system performance.

# Windows Example
working path = 'C:\Program Files\DataFlux\DIS\[version]\var'
# UNIX Example
working path = '/opt/dataflux/solaris/dis/[version]/var'

sort bytes - There is an order of precedence for setting memory for sorting in nodes that perform this operation. All settings should be in bytes, except in the dfexec.cfg file, where KB or MB can be explicitly specified. The order of precedence works as follows: if property 1 is not set or is left as NULL, the value is taken from property 2. When running a job or service using DIS, the value in the dfexec.cfg file is used if the previous two locations do not have the value set.


For the Surviving Record Identification (SRI) node:

1. SRI step in Architect - Select Advanced property > MEMSIZE_MBYTES

2. architect.cfg - SORTBYTES

3. dfexec.cfg - SORT CHUNK

For the Sort node:

1. Sort step in Architect - Select Advanced property > CHUNKSIZE

2. architect.cfg - SORTBYTES

3. dfexec.cfg - SORT CHUNK

Note: In Architect, the sort option in the Tools > Options > Step Specific dialog refers to the same sort value in the architect.cfg file. Setting it here or directly by editing the architect.cfg file accomplishes the same task. It is recommended that this memory allocation parameter be set to 75-80% of total physical RAM to take advantage of the clustering performance in dfPower Studio.
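A hedged sketch of those three levels for the Sort node; the byte value is simply the 128MB example converted to bytes, and the exact presentation of the node's Advanced Properties is an assumption:

# 1. In the job: Sort node > Advanced Properties
#    CHUNKSIZE = 134217728
# 2. architect.cfg (bytes)
SORTBYTES = 134217728
# 3. dfexec.cfg (KB or MB may be specified here)
sort chunk = 128MB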

Pre-loading and Clustering

Setting Description

cluster memory

The cluster memory is the amount of memory to use per cluster of match-coded data. Use this setting if you are using clustering nodes in dfPower (Architect batch jobs and real-time services). This setting can affect memory allocation.

Note: This setting must be recorded in megabytes. For example, 1 GB should be set to 1024 MB.

# Windows or UNIX Example
cluster memory = 64MB

verify cache

This number indicates an approximate percentage (0 - 100) of the USPS reference data set that is cached in memory prior to an address verification procedure (Architect batch jobs and real-time services). This setting can affect memory allocation.

# Windows or UNIX Example
verify cache = 30

verify preload

This option allows you to specify a list of states for which address data is preloaded. Pre-loading causes an increase in memory usage but can significantly decrease the time required to verify addresses in that state (Architect batch jobs and real-time services). This setting can affect memory allocation.

# Windows or UNIX Examples
verify preload = NY TX CA FL
verify preload = ALL


verify cache and verify preload - If verify cache is not set using dfPower, DIS uses the value set in the dfexec.cfg file. The verify cache variable indicates an approximate percentage of how much of the United States Postal Service (USPS17) reference data set will be cached in memory prior to an address verification procedure. The verify preload setting allows you to specify a list of states whose address data will be preloaded. Pre-loading will cause an increase in memory usage but can significantly decrease the time required to verify addresses in that state.

cluster memory - The DataFlux clustering engine allows developers to increase Architect job efficiency by grouping similar data items together based upon the information defined in the Quality Knowledge Base (QKB18). The cluster memory is the amount of memory to use per cluster of match-coded data. Use this setting if you are using clustering nodes in dfPower.

There is an order of precedence for setting memory for clustering and sorting in nodes that perform clustering operations. All settings should be in bytes. The default setting for clustering memory allocation is 67108864 bytes (64MB).

The order of precedence is as follows: if property 1 is not set or is left as NULL, then the value is taken from property 2. In the batch and real-time clustering nodes, there is an Override clustering memory size option. Choosing this to override the clustering value is the same as setting the advanced property listed below.

For clustering and clustering update nodes:

1. Clustering/Cluster Update step in Architect - Select Advanced property > CLUSTER/BYTES

2. architect.cfg - CLUSTER/BYTES

3. dfexec.cfg - CLUSTER MEMORY

For exclusive real time clustering and concurrent real time clustering nodes:

1. Exclusive real time clustering/concurrent real time clustering in Architect - Select Advanced property > MEMSIZE

2. architect.cfg - CLUSTER/BYTES

3. dfexec.cfg - CLUSTER MEMORY

17The United States Postal Service (USPS) provides postal services in the United States. The USPS offers address verification and standardization tools.

18The Quality Knowledge Base (QKB) is a collection of files and configuration settings that contain all DataFlux data management algorithms. The QKB is directly editable using dfPower Studio.
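Returning to the precedence lists above, a hedged sketch using the documented 64MB default (67108864 bytes); the presentation of the node's Advanced Properties is an assumption:

# 1. In the job: clustering node > Advanced Properties
#    CLUSTER/BYTES = 67108864
# 2. architect.cfg (bytes)
CLUSTER/BYTES = 67108864
# 3. dfexec.cfg (megabytes)
cluster memory = 64MB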


All clustering nodes have an advanced property called DELAYMEMALLOC. This setting is closely tied to the memory allocation properties for clustering. If DELAYMEMALLOC is set to true, then memory is allocated for the clustering node at the instance where the first row is about to be passed through this specific node. If it is set to false (the default value), all clustering memory will be allocated prior to the first row passing through the entire job. In the former, memory could be released and made available for later clustering node calls. Keep in mind that if the memory is not freed and you have over-allocated memory in your job or service, setting DELAYMEMALLOC to true may prevent you from discovering this until Architect is already part of the way through processing the job or service. If it is set to false and memory has been over-allocated, you will know prior to Architect running the job or service.

Note: For all sorting and clustering memory settings, you can choose to create macros to control memory settings. In this way, memory settings for different types of jobs or services can be set independently of the default macros that are used globally by all similar nodes.

fd table memory - This setting allows you to manually configure the amount of memory being allocated per table column to the Frequency Distribution Engine (FRED). By default FRED allocates 256 KB (512 KB if 64-bit) per column being profiled. This amount (the number of columns * the amount specified per column) is subtracted from the total amount configured by the user in the Job > Options menu of dfPower Profile Configurator (Frequency Distribution memory cache size).

The amount of memory remaining in the available pool is used for other data operations. For performance reasons the amount of table memory should always be a power of 2. Setting this value to 1 MB (note that 1 MB = 1024 * 1024 bytes, not 1000 * 1000 bytes) yields optimal performance. Setting it to a value larger than 1 MB (again, always a power of 2) may help slightly with processing very large data sets (dozens of millions of rows), but might actually reduce performance for data sets with just a few million rows or fewer. If you set the amount of table memory too high you may not be able to run your job because it will not be able to initialize enough memory from the available pool.
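A minimal dfexec.cfg entry for this setting, assuming it accepts the same KB/MB unit syntax as the other memory settings in this guide (that syntax is an assumption here):

# 1 MB per profiled column (a power of 2)
fd table memory = 1MB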

Controlling Processing

accept timeout - The DIS loop, which runs continuously, is organized as follows:

• wait for a new connection/request (ACCEPT)

• process request

• check on status of running services

• check if any services have results ready to send back

• check on status of running jobs

This setting allows the user to determine how long DIS waits before checking for new requests. The default value is 0.5 seconds. It can be lowered to as little as a microsecond. However, if this delay is lowered and DIS is sitting idle, DIS uses more resources: the frequency of loop iterations increases, so DIS performs more tasks in the same period of time. If the delay is removed altogether and DIS is sitting idle, it will use 100% of available CPU processing power, because it will be continuously checking for statuses or available results. If the delay is reduced to a few hundred or thousand microseconds, DIS sitting idle uses closer to 10% of the CPU power (depending on the exact delay setting). You should be aware of this increased load on the CPU due to a decrease in the delay setting before making adjustments to accept timeout. This does not apply to cases when DIS is used heavily, because there is no delay if requests are coming in frequently.

The amount of delay is configurable in the dfexec.cfg file. The format is:

accept timeout = <time value>
# A positive value measures seconds
# A negative value measures microseconds (10^-6)
# If not set, it defaults to 0.5 seconds (value of -500000)
# Windows and UNIX example:
accept timeout = -500000

Troubleshooting Log

log packets - This setting can be added to the dfexec.cfg file to log all SOAP packet activity. The default is no.

Note: Be advised that enabling this feature can produce very lengthy log files, and slow down DIS performance.

Generally, log packets is enabled only for troubleshooting. The format is:

log packets = <yes or no>
# Windows and UNIX example:
log packets = yes

Note: You must restart the server after you make changes to the configuration settings. See DIS Server.


Configuring DataFlux Integration Server to Use the Java Plugin

The dfPower® Architect Java™ Plugin node is available for Windows®, Solaris®, Linux®, and HP-UX Itanium. dfPower Studio or DataFlux® Integration Server (DIS) must be properly configured to run jobs containing the Java Plugin node. The following sections explain the configuration requirements.

Java Runtime Environment

Windows and UNIX

The primary configuration requirement is that the Java runtime environment (JRE™) must be installed on your machine. The Java Plugin currently supports JRE version 1.4.2 or later. The actual location of the installation is not important, as long as the dfPower Architect or DIS process can read the files in the installation. The dfexec.cfg file should contain a setting called java vm that references the location of the Java Virtual Machine (JVM™) DLL (or shared library on UNIX® variants). In the Sun™ JRE, for example, the location of this file is typically:

[JRE install directory]/bin/server/jvm.dll

If this setting is not configured properly when a job using the Java Plugin runs, you will receive an error that the JVM could not be loaded. Also, your Java code must be compiled using a Java Development Kit (JDK™) of the same version or earlier than the JRE version you plan to use to run your job. For example, compiling your code using JDK 1.5 or later and running the code in the Java Plugin using JRE 1.4.2 will generate an error that the class file format is incorrect.
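A hedged dfexec.cfg sketch for this setting; the Windows path follows the typical layout shown above, and the UNIX path reuses the HP-UX example from the Environment Variables section below:

# Windows (illustrative JRE location)
java vm = C:\Program Files\Java\jre1.4.2\bin\server\jvm.dll
# UNIX (HP-UX PA-RISC example)
java vm = /opt/java1.4/jre/lib/PA_RISC2.0W/server/libjvm.sl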

Java Classpath

Windows and UNIX

The location of your compiled Java code (as well as any code that it depends upon) must be specified in the classpath setting in the dfexec.cfg file. The code must also be physically accessible by the dfPower Architect or DIS process. The setting is called java classpath.

Note: On UNIX variants, you must separate the path components with a colon (:).

If the java classpath setting is incomplete, Architect or DIS will report an error because the code could not be loaded. Check to make sure your code and any dependencies are accessible and specified in the classpath setting.

If the java classpath setting is empty, the only Java code accessible to the Java Plugin is the set of examples that ship with Architect and DIS. Refer to DataFlux dfPower Studio Online Help, "Architect - Java Plugin - Examples" for information.
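A hedged example of the setting; the paths are hypothetical, and the semicolon as the Windows separator is an assumption (the guide only specifies the colon for UNIX):

# Windows (illustrative)
java classpath = C:\myjava\classes;C:\myjava\lib\mycode.jar
# UNIX (components separated by a colon)
java classpath = /home/dfuser/classes:/home/dfuser/lib/mycode.jar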


Environment Variables

UNIX

Using the Java Plugin on AIX

Before starting the server or running any jobs with dfexec, you must set the following environment variables. The LIBPATH setting assumes you are using Java 1.4 with the classic Java Virtual Machine (JVM). If you are using the J9 JVM, substitute j9vm for classic. If you are using Java 5, substitute java5_64 for java14_64.

LIBPATH

export LIBPATH=/usr/java14_64/jre/bin:/usr/java14_64/jre/bin/classic

LDR_CNTRL

export LDR_CNTRL=USERREGS

UNIX

Using the Java Plugin on HP-UX PA-RISC

Before starting the server or running any jobs with dfexec, you must set the following environment variable. The LD_PRELOAD example assumes you are using Java 1.4 with the Server JVM. If you are using a different JVM, set the path accordingly. In all cases, the path should be the same as the path used for the java vm setting in the dfexec.cfg file.

LD_PRELOAD

export LD_PRELOAD=/opt/java1.4/jre/lib/PA_RISC2.0W/server/libjvm.sl

UNIX

Using the Java Plugin on Solaris, Linux, and HP-UX Itanium

You do not need to set environment variables on Solaris, Linux, or HP-UX Itanium to use the Java Plugin. Note that the Java Plugin currently supports the Sun JRE version 1.4.2 or later.

Optional Settings

Windows and UNIX

There are two other settings in the dfexec.cfg file that affect the operation of the Java Plugin node. They are not required for normal use but they are available for use by developers for debugging purposes. The settings are java debug and java debug port.


The java debug setting should be set to yes or no. When set to yes, debugging in the JVM used by Architect or Integration Server is enabled. By default, this setting is set to no.

The java debug port setting should be set to the port number where you want the JVM to listen for debugger connection requests. This can be any free port on the machine. This setting has no effect if java debug is set to no.
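For example, the two settings might appear together in dfexec.cfg as follows (a sketch; the port number is illustrative):

java debug = yes
java debug port = 8000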

Note: The Java debugger cannot connect until dfPower Architect initializes the JVM in process. This happens when a Java Plugin Properties dialog is opened in Architect, or when a Java Plugin node in the job is executed or previewed. If multiple Architect or Integration Server processes run concurrently on the same machine, only the first process to load the JVM secures the debugging port; subsequent processes will not respond to Java debugging connection requests.


Pre-loading Services

DataFlux® Integration Server (DIS) can preload selected services on startup. This is helpful if you typically use the same services each time you run DIS and would like to have these services available as soon as DIS is running.

There are two configuration directives available that cause DIS to preload services; these can be set by the DIS administrator:

• dfsvc preload all = [count]

• dfsvc preload = [count]:[name of service] [count]:[name of service] ...

The two formats can work independently or together, depending on how you configure them.

Pre-loading all services

The first directive, dfsvc preload all = [count], causes DIS to find and preload all services [count] times. This includes services found in subdirectories. The number of instances of each service (count) must be an integer greater than 0, or the directive is ignored.

For example,

dfsvc preload all = 2

causes DIS to preload two instances of each service that is available, including those found in subdirectories.

Pre-loading one or more specific services

The second directive, dfsvc preload = [count]:[name of service], lets you designate the specific services, as well as the count for each service, that DIS is to preload on startup. Use an additional [count]:[name of service] element for each service, and separate the elements with one or more white-space characters. All elements must be listed on a single line. Using this format, you can configure a directive that starts a number of services, with each service having a different count.

For example,

dfsvc preload = 2:abc.dmc 1:subdir1\xyz.dmc

loads two instances of the abc service, and one instance of the xyz service, which is located in subdirectory subdir1.

Complex configurations

By combining the two directives, you can configure more complex preloads. The two directives add the counts arithmetically to determine how many services are actually loaded. (Internally, DIS builds a list of all services it needs to preload and, for each service, sets the total count.)

The following two example directives illustrate the logic of how this works:

dfsvc preload all = 2

dfsvc preload = 2:svc1.dmc -1:subdir1\svc2.dmc -2:svc3.dmc

Page 64: DataFlux Integration Server - SAS · Licensed under the Apache License, Versi on 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain

50 DataFlux Integration Server User's Guide

The first directive instructs DIS to preload a total of two instances of all existing services. The second directive modifies this in the following ways:

• Two additional counts of svc1.dmc are added, for a total of four instances. The counts are added together, and the total is the number of instances that DIS tries to preload.

• Svc2.dmc, which is found in the subdir1 subdirectory, has a -1 count. This produces a total count of one for svc2.dmc.

• For svc3.dmc, there is a combined total count of zero, so this service is not loaded at all. The value of [count] must be greater than zero for a service to be preloaded.

Some important points to remember:

• DIS attempts to preload a single instance of all requested services before trying to preload more instances (if more than one instance is specified).

• The service name can include the service's path (relative to the root of the services directory). Example: 1:subdir1\svc2.dmc specifies one instance of service svc2.dmc, which is located in the subdirectory subdir1.

• Count can be a negative value (meaningful only when both configuration directives are used together).

• Pre-loading stops when DIS has attempted to preload all required instances (successfully or not), or if the limit on the number of services has been reached. The limit can be specified by dfsvc max num =, and will default to 10 if not specified.
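A hypothetical dfexec.cfg fragment combining these settings (the service names and counts are illustrative):

dfsvc max num = 20
dfsvc preload all = 1
dfsvc preload = 3:address_verify.dmc 2:subdir1\standardize.dmc

Following the additive logic described above, this preloads one instance of every available service, with four total instances of address_verify.dmc (1+3) and three of subdir1\standardize.dmc (1+2), subject to the 20-service limit.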


Multi-threaded Operation

DataFlux® Integration Server (DIS) and its components operate in a multi-threaded configuration using two servers. Both servers are part of a single DIS process, but run in independent threads on different ports and share the same thread pool. This thread pool manages the process threads. When DIS creates this pool, it determines how many total threads the pool is allowed to have, how many it should allow to stay idle, and how much time should pass after a thread becomes idle before it is killed (to conserve system resources).

The two servers are:

• SOAP server, whose main thread (started by DIS) runs a loop that accepts clients' connections and hands each one off to a thread pool (among other functions).

• Wire Level Protocol (WLP) server, which accepts connections over TCP/IP.

Two configuration directives control whether the servers run:

• svr run dis = [yes/no] (default is yes)

• svr run wlp = [yes/no] (default is no)

Each request is handed off to a separate thread, so multiple requests can be processed in parallel. Requests are handled as follows:

• For non-real-time requests (such as running batch jobs, or listing jobs or services) the thread handling the request also sends the response for that request. At that point, the thread exits and goes back to the available thread pool.

• For real-time requests (such as to get a service's metadata or to run a service) the thread handling the request starts the process and passes any commands and data, and then exits without sending the response.

There are three additional configuration directives that determine how the thread pool operates:

• svr max threads = [# of threads]
If the WLP server is to run, at least two threads are used; if the SOAP server is to run, at least four threads are used. DIS automatically adjusts this value to the required minimum if the configured value is too low.

• svr max idle threads = [# of threads]
Will always be at least 1. This directive should be treated as an advanced configuration, and should be used only when needed to troubleshoot performance problems.

• svr idle thread timeout = [# of microseconds]
Defaults to 5 seconds if not set or if set to less than 1 microsecond. This directive should be treated as an advanced configuration, and should be used only when needed to troubleshoot performance problems.
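A hypothetical thread-pool configuration in dfexec.cfg (the values are illustrative, not recommendations):

svr max threads = 16
svr max idle threads = 4
svr idle thread timeout = 5000000

Note that the timeout is expressed in microseconds, so 5000000 corresponds to the 5-second default.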


DataFlux Integration Server Connection Manager

The DataFlux® Connection Manager lets you store the credentials needed to connect to a data source once, and reuse them for later connections. This provides better protection and management of security and confidentiality, and a more versatile way to handle access to data sources that require authentication.

Using Connection Manager on Windows

When DataFlux dfPower® is installed on Microsoft® Windows®, you can access the Connection Manager from the Start menu: select Start > Programs > DataFlux Integration Server [version] > dfConnection Manager. In UNIX® there is a program called dfdbview, which serves a similar purpose. The purpose of the Connection Manager is to save connection information with encryption so it does not need to be stored inside the job or entered at the time the job is run.

When you run Connection Manager in Microsoft Windows, you see a list of all of the available connections with either a yes or a no next to them. When you select a connection and click Save, you are prompted for the user name and password. If the connection is successful, this information is saved in a file in the \dac directory or the \dfdac subdirectory, depending on whether or not a registry key was found. If one or both of the following registry entries exist, the information is saved in the directory they specify:

• HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\savedconnectiondir

• HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\savedconnectiondir

where [version] indicates the version of DIS that you have installed.

If neither registry entry exists, the information is saved in the user's home directory in a subdirectory called \dfdac. This file is a plain text file containing all the information that was used to connect. The user name and password are encrypted.

Using Connection Manager on UNIX

In UNIX, the Connection Manager is a program called dfdbview. When you run dfdbview -t -s [connection name], you are prompted for a user name and a password. If the connection is successful, the connection information is saved to a file in the $HOME/.dfpower/dsn directory. It is saved as a plain text file with all of the information used to connect, with the user name and password encrypted.
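For example, a hypothetical session that saves credentials for a connection named salesdb (the connection name is illustrative, and the prompts are paraphrased):

$ dfdbview -t -s salesdb
Username: jsmith
Password: ********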


Sharing Connection Information

When you save a connection, you can use the saved information with any of the DataFlux applications. In Profile, if you create a job that connects to one or more data sources, the application recognizes that your connection information is saved. It stores only the name of the connection in the job, not the credentials. When you run the job in UNIX, the system recognizes that a connection name is present and looks for a saved connection in the location described above.

Note: The connection names in Windows and UNIX must correspond. Use the connection names in the Architect and Profile jobs to describe your data sources. The odbc.ini file and the saved connection information on the target system determine how to connect.

Connection Manager User Interface

DataFlux® Integration Server (DIS) Connection Manager allows you to save encrypted connection information, so that it does not need to be stored inside the job or entered at the time the job is run.

DIS Connection Manager

Select Make saved connections available to all users of this machine to allow all users on the machine to access the saved connections.


The following table describes the options available on the DataFlux Saved Connection Manager dialog:

Button Name Description

Save - Click to connect to the selected data source and save the connection information. Enter the necessary user information and logon credentials. Your entry is stored and used internally the next time you log in to the desired data source.

Clear - Select the desired database and click Clear to delete the existing authentication credentials and set new credentials.

Open ODBC Administrator - Click to open the ODBC Data Sources Administrator dialog.

Open Connection Administrator - Click to open the DataFlux Connection Administrator dialog.

Help - Click to open the Help for the Connection Manager.


DataFlux Integration Server Manager

The DataFlux® Integration Server (DIS) allows users to run real-time dfPower® Architect services as well as batch Profile and Architect jobs on a server from a remote client. DIS runs on Linux®, Solaris™, HP-UX, AIX®, and Microsoft® Windows®. DataFlux provides a Windows client application called DIS Manager that can be launched from the Windows Start menu to manage the jobs and services.

Real-time Architect jobs are called Architect real-time services, while batch jobs are called Architect jobs and Profile jobs. The common term for services and jobs is objects.

When a client application such as DIS Manager runs a service, it connects to DIS, tells it which service to run, and stays connected until the service finishes running and status and result data are received. When a client application runs a job, it connects to DIS, tells it which job to run, and then disconnects. Checking status, terminating the job, and accessing the log files each require a new connection to DIS.

Each time DIS is started, it generates a unique log file in the directory set in the dfexec.cfg file. The log file contains detailed information on what commands DIS received and the responses it sent, including errors. Log file names start with a time-stamp.

DataFlux Integration Server Manager User Interface

The DataFlux® Integration Server (DIS) Manager is a Microsoft® Windows® client application used to manage Architect real-time services and batch Profile and Architect jobs on a server from a remote client.


DataFlux Integration Server (DIS) Manager

This section describes the options available from the drop-down menus.

File

Change User - If DIS security is enabled, this option lets you change the user name and password for the Integration Server.

Exit - This option closes DIS Manager.

View

Toolbar - Toggles the toolbar on and off.

Status Bar - Toggles the Status Bar on and off.

Refresh - Refreshes the window by querying the server and returning the most current information for the job status.


Actions

Upload

Architect Jobs - Displays the Upload Architect Job dialog which allows you to choose the Architect job you want to post to the server.

Profile Jobs - Displays the Upload Profile Job dialog which allows you to choose the Profile job you want to post to the server.

Real-Time Services - Displays the Upload Real-Time Service dialog which allows you to choose the service you want to post to the server.

For all of the above options, if the Open jobs and services from the Management Resources Directory option is checked under Tools > Options, you can select only jobs stored in the dfPower® Studio vault. If that option is not selected, you will see a standard dialog that lets you choose a job from any location.

Note: DIS may appear unavailable to other clients trying to connect while large files are being uploaded. This should not cause any issues with normal job files, but it may cause problems if a user attempts to upload a file type that is not a DataFlux job file. If this is a concern, the DIS administrator (admin) should restrict access to the post/delete command in the dfexec.cfg file.

Run Job - Submits a job for execution. The Run Job option is only available for Profile and Architect jobs; you cannot run real-time services using this option. To execute a real-time service, use the Test Real-time Service option available under the Actions menu.

Stop Job - Terminates a job that is currently running.

Test Real-time Service - Displays the Real-Time Services Testing dialog, which allows you to manually enter data to test real-time services.

Delete - Removes a job or service from the list of available jobs.

View Log - Displays the log file for the selected job ID. In order to enable this option, you must first select a job in the list of available jobs, then select a Job ID from the job status area at the bottom of the dialog.

Clear Log - Clears the log file for the selected job ID and removes the log file from the job status area.

Help Topics - Opens the Help system in a Web browser.


Tools

Options - Select Options to display the Options dialog, which allows you to configure DIS Manager to connect to any active Integration Server. In addition to the server configuration parameters, you can also choose to load your jobs and services from the Management Resources Directory.

DIS Manager Options Dialog

Server Name - The name of the machine where the server is installed (use localhost if it is running on your own machine).

Server Port - The port that was designated when the server was installed and configured.

Refresh Interval - The amount of time (in seconds) between automatic refreshes. Set this value to 0 to turn off the automatic refresh option.

Open jobs and services from the Management Resources Directory - When selected, jobs and services can be loaded only from the Management Resources Directory. When unchecked, jobs and services can be loaded from any available directory, and additional fields appear on the upload dialog enabling you to select the location of the file to be uploaded.

Help

Help Topics - Opens the DataFlux dfPower Studio online Help. Note that you might receive a warning from Microsoft Internet Explorer® that it has blocked the help content. Click the warning bar, then click Allow Blocked Content in order to view the Help.

DataFlux Integration Server Version - Displays the DIS version you are currently running.

About DataFlux Integration Server Manager - Displays a dialog containing the version number, contact information, and copyright notices. It also displays a link that opens another dialog that allows you to check the library and database versions.


DIS Manager Window - Other Elements

Real-Time Services - Select this tab to work with real-time services.

Architect Jobs - Select this tab to work with Architect jobs.

Profile Jobs - Select this tab to work with Profile jobs.

Item Name - The Item Name is the name of the job or service.

Status of All Jobs - Select this tab to view the status of all jobs by all users.

Status of My Jobs - Select this tab to view the status of the jobs associated with the current user. To change users, select File > Change User.

Status of Architect Jobs - This status tab is available when the Architect Jobs tab is selected. It allows you to view the status of Architect jobs.

Status of Profile Jobs - This status tab is available when the Profile Jobs tab is selected. It allows you to view the status of Profile jobs.

Job Name - This is the Remote Name assigned to the job when it was uploaded. Click the column name to sort the jobs by name.

Request ID - A unique identifier for each run of a job. The request ID links the job to the log file. Double-click the Request ID to launch the log file viewer.

Job Owner - The user ID associated with each job. Click the column name to sort the jobs by user ID.

Status - Displays the current status of the job. Double-click Status to launch the log file viewer.

Toolbar

The DIS Manager toolbar provides buttons to quickly access several of the commonly used main menu options.

Button Name Description

Upload - Opens the upload dialog for Real-Time Services, Architect Jobs, or Profile Jobs, according to the tab selected before clicking the button.

Download - Opens the download dialog for Real-Time Services, Architect Jobs, or Profile Jobs, according to the tab selected before clicking the button.

Delete - To delete jobs or services, click the appropriate tab and Item Name, then click Delete.


Run - To run a job, click the appropriate tab and Item Name, then click Run.

Stop - To stop jobs or services, click the Job Name in the Status panel, then click Stop.

Test Service - To test a service, click the Real-Time Services tab and the Item Name, then click Test Service.

Refresh - Click Refresh to refresh the screen.

View Log - Click on a job or service, then click View Log to bring up the log for that job or service in the Log Viewer.

Clear Log - Click on a job or service, then click Clear Log to clear the log for that job or service. The Job Name will be removed from the status panel.

Help Topics - Click Help Topics to open the Help for dfPower Studio.


Using DataFlux Integration Server Manager

Once DataFlux® Integration Server (DIS) is installed and running, you can use DIS Manager to test the connection to the server. DIS Manager comes installed in both the DataFlux dfPower® Studio and the DIS file groups on Microsoft® Windows® machines. The DIS Manager allows users to perform the following actions:

• Upload and download batch jobs and real-time services

• Run jobs and stop jobs

• Test real-time services

• Delete jobs and services

• Monitor job status

• Use log files

Uploading Batch Jobs and Real-Time Services

Once an object (a batch job or real-time service) has been created in dfPower Studio, you can upload it to the Integration Server for use by DIS.

If you checked Open jobs and services from the Management Resources Directory on the Tools > Options dialog, then all object files are uploaded to the default directory (folder) specified in the dfexec.cfg file. You can, however, create new subdirectory folders under the default directories and place files within them. This can make it easier to group objects by related tasks or other criteria.

Note: You cannot use backward or forward slashes (\ or /) in job file names, as these characters are used for designating components within path descriptions.

If you unchecked Open jobs and services from the Management Resources Directory on the Tools > Options dialog, then you can upload the object files to any accessible directory. This makes it easy to group jobs and services by any criteria you wish and place them in a convenient location, including on other servers.

Uploading Objects to the Default Directory

To upload one or more objects to the default location:

1. Make sure that Open jobs and services from the Management Resources Directory on the Tools > Options dialog is checked.

2. On the DIS Manager main window, click the tab corresponding to the type of upload you want (Architect Jobs, Profile Jobs, or Real-time Services).


3. Click Actions > Upload (or just click the Upload button). An upload dialog appears with the available objects listed in the Available pane on the left.

4. Select one or more objects to upload, and then click Add single (or Add multiple if adding multiple objects at one time). The selected objects are moved to the Selected pane.

5. Click OK.

If you wish to prevent an object from being added, select one or more object files in the Selected pane and click Delete.

To upload one or more files to a subdirectory under the default directory:

1. Move the files to the Selected pane using the preceding procedure. Do not click OK.

2. Click Remote Folder in the row containing the desired file. A Browse dialog appears that lists available subdirectory folders.

3. Select the desired subdirectory (or click New to create a new subdirectory folder).

4. Click OK to close the browse dialog.

5. Click OK to close the upload dialog and upload the files.

Uploading Objects to a Specified Directory

To upload one or more objects to your chosen location:

1. Make sure that Open jobs and services from the Management Resources Directory on the Tools > Options dialog is unchecked.

2. On the DIS Manager main window, click the tab corresponding to the type of upload you want (Architect Jobs, Profile Jobs, or Real-time Services).

3. Click Actions > Upload (or click Upload). An Upload dialog appears with the available objects listed in the Available pane on the left.

4. Select the objects you want to upload, and then click Add single (or Add All if adding all objects at one time). The selected objects are moved to the Selected pane.

5. An additional Directory field at the top of the dialog allows you to choose the directory where the files are to be uploaded. Manually enter the path for the directory, or click Folder and navigate to the desired folder. Note that you cannot create a new folder from this dialog; the folder must already exist.

6. In the File field at the top of the dialog, select either the default file format or All Files for the object type you chose in Step 2. This determines which object files appear in the Available pane on the left of the dialog.

7. Click OK to close the browse dialog.


8. Click OK to close the upload dialog and upload the files.

If you wish to prevent an object from being added, select one or more object files in the Selected pane and click Delete.

To upload one or more files to a subdirectory under the selected upload directory:

1. Move the files to the Selected pane using the preceding procedure. Do not click OK.

2. Click Remote Folder in the row containing the desired file. A Browse dialog appears that lists the available subdirectory folders.

3. Select the desired subdirectory (or click New to create a new subdirectory folder).

4. Click OK to close the browse dialog.

5. Click OK on the upload dialog and upload the files.

Downloading Batch Jobs and Real-Time Services

Objects residing on the Integration Server can be downloaded to your local dfPower Studio installation using the following procedure:

1. On the DIS Manager main window, click the tab corresponding to the type of download you want (Architect Jobs, Profile Jobs, or Real-time Services).

2. Click Actions > Download (or click Download). A download dialog appears with the available objects listed in the Available pane on the left.

3. Select one or more objects to download, and then click Add (or Add All if adding all files at one time). The selected objects are moved to the Selected pane.

4. Click Local Folder in the row containing the desired files. A Browse dialog appears that lists available subdirectory folders. Note that you cannot create a new folder from this dialog; the folder must already exist.

5. Select the local folder where you want the downloaded file.

6. Click OK to close the Local Folder selection dialog.

7. Click OK on the download dialog and download the files.

If you wish to remove a file from the download, select one or more files in the Selected pane and click Delete.


Running and Stopping Jobs

From the DIS Manager main menu, click the tab corresponding to the type of object you want to run (Architect Jobs, Profile Jobs, or Real-time Services). To run a job from DIS Manager, right-click on the job name and select Run Job. The job appears in the job status pane at the bottom of the screen. From there, you can right-click on the job name and select Stop Job. If you are running a Profile job, you must specify either File output or Repository output.

Note: The name of the output report will have a .pfo extension.

Testing Real-Time Services

To test real-time services, right-click on the name of the service in DIS Manager. The Real-Time Service Testing dialog opens. Enter your test data here, and click Run Test.

Deleting Jobs and Services

To delete jobs and services, right-click on the name of the job or service, and select Delete.

Monitoring Job Status

In the bottom panel of DIS Manager, you can see the status of all jobs. Double-click on the Job Name to view the Job Log file.

Using Log Files

Job, Real-Time Service, and Server Logs

Three types of log files are generated by DIS.

For... Log Files Are Stored in...

Windows The \log directory of the DIS installation, by default.

UNIX®/Linux® The /etc directory, by default.

The log files are:

Job Log - When running a Profile or Architect batch job, a file is created named XXXXXXXX_archjob (or profjob)_JOBNAME.log (for example, 1164914416_1164915225_90_archjob_Arch_0.dmc.log). This is the log file retrieved when you view the job status from the DIS Manager dialog.

DIS Log - Each time the server is restarted, a new log file is created named XXXXXXXX_DIS.log (for example, 1165337041_0_002F1C_DIS.log). This log tracks connections and requests to and from DIS, and stores some basic configuration settings at the beginning of the log.


Real-Time Service Log - This log is named XXXXXXXX_archsvc_SERVICENAME.log (for example, 1165354706_1165354976_21_archsvc_SVC_address_verification_us.dmc.log). It is generated every time a service is executed if the dfsvc debug option has been set to yes in the dfexec.cfg configuration file. If the dfsvc debug option is not set or is commented out, the files are not created.

Using DAC Logging in Windows

You can also enable additional logging, known as Data Access Component (DAC) logging. (A data access component allows software to communicate with databases and manipulate data.) This log provides more information when users experience problems connecting to databases.

1. Click Start > Run.

2. In the Open field, type regedit.

3. From the Windows Registry, create one or both of the following strings:

• HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\logfile

• HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\logfile

where [version] indicates the version of DIS that you have installed.

4. Set logfile to the path and filename where logging output is to be sent.
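For example, the current-user value could also be created from a command prompt with the standard reg utility (the log path is illustrative; replace [version] with your installed DIS version):

reg add "HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]" /v logfile /t REG_SZ /d "C:\temp\dac_log.txt"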

If this entry is empty or does not exist, no logging will occur.

Note: To turn off DAC logging, repeat these steps. DAC logging can lead to large log files and can decrease performance, so be sure to turn the log off once the required information is captured.

Using DAC Logging in UNIX/Linux

DAC logging provides more information than the job log, DIS log, and real-time service log. This information can aid you in troubleshooting.

1. Add a file named sql_log.txt (all lowercase) to the working directory, configured in the dfexec.cfg file, that is used to run jobs.

Note: The working directory is specified by the working path setting in the dfexec.cfg file, which is the configuration file the Integration Server reads to obtain its settings. The dfexec.cfg file is located in the $DFEXEC_HOME/etc directory. In most cases (using the default paths), the directory where the sql_log.txt file should be placed is $DFEXEC_HOME/var/dis_job_io.


2. After adding the file, stop and restart the server.
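Assuming the default paths, step 1 amounts to creating an empty marker file (a sketch; adjust the path if you have changed the working path setting):

touch $DFEXEC_HOME/var/dis_job_io/sql_log.txt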

Important: DAC logging can lead to very large log files, and can decrease performance on the server. Be sure to turn off DAC logging once the required information is captured.

dfexec Return Codes

When dfexec is called from an external program, such as a scheduler, it produces return codes to indicate its status. The return codes and their meanings are:

Return Code Description

0 Job is still running

1 Job has finished successfully

2 Job has finished with errors: Unspecified internal error

3 Job has finished with errors: Invalid command-line parameters

4 Job has finished with errors: Invalid configuration

5 Job has finished with errors: Failed during job execution

6 Job has finished with errors: Licensing error

7 Job has finished with errors: Invalid or unsupported locale

8 Job has crashed

9 Job was terminated
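For example, a scheduler wrapper script might branch on the return code like this (a minimal sketch; the dfexec location and job path are illustrative):

#!/bin/sh
$DFEXEC_HOME/bin/dfexec -q /opt/dis/jobs/nightly_job.dmc
rc=$?
if [ $rc -eq 1 ]; then
    echo "Job finished successfully"
else
    echo "Job failed or did not complete (return code $rc)" >&2
    exit $rc
fi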


Command Line Options

DataFlux® dfPower® Studio Architect jobs and Profile jobs can be run from the command line. Running uploaded jobs from the command line allows users to call these jobs from their own scheduling software, or write scripts that call these jobs.

Use the following command line options as needed:

Windows

To run jobs from the command line on Microsoft® Windows® computers, use the following string. Note that the input macros are optional:

set <macro1>=<value1> && set <macro2>=<value2> && <dfexec path>\dfexec [options] <job path and name>

To run jobs without using the optional input macros, use:

<dfexec path>\dfexec [options] <job path and name>

UNIX

To run jobs from the command line on UNIX® systems, use the following string. Note that the input macros are optional:

<macro1>=<value1> <macro2>=<value2> ./<dfexec path>/dfexec [options] <job path and name>

To run jobs without using the optional input macros, use:

./<dfexec path>/dfexec [options] <job path and name>
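For example, hypothetical invocations that set one input macro and run a job (the macro name, paths, and job name are all illustrative):

REM Windows:
set INPUT_FILE=C:\data\customers.csv && C:\dis\bin\dfexec -log C:\temp\run.log jobs\dedupe.dmc

# UNIX:
INPUT_FILE=/data/customers.csv ./bin/dfexec -log /tmp/run.log jobs/dedupe.dmc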

Windows and UNIX Options

The following options for dfexec are used for both Windows and UNIX/LINUX, unless otherwise noted:

Options Description

-i           interactive mode
-q           quiet (no status messages)
-cfg FILE    use alternate configuration file FILE
-env FILE    use FILE for environment variables (UNIX only)
-log FILE    use FILE for logging output
--version    display version information
--help       display option information


Additional options for Architect jobs:

Options Description

-w write default target's output to terminal

-fs SEP use SEP as field separator for terminal output

-m MODE execution mode: d(efault); s(erial); p(arallel)

Additional options for Profile jobs:

Options Description

-o OUTPUT    output file or repository name; file names must end with .pfo
-n NAME      report name
-a           append to existing report
-desc DESC   optional job description
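For example, a hypothetical Profile job run that writes a named report to a .pfo file (the job name, report name, and paths are illustrative):

./bin/dfexec -o /tmp/customers.pfo -n "Customer Profile" profile_jobs/customers_profile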


DIS Security Manager Concepts

Windows and UNIX

Users

When DataFlux® Integration Server (DIS) security is enabled, the user must be authenticated using a user name and password. When jobs are run with DIS security enabled, note the following:

• From the perspective of the operating system, the process owner is the account under which DIS runs.

• From the perspective of DIS, the process owner is the user who started the job.

In addition, when DIS security is enabled, DIS makes available the name of the user executing the batch job or real-time service. For any requests received by DIS, the server logs the name of the user who sent the request to a DIS log file. If the request is to run a batch job, DIS sets the environment variable DFINTL_DIS_USER for that batch job, along with the name of the user who is executing the job. If the request is to execute a real-time service, DIS sets the macro value DFINTL_DIS_USER for that service, along with the name of the user who is executing the service.

By default, DIS security is disabled. While DIS security is disabled, there is no user authentication process and all jobs show the process owner as Administrator. When a request is received by DIS, the macro value or environment variable DFINTL_DIS_USER shows administrators as the owner.

The DIS security administrator (admin) can create users and assign users to groups. All user accounts must be added to the users file located in the security path specified in the dfexec.cfg file. A user name is case sensitive, can be up to 20 characters, and can include only alphanumeric characters and these symbols: period (.), hyphen (-), and underscore (_). There are no restrictions on which characters or words can be used in passwords. A password may be set to blank. Passwords do not expire.

Groups

DIS security has two special group accounts: administrators and everyone. The everyone group includes all users, present and future. If you create an account called everyone, DIS will log an error and ignore that account. The administrators group has access to all commands and objects regardless of explicitly set permissions.

The system does not require groups. However, for easier administration of DIS, the administrator can create groups, assign users to groups, and assign groups to other groups. All group accounts must be added to the groups file in the security path specified in the dfexec.cfg file. A group name can be up to 20 characters and is case sensitive.


Command Permissions

DIS security supports per-user command permissions. These are initially set when a new user is created, and may be changed at any time. Changing user permissions does not require a server restart. Command permissions are defined by setting Boolean flags to enable (1) or disable (0) permissions for a given command. Permissions may be set for the following commands:

Bit Position   Command   Description

1    Execute Real-Time Service   When this option is selected, the user can view Architect real-time service parameters and execute Architect real-time services.

2    Execute Architect Job       When enabled, allows the user to execute, terminate, get status, get log, and delete log for Architect jobs.

3    Execute Profile Job         When enabled, allows the user to execute, terminate, get status, get log, and delete log for Profile jobs.

4    Post Real-Time Service      When enabled, the user can post Architect real-time service files.

5    Post Architect Job          When enabled, the user can post an Architect job file.

6    Post Profile Job            When enabled, allows the user to post a Profile job file.

7    Delete Real-Time Service    When enabled, allows the user to delete an Architect real-time service file.

8    Delete Architect Job        When enabled, the user can delete an Architect job file.

9    Delete Profile Job          When enabled, the user can delete a Profile job file.

10   List Real-Time Services     When enabled, allows the user to see a list of Architect real-time services.

11   List Architect Jobs         When enabled, the user can view a list of Architect jobs.

12   List Profile Jobs           When enabled, allows the user to see a list of Profile jobs.

13   List All Statuses           When enabled, the user can get the status for all Architect and Profile jobs.


The following are examples of command strings:

Command String   Privileges

1111110001111    This user has privileges for all commands except deleting jobs and services.

0000000001111    This user can list jobs and services as well as view job statuses.

0110110110111    This user can perform all actions on Architect and Profile jobs but cannot post, execute, delete, or list real-time services.

Here is an example of an ACL with security permissions set for users and groups:

ACL Security Permissions

Access Control Lists

Access control lists (ACLs) are used to secure access to individual DIS objects. If an object is copied to DIS instead of posted through the client, DIS automatically creates an ACL file when the object is first accessed. The owner is automatically set to the administrators group. A DIS administrator can manually create or edit an ACL for any object. Changes to ACLs do not require a server restart.

If the ACL contains an unrecognized owner, DIS assumes the administrators group is the owner. For unrecognized and duplicate users and groups, DIS ignores the access control entry (ACE; an ACE is an item in an access control list used to administer object and user privileges such as read, write, and execute). If there are no valid ACEs, DIS uses the default setting for the everyone group (as defined in the dfexec.cfg file).

When an object is deleted using DIS, the ACL file is deleted. If an object is deleted manually and an object by the same name is later posted using DIS, a new default ACL file is created. If an old ACL file exists, it will be overwritten.

Object Ownership

When a user posts an object using DIS, that user is automatically set as the owner of the job. When a user creates an object by copying the file, ownership is automatically set to the administrators group. The admin can change ownership to another user or group at any time in the ACL file.



The owner of an object will always be able to execute and delete that object, regardless of user or group permissions.

DIS security supports a configuration setting to prevent automatic ownership assignment for posted objects. This is the enable ownership setting in the dfexec.cfg file. If automatic ownership assignment is disabled, any previous ownership entries in ACL files are ignored and all objects are owned by the administrators group.

User, Group, and Command Permission Interactions

When user and group level permissions differ, user level permissions take precedence. For example, if an object has an ACL with an ACE denying a user access to the object and another allowing a group (where the user is a member) access to the object, the user is denied access to the object.

When group-level permissions differ, the most restrictive permission takes precedence. For example, if a user is a member of groups A and B and an object's ACL has an ACE allowing access to group A but denying access to group B, that user is denied access to the object.

In order to deny object-level access to all users but a few, an ACL should be set to a deny everyone (0) ACE and then an allow (1) ACE for individual users or groups.
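A minimal ACL sketch of that pattern (the owner, group, and user names are illustrative; the file layout is described under Security Files below):

admin
everyone:0
group1:1
user5:1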

Command-level permissions are defined for user accounts only, while groups can only be used in ACLs. When individual user command permissions differ from permissions granted in an ACL, the most restrictive permission usually (except when everyone is set to allow permissions) takes precedence. For example, if a user has command-level permission to execute Architect jobs but a particular job has an ACL denying access to this user, the user cannot access this particular job.

Security Administration

The admin is responsible for setting up users, groups, passwords, and permissions on commands and, optionally, objects. You can also change ownership of existing objects, configure default object permissions, or turn off DIS security.

Note: The admin must make sure the default DIS security directory (etc\dis_security) is set to allow DIS to read, write, delete, and create files in the security directory. In addition, set the read and modify permissions for other users to deny.

Setting Up Integration Server Security

Follow these steps to enable security for your Integration Server installation:

1. Set the security options in the dfexec.cfg configuration file.

2. Add users to the users file.

3. Optional: Add user groups.


The admin can either edit the required files manually (recommended), or use the supplied administration command line utility. The command line utility, dsm_cmdln.exe, can be found in the bin directory of the DIS installation. This utility uses a menu-driven approach at the MS-DOS® or UNIX® command line. Most administrators will find that editing the user and group files directly is more efficient than using the command line utility.

Once you modify the security settings in the dfexec.cfg configuration file and add user accounts to the users file, you will need to restart the server. At this point the new security settings will take effect and users will be prompted for logon credentials when using DIS Manager.

Configuration Options in the dfexec.cfg File

After planning your DIS security hierarchy, you are ready to set up the dfexec.cfg file. Below is a list of security-related options that you must set in the dfexec.cfg file:

• enable security (yes/no): If set to yes, DIS security is enabled, and user authentication is required to connect to the server and to perform actions. If set to no, security is disabled.

• enable ownership (yes/no): If set to yes, a user is assigned as the owner of an object they post to the server. If set to no, ownership defaults to the administrators group. Object ownership grants implicit rights to execute or delete an object, and these rights take precedence over explicitly configured permissions.

• allow everyone (yes/no): If set to yes, all users (present and future) have access to all objects by default. The everyone group is used to specify an object's permissions (allow or deny) that apply to all users.

• security path (path to the security sub-directory): This is the path where all security-related files are stored, including the users and groups files. If no path is specified, the server looks in etc/dis_security.

In order for any changes made to the dfexec.cfg file to be implemented, you must restart the server.
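A hypothetical dfexec.cfg security block combining these options (the values shown are illustrative, not recommendations):

enable security = yes
enable ownership = yes
allow everyone = no
security path = etc/dis_security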

Creating Users

The first step in adding users is to create the users file. This file needs to be created in the directory specified by the security path setting in your dfexec.cfg file. The file should be named users and have no file extension. You can add users to this file with any text editor. For more information on user file layout, refer to Security Files.

Creating User Passwords

User passwords are not required; however, they are recommended. To generate an encrypted password for a user on Microsoft® Windows® platforms, you must run the hashpassword utility or the HashPasswordStrong utility provided in the bin directory of the Integration Server installation. Hashpassword creates a password hash, encrypting the user's password. HashPasswordStrong adds the following requirements: passwords must contain at least six characters, one numeric digit, one uppercase letter, and one lowercase letter. Once the encrypted password is generated, you can copy it from the utility into the users file for the given user.


To generate an encrypted password for a user on UNIX platforms, refer to Security Commands for UNIX. For more information on user file layout, refer to the Security Files section below.

Creating Groups

Adding user groups to your security model is optional. To create user groups, you must have a file named groups in the directory specified by the security path setting in your dfexec.cfg file. This file does not have a file extension. For more information regarding the layout and contents of a groups file, refer to the Security Files section.

Security Files

DIS security files can reside in any directory accessible to DIS. The full path of the DIS security directory can be configured using the security path setting in the dfexec.cfg file. If such a path is not specified, DIS will look for security files in etc/dis_security. If DIS Security is enabled and DIS cannot find a users file or load any users from the file, DIS writes an error to the log but continues initializing.

The security files include:

users: This file contains user names, hashed passwords, and DIS command permissions. Here is an example of the security permissions for users:

admin:d033e22ae348aeb5660fc2140aec35850c4da997:1111111111111
user1:b3daa77b4c04a9551b8781d03191fe098f325e67:1110000001111
user2:a1881c06eec96db9901c7bbfe41c42a3f08e9cb4:0010010010011
user3:0b7f849446d3383546d15a480966084442cd2193:1001001001000
user4:0000000001110

The preceding is a sample users file. There should be one entry per line. The basic file layout is as follows:

[username]:[password]:[permissions]


In the above example:

User Name   Permissions

admin The admin is granted all permissions.

user1 This user is authorized to execute all objects, and retrieve a list of all objects and their statuses, but cannot post or delete objects.

user2 This user is authorized to list, post, execute, and delete Profile jobs but not Architect jobs or real-time services.

user3 This user is authorized to list, post, execute, and delete real-time services but not Architect or Profile jobs.

user4 This user is authorized to list all objects on the server but not to perform any other actions. This user also does not require a password.

groups: This file contains group names and user names. Here is an example of the security permissions for groups:

# sample groups file
administrators:admin
group1:user1:user2
group2:user4:admin
group3:group1:user3
group4:user4

Above is a sample groups file. A group can contain one or more users and one or more groups. There should be one entry per line. The basic file layout is as follows:

[groupname]:[group or user]:[group or user]:[group or user], etc.

In this example:

Group Name   Members

administrators This group contains one user: admin.

group1 This group includes user1 and user2.

group2 This group includes user4 and admin.

group3 This group includes all members of group1 and user3.

group4 This group includes only user4.


ACLs: An ACL file contains the owner information and permissions for a job or service and is named as follows:

[objectname]_[type].acl

The values for [type] are:

archjob - Architect batch job
archsvc - Architect real-time service
profjob - Profile job

Here is an example of the security permissions set in an ACL file:

user1
everyone:1
user3:0

Above is a sample ACL file. ACLs should have one entry per line. The first line denotes the object owner; each subsequent line has the following layout:

[group or user]:[allow (1) or deny (0)]

In this example:

User Name   Permissions

user1 This is the owner of the object. The first line of the file always denotes the object owner.

everyone Everyone in the users file has permission to execute or delete the object.

user3 This user is explicitly denied permission to execute or delete the object. This explicit permission overrides any user or group permission settings.

Once you have finished setting up security, you must restart the DIS service in Windows.

Security Commands for UNIX

Use the following disadmin commands to manage users and groups if security is enabled. Arguments listed in brackets are optional. If an optional argument is not provided, disadmin prompts the user for the value.

Command Description and Example

adduser [USERID [PASSWORD [BITS]]]
Add a new user. Example:
./bin/disadmin adduser claudio

moduser [USERID [PASSWORD [BITS]]]
Modify information for an existing user. Example:
./bin/disadmin moduser fred secret_password 1110000001111


Command Description and Example

deluser [USERID] Delete a user. Example:

passwd [USERID [PASSWORD]]

./bin/disadmin deluser claudio

Set the password for a user. Example:

chperm [USERID [BITS]]

./bin/disadmin passwd fred

Change the permissions of a user. Example:

modgroup [GROUPID [MEMBER [MEMBER...]]]

./bin/disadmin chperm fred 1111110001111

Modify information for an existing group. Example:

addgroup [GROUPID [MEMBER [MEMBER...]]]

./bin/disadmin modgroup development fred claudio

Add a new group. Example:

delgroup [GROUPID]

./bin/disadmin addgroup QA

Delete an existing group. Example:

./bin/disadmin delgroup QA
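As a brief illustration, a typical sequence might create a user, create a group, and then place the user in that group (the user and group names here are illustrative):

./bin/disadmin adduser alice
./bin/disadmin addgroup analysts
./bin/disadmin modgroup analysts alice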

Using Strong Passwords in UNIX

The strong passwords setting in the dfexec.cfg configuration file is used by the disadmin application in UNIX to enforce the following rules for passwords:

• minimum length of six characters

• require at least one number

• require at least one uppercase letter

• require at least one lowercase letter

This setting affects the following disadmin commands: adduser, moduser, and passwd.
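As a sketch, the directive in dfexec.cfg might look like the following (the yes/no value shown is an assumption; check your configuration reference for the exact form):

strong passwords = yes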

You must restart the DIS daemon in UNIX if you have made changes to the dfexec.cfg configuration file. You do not need to restart the daemon if you have made changes only to the users or groups files.

For an overview of the four types of security available in DIS, see DIS Security Tools.


Security Policy Planning

A well-planned security model allows the DataFlux® Integration Server (DIS) security administrator (admin) to control access to the application. DIS offers several security tools, allowing the administrator to work with your existing security policy. As a resource on your network, DIS usage can be defined based on your security model, which in turn is based on usage policy, risk assessment, and response. Determining user and group usage policies prior to implementation helps you minimize risk and expedite deployment.

Risk Assessment - Security policies are inevitably a compromise between risk and necessary access. Users must access the application and data in order to perform necessary tasks, but there is associated risk when working with information, particularly confidential data. Consider the risks of compromised data, whether viewed without authorization or lost. The greater the business, legal, financial, or personal safety ramifications of compromised data, the greater the risk.

Usage Policy - Determine usage policy based on risk assessment. Take into account individual and group roles within your organization. What policies are already in place? Do these users or groups already have access to the data used by DIS? Are they dfPower® Studio users? Generally, users will fall into one of the following categories: administrators, power or privileged users, general users, partners, and guests or external users. The approach of "deny all, allow as needed" will help you to implement security from the top down. New users should have restricted access. Access for administrators and power users can then be conferred manually or through explicit group permissions.

Security Response - Consider establishing a security response policy. If you have a security response team, specify how they are to respond to and report violations of security policy. Consider training all users on acceptable use prior to deployment of DIS.

For more information, see DIS Security Examples.


DIS Security Tools

DataFlux® Integration Server (DIS) offers security options that can be used alone or in combination. Using settings in the dfexec.cfg file, the DIS security administrator (admin) can restrict access based on IP address. The admin has the ability to control access based on user, group, or job with the DIS Security Manager or by manually editing security files.

DIS Security Manager

The DIS security subsystem gives administrators the ability, in a very granular way, to limit the way various users can access or execute Architect jobs and services. DIS Security Manager enables the admin to secure DIS commands and objects (services and jobs) on a per-user basis, by explicitly creating user accounts and setting user, group, and job level access. Control can be administered by named user, by group, or can be explicitly assigned to jobs and services themselves. The ability for users or groups to get job lists, post new jobs, delete existing jobs, and query for job status can all be controlled with this subsystem. For more information, see Using Security Manager.

DIS Security with LDAP

In addition to restricting access by IP address and setting up access rights for individual users and groups, DIS can be integrated with Lightweight Directory Access Protocol (LDAP). The password that allows DIS to bind with the LDAP server must be set in the DIS configuration file in an encrypted format. An encryption utility is included for that purpose, and is described in DIS with LDAP Integration. LDAP users who do not exist on DIS are automatically added with default command permissions on first access. The DIS administrator can change these permissions. When security is enabled, each user must authenticate through DIS with a user name and password. DIS then passes user credentials to the LDAP server to be authenticated. After authentication, DIS authorizes the user request based on permissions set for the command or resource. The admin also has the option of disabling DIS Security, in which case no authentication is required and no authorization will be performed.

IP-Based Security

The admin can control access by IP address with configuration settings in the dfexec.cfg file. The restrict general access setting defaults to "allow all". The restrict get_all_stats access setting allows control over who can view the status of all jobs, as opposed to viewing the status of one specific job; generally, this access should be limited to administrators. The restrict post/delete access setting allows control over who can post and delete jobs. For more information on IP-based security, refer to Configuration Settings.

Remote Administration of Security

Remote administration functionality is available to the admin through new SOAP requests to administer DIS users and groups.



SSL

DIS now supports SSL for SOAP clients. You can use secure encryption anytime a server's address is entered as https:// instead of http://. Due to U.S. export restrictions related to encryption methods, SSL support is shipped as a separate package that is installed following successful installation of DIS. Servers will need to be configured for SSL, and customers are expected to establish and maintain their own SSL environment and have it in place prior to using it with DIS. SOAP server configuration directives are:

soap over ssl = yes

A key file is required. If the key file is not password protected, the second configuration directive of this pair can be commented out.

soap ssl key file = 'C:\Desktop\Key File\'
soap ssl key passwd = 'encrypted password'

The following directives are used if a Certificate Authority certificate file or a path to a directory with trusted certificates is needed. If they are not needed, comment them out.

soap ssl CA cert file = 'C:\Desktop\Certificate Authority Folder\CAfile'
soap ssl CA cert path = 'C:\Desktop\Certificate File\'

Best Practice: Plan your security model based on your organization's business needs. Refer to Appendix A: Best Practices.


Using Security Manager

Adding Users

The administrator creates users and can assign users to groups. All user accounts must be added to the users file in the security path specified in the dfexec.cfg file. A user name is case sensitive, can be up to 20 characters, and can only include alphanumeric characters and these symbols: ., -, or _.

DIS Security Manager Add New User Dialog

Complete these steps to add a new user to your system:

1. To add a new user, click Edit > Add. The New User Properties dialog opens.

2. Under the General tab, type the User Name.

3. Type the Password for the new user.

Note: There are no character or word restrictions on passwords. A password may be set to blank. Passwords do not expire.

4. Type the password in the Verify PW field.

5. Select the permissions the user will have based on the information in the Command Permissions section.

6. When you are finished selecting the permissions, click OK. The new user is added to the list.


Now you can create additional users, add the user to a group, or close the DIS Security Manager.

You can also add users directly to the users file in one of the following formats:

[username]:[hashedPassword]:[permissions]
[username]::[permissions]

For more on hashed passwords and user permissions, see DIS Security Manager Concepts.
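For illustration, a users file might contain entries like the following sketch (the names and permission bits are illustrative, and the second entry shows a user with a blank password):

fred:[hashedPassword]:1110000001111
user4::1000000000000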

Adding Groups

The system does not require groups. However, for easier administration of DIS, the admin can create groups, assign users to groups, and assign groups to other groups. All group accounts must be added to the groups file in the security path specified in the dfexec.cfg file. A group name can be up to 20 characters and is case sensitive.

DIS security has two special group accounts, administrators and everyone. The everyone group includes all users, present and future. If you create an account called everyone, DIS will log an error and ignore that account. The administrators group has access to all commands and objects regardless of explicitly set permissions.

DIS Security Manager Add New Group Dialog

1. To create a new group, click the Groups tab.

2. Click Edit > Add. The New Group Properties dialog opens.

3. Type the Group Name under General.

4. Click the Users tab.


5. Click Add to add users to the new group. The Add Users dialog opens.

Add Users to Group Dialog

6. You can select one or more users for the new group. To select more than one, click the first user name, then press CTRL and click the other user names.

7. Click OK. Your users are listed on the New Group Properties dialog.

New Group Properties Dialog

8. Click the Sub-Groups tab.

9. You can add any of the groups listed to the new group. To add an existing group, click Add. The Add Groups dialog opens.


Add Groups Dialog

10. To add one group, click the group name.

11. Click OK.

12. If you need to create more groups, continue, or click OK to close the New Group Properties dialog.

13. The new group appears under the Groups tab.

Adding a User to a Group

To add a user to a group, complete the following steps:

1. To add a user to a group, click the user name.

2. Click Edit > Item Properties. The User Properties dialog opens.


User Properties Dialog

3. Click the Groups tab.

4. Click Add. The Add Groups dialog opens with a list of existing groups.

5. Select one or more groups.

6. To select more than one group, click the first group, then press CTRL and select additional groups.

7. Click OK. The groups you select now appear under Groups on the User Properties dialog.

8. Click OK.

Adding a User to the Administrators Group

The steps for adding a user to the Administrators group are similar to those for adding a user to any other group.

1. To add a user to the group, click the user name.

2. Click Edit > Item Properties. The User Properties dialog opens.

3. Click the Groups tab.

4. Click Add. The Add Groups dialog opens with a list of existing groups.

5. Select administrators.

6. Click OK. The administrators group appears under Groups on the User Properties dialog.


7. Click OK.

Deleting Users or Groups

You can delete one or more users and groups.

1. Select the users or groups you want to delete.

2. Click Delete. A warning message appears.

Delete Users Warning Message

3. If you want to delete the selected users or groups, click OK.

Viewing ACL Properties

Access control lists (ACLs) are used to secure access to individual DIS objects. To view the properties for an ACL, complete the following steps:

1. Locate the ACL under the appropriate tab.

2. Right-click on the file.

3. Select Properties; the ACL Settings dialog opens. Notice the Owner, the status for the Everyone group, and the ACEs. You can make changes to the ACL Properties, add or delete ACEs, or toggle ACE settings.


ACL Settings Dialog

4. Click OK to close the ACL Settings dialog.


Security Manager User Interface

DataFlux® Integration Server (DIS) Security Manager enables the DIS security administrator (admin) to secure DIS commands and objects (services and jobs) on a per-user basis, by explicitly creating user accounts and setting user, group, and job level access.

DataFlux Integration Server (DIS) Security Manager

File

Click File from the main menu for these options:

Open Configuration File

To open the dfexec.cfg configuration file, click File > Open Configuration File; the Open dialog appears.


DIS Security Manager Open Configuration File Dialog

Open Security Directory

To open a security directory, click File > Open Security Directory; the Select Directories dialog opens.

DIS Security Manager Open Security Directory Dialog

Save

Click File > Save to save the security settings in the configuration file.

Exit

Click File > Exit to close DIS Security Manager.


Edit

Click Edit in the main menu for these options:

Add

Click Add to add new users, groups, or Access Control List (ACL) settings.

DIS Security Manager Add New User Dialog

Delete

Select the user, group, service, or job you want to delete. Click Edit > Delete to delete the selected items.

Select All

Click Select All to select all items under that tab.

Item Properties

Select a user or group you want to view. Click Edit > Item Properties to view and edit the properties.

Multiple User Permissions

Select more than one user from the list of Users. Click Edit > Multiple User Permissions to make the same changes to the selected user names.


Multiple Access Control List Properties

To make changes to multiple ACLs, select the ACLs, then click Edit > Multiple ACL Properties. Here, you can add or remove users and groups from more than one ACL.

Preferences

Click Edit > Preferences to set the options for saving files in DIS Security Manager.

DIS Security Manager Preferences Dialog

Users/Groups Save Preferences

Backup Users File
Select this option to create a backup of the users security file.

Location of Users Security File Backups
If this option is selected, when you click Save, the current users file is named users. Previous backup users files have a stamp following the file name.

Backup Groups File
Select this option to create a backup of the groups security file. The current groups file will be named groups, and backup groups files will have a stamp following the file name.

Overwrite groups file if changed
Select this option to overwrite the groups file each time group security is changed.

ACL Save Preferences

Backup ACL Files
Select this option to create a backup of ACL files.

ACL Backup File

Warnings

Warn on save if group empty
If this option is selected, you will receive a warning message when a group is empty.


Empty Group Warning Dialog

If you do not want to see this message in the future, you can select Do not show this dialog again.

View

From the main menu, click View to change the way DIS Security Manager appears:

Toolbar

Select Toolbar to view the toolbar under the main menu. The toolbar appears by default.

Status Bar

Select Status Bar to show the status bar and view the status of DIS Security Manager.

Gridlines

Select Gridlines to see the horizontal and vertical lines in DIS Security Manager.

Help

Help Topics

Click Help > Help Topics to access DIS online Help.

About Security Manager

To view the version of DIS you are running and DataFlux contact information, click Help > About Security Manager.

Toolbar

The DIS Security Manager toolbar provides buttons to quickly access several of the commonly used main menu options.


Button Name Description

Open Configuration File

Click Open Configuration File for the Open dialog.

Open Security Directory

Click Open Security Directory for the Select Directories dialog.

Save Click Save to save the files that have been changed in DIS Security Manager.

Add To add users or groups, click the appropriate tab and then click Add.

Delete To remove users or groups, click the user or group you want to delete, and then click Delete.

Note: To delete multiple users or groups, select the users or groups (click the name + CTRL) then click Delete.

Properties Click Item Properties to view details for the item you are viewing. For example, the User Properties dialog opens when you are on the Users tab, the Group Properties dialog opens when you are on the Groups tab, and the ACL Settings dialog opens when you are on the Services or Jobs tabs.

Multiple User Permissions

The Multiple User Permissions option is used to make changes to multiple users at one time. Select the users then click Multiple User Permissions. The Permissions - Multiple Users dialog opens. Select the Command Permissions the users should have.

Multiple ACL Permissions

Multiple ACL Permissions allows you to make user and group permission changes to multiple ACLs. Select the ACLs you want to change, then click Multiple ACL Permissions; the ACL Settings dialog opens.


IP-based Security

Through settings in the dfexec.cfg file, the DataFlux® Integration Server (DIS) security administrator (admin) can control access by IP address. These settings, combined with DIS Security Manager settings, control user access.

You can restrict access to DIS by specifying IP addresses of clients that are allowed or denied access. There are two supported restriction groups, general access and access to post and delete commands.

When configuring each restriction group you must specify either "allow" or "deny" (but not both). This directive can be followed by lists of specific IP addresses and ranges. You can also use "all" or "none" keywords, but in this case any explicitly defined IP addresses or ranges are ignored. An IP address that is denied general access is implicitly denied access to post and delete commands.

Configuration for each restriction group must be entered on a single line, using the space character as a separator between entries. IP ranges must be specified using the '-' character with no spaces.

Setting Description and Example

restrict general access = (allow/deny)

Use this setting to restrict access to the server by IP address. If this is not set, the default is to "allow all". For example:

restrict general access = allow 127.0.0.1 192.168.1.1-192.168.1.255

Another example:

restrict general access = allow 127.0.0.1 192.168.1.190

restrict get_all_stats access = (allow/deny)

When the statuses of all jobs are requested, the client receives all job IDs. If this is not set, the default is to "allow all". For example:

restrict get_all_stats access = deny all

Note: Only administrators should be allowed to request the status of all jobs.

restrict post/delete access = (allow/deny)

This option restricts access to the server for posting and deleting jobs. If this option is not set, the default is to "allow all". For example:

restrict post/delete access = allow 127.0.0.1
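Taken together, a minimal dfexec.cfg sketch that limits administrative actions to a single host might look like this (the addresses are illustrative):

restrict general access = allow 127.0.0.1 192.168.1.1-192.168.1.255
restrict get_all_stats access = allow 192.168.1.10
restrict post/delete access = allow 192.168.1.10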


DIS with LDAP Integration

DataFlux® Integration Server (DIS) can use the Lightweight Directory Access Protocol (LDAP) to authenticate users on all platforms if desired. When using LDAP, no DIS users need to be defined or managed on DIS. DIS can create users automatically by authenticating each unique logon using LDAP and setting default permissions for that user. When using Microsoft® Windows®, the client is based on Microsoft Active Directory®, and Windows SSL support is based on the Windows Crypt library. When using UNIX®, the client is based on OpenLDAP, with SSL support based on OpenSSL. The client also supports communication with LDAP servers in clear text.

By default, LDAP is disabled. LDAP options can be configured in the dfexec.cfg file. To enable LDAP, add enable ldap = yes to the dfexec.cfg file.

This section applies to both Windows and UNIX/Linux®, except where noted:

• For UNIX/Linux, refer to LDAP Requirements for UNIX/Linux Platforms

• For Windows, refer to the LDAP Domain section under LDAP Directives

Note: For information about LDAP server setup and configuration, refer to your LDAP server documentation.

LDAP Requirements for UNIX/Linux Platforms

AIX® - Requires the ldap.client.rte package to be installed. Run lslpp -l ldap.client.rte to check for a previous installation. This package is located on the AIX installation media.

HP-UX® - Requires the LDAP-UX client to be installed. Run /usr/sbin/swlist -l product LdapUxClient to check for a previous installation. If it is not installed, download it by going to the Hewlett Packard® Web site at http://software.hp.com/portal/swdepot/displayProductInfo.do?productNumber=J4269AA.

Linux - Requires the OpenLDAP client to be installed. On an RPM-based system such as RedHat® or SuSe™, run rpm -q openldap to check for a previous installation. For other Linux systems, consult the system documentation to test the availability of software packages. RedHat Enterprise Linux 4 or later requires the compat-openldap package. Run rpm -q compat-openldap to check for a previous installation. This package can be found on the installation media or RHN.

Solaris® - No additional requirements are needed. The LDAP client library is part of the Solaris core libraries.

Operation

Note: The LDAP implementation in DIS does not currently support LDAP group membership resolution; however, group command permissions are supported. If groups are needed, DIS administrators can define them on DIS in terms of LDAP users, DIS users, or both.


When security is enabled and LDAP is being used, DIS then authenticates user credentials with the LDAP server. Once a user is authenticated, DIS authorizes the user's request based on configured permissions for the requested command or resource.

When a request comes from a user DIS does not recognize, credentials are passed to the LDAP server for authentication. In the case of success, DIS appends a new LDAP user account to the users file and sets the command permissions to the configured default value.

When a request comes from an LDAP user DIS recognizes, the user's credentials are passed to the LDAP server for authentication. If the user already exists in the users file with a password set to x, this indicates an LDAP user as opposed to a DIS local user. Credentials are then authenticated, and the user's existing command permissions, which are set in the users file, are used.

In order to authenticate LDAP users, DIS must bind with the LDAP server using an encrypted password. The encrypted password is then entered into the dfexec.cfg file using the directive:

ldap bind pwd = [encrypted password]

For Windows, an encryption utility, EncryptPassword.exe, located in C:\Program Files\DataFlux\DIS\[version]\bin, is available to generate an encrypted password. To generate an encrypted password:

1. Launch the application EncryptPassword.exe.

2. Enter and confirm the password.

3. Generate the encrypted password.

4. Copy and paste the encrypted password into the directive, ldap bind pwd = [new encrypted password].

For UNIX, a command is available to encrypt passwords:

disadmin crypt

After running the command, the user is prompted to enter and confirm the password.

Permissions

When an LDAP user accesses DIS for the first time, DIS automatically creates an account for the user in the users file and sets the default commands permissions. The DIS administrator can change the user permission in the users file.

The configuration file permissions are set in the dfexec.cfg file. In this configuration file, permissions are automatically set for new user entries. The default value is specified as:

default commands permissions = [permissions bits]

For example,

default commands permissions = 1111111111111

In the preceding example, the default grants all permissions using all 1s. Configure the permissions as desired for your installation.


When an LDAP user already exists on DIS, the command permissions configured for that user in the users file are applied. If the user is also a member of one or more DIS groups, the group's permissions are a factor in the access level for the user.

The DIS administrator can allow some LDAP users to have access to DIS while restricting others. To do so, set all default command permissions to deny, which is 0. Then, add the necessary LDAP users to the DIS users file and set the appropriate command permissions. If the password field is set to x, the user is an LDAP user.

Alternatively, if the DIS administrator wants to deny just a few LDAP users access to DIS, the administrator can:

1. Add the users to a DIS group with all command permissions set to deny, or

2. Add the users to the DIS users file with all their command permissions set to deny, while setting less restrictive default command permissions for the desired users.

If LDAP users do not need to be restricted from accessing DIS, no special steps are necessary.
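As a sketch of the "deny by default, allow selected users" approach described above (the user names and permission bits are illustrative; x in the password field marks an LDAP user):

In dfexec.cfg:

default commands permissions = 0000000000000

In the users file:

alice:x:1111111111111
bob:x:1110000001111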

DIS does not maintain a history of connections and must authenticate every request received. To reduce the number of LDAP queries, a setting in the dfexec.cfg file allows the administrator to specify how long DIS may cache user authentication information. If set to 0, DIS communicates with LDAP for every request. Otherwise, when a user first attempts to log on, DIS calls LDAP to authenticate the user, caches the user name and password in the form of a SHA-1 hash, and caches the LDAP authentication result for the configured amount of time. If the user sends more requests during that period, DIS does not go to LDAP to re-authenticate that user. This is a useful option in environments where changes to LDAP users are infrequent. The administrator also has the option of disabling DIS Security completely, in which case no authentication is required and no authorization is performed.

If DIS is deployed in an environment where it needs to authenticate LDAP users from multiple LDAP servers or Active Directory domains, it is the responsibility of LDAP/Active Directory administrators to ensure there are no duplicate user accounts between LDAP servers or Active Directory domains.

To configure DIS for Active Directory, set the following directives in the dfexec.cfg file, then restart DIS:

# Enable LDAP/AD for authentication
enable ldap = yes
# Define LDAP/AD server and port
ldap host = yourhost:portnumber123
# Define AD domain
ldap domain = yourdomain

Note: This is the minimal configuration required for Active Directory installations and may work for most cases.


To configure DIS for SSL, use the following directives in the dfexec.cfg file:

# If set to 0 (default), the client communicates with servers in clear text.
# If set to 1, all communication is over SSL. This setting is optional:
ldap use ssl = 1
# Used only if SSL is enabled
# (do not use with a self-signed certificate; it will not be recognized):
ldap ignore svr cert = 0
# Used to set host address and port:
ldap host = XXX.XX.XXX.XXX:636

Configuration File

DataFlux Integration Server with LDAP and Active Directory sample implementation:

Setting Description and Example

enable ldap = (yes/no) Enables LDAP and Active Directory in DIS.

enable ldap = yes

ldap base dn = 'CN=Users,DC=[domain name],DC=COM'

This setting is based on implementation of the LDAP schema. For example:

ldap base dn = 'CN=Users,DC=domainname,DC=COM'

ldap bind dn = 'CN=Domain User,CN=Users,DC=[domainname],DC=COM'

This setting represents the bind for an individual user, based on your implementation of the LDAP Schema. Here, substitute what appears in the schema for Domain User. For example:

ldap bind dn = 'CN=Domain User,CN=Users,DC=domainname,DC=COM'

ldap bind pwd = 'password' Enter your encrypted password for the preceding user. For example:

ldap bind pwd = 'a1BCDEF/UVwxYZ=='

ldap cache timeout = [min] Optional setting to place a time limit on DIS to cache LDAP authentication in minutes.

ldap cache timeout = 30

ldap debug file = [path] Specifies a location for the debug log. For example:

ldap debug file = C:\Program Files\DataFlux\DIS\[version]\log\LDAP_Debug.txt


ldap domain = [domain name] Required for Active Directory authentication.

ldap domain = domainname

ldap host = IP address:[default port value] LDAP or Active Directory host IP address or name and port number. For example:

ldap host = 127.0.0.1:389

Note: Port 389 is the typical Active Directory port assignment.

ldap ignore svr cert = (0/1) If a client machine is not configured to recognize the specific certificate authority behind the SSL certificate on the server, or the server does not have an officially issued SSL certificate, this option can be set to 1 so the client will not reject the server and SSL communication may continue. SSL must be enabled for the setting to be active. The default value is 1. This setting is optional. For example:

ldap ignore svr cert = 1

ldap search attribute = [commonname] This setting is based on your LDAP schema. For example:

ldap search attribute = CN

ldap use ssl = 1 If set to 0, which is the default, the client communicates with servers using clear text. If set to 1, all communications are over SSL. This setting is optional.

ldap use ssl = 1


LDAP Directives

Following is a list of supported configuration file directives for the LDAP client:

LDAP base dn - The distinguished name (DN) of the level/object of the LDAP directory tree from which to start searching for a user account. An example of a value is ou=People,dc=dataflux,dc=com. This value must be set if the client is to authenticate users against LDAP servers. If Microsoft® Active Directory® is used, this value is ignored.

LDAP bind dn - Some LDAP servers may not allow anonymous binds, in which case this value must be configured. It defines the DN of a user who is allowed to bind to the LDAP server in order to do searches. An example of a value is uid=RAM,ou=People,dc=dataflux,dc=com. This setting is optional.

LDAP bind pwd - Password for the user who is allowed to bind to the LDAP server to do searches. If password is not set, the bind operation is treated as anonymous (the LDAP bind dn setting is ignored). Note that an anonymous bind in LDAP implies no authentication of the user is done.

LDAP cache timeout - The LDAP client can cache successfully authenticated users in memory. This setting specifies the number of seconds to keep a cached user account in memory. When a user/password are given to the client for authentication, LDAP first tries to find that user in the cache. If found, LDAP checks if the cached account has expired and if the cached and given passwords match. If either of the checks fail, the account is removed from the cache and the received credentials are passed to the LDAP server for authentication. The default value is 0, which means no caching is done and every authentication request is always passed to the LDAP server. This setting is optional.

LDAP debug file - File name where the LDAP client will log its configuration and user authentication activities, including any errors. The path must be valid. The file is opened (or created, if needed) in append mode. It is your responsibility to delete the debug file. This setting is optional.

LDAP domain - Default domain name for authenticating users. Setting this value indicates to the LDAP client that it will be communicating to an Active Directory. If this value is not set, the client assumes it is talking to a regular LDAP server. When users from this domain enter their credentials, they do not need to fully qualify account names. Users from other domains are also supported, but are required to enter fully-qualified account names, such as user@domain or domain\user (the style depends on how you have Active Directory configured). This setting applies to and must be set for only the Active Directory environment. Otherwise, do not set this option.

LDAP host - LDAP server used to authenticate users. The format is [hostname]:port. An IP address can be used instead of the hostname. Multiple servers can be specified, in which case each host:port entry must be separated by a space. Entries are attempted in the same order as configured. If one server cannot be contacted, the next one in the list is contacted. This configuration option is required.

Page 116: DataFlux Integration Server - SAS · Licensed under the Apache License, Versi on 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain

102 DataFlux Integration Server User's Guide

LDAP ignore svr cert - A client machine may not be configured to recognize the specific certificate authority behind the SSL certificate on the server, or the server may not have an officially issued SSL certificate. In these cases, this option can be set to 1 so the client machine does not reject the server, whose certificate it may not recognize, and SSL communication may continue. SSL must be enabled for the setting to be active. The default value is 1. This setting is optional.

LDAP search attribute - An attribute in the LDAP schema to search for when looking for a user entry. The default value is uid, which is used most commonly. However, an organization might have users logging in using email addresses (for example), instead of user account names. This configuration parameter allows this kind of flexibility. This setting is optional.

LDAP search scope - The scope of the search of the LDAP server. A value of zero means base search (to search the object itself). A value of one means one level search (to search the object's immediate children). A value of two, which is the default, means subtree search (to search the object and all its descendants). This setting is optional.

LDAP use ssl - If set to 0 (default), the client communicates with servers using clear text. If set to 1, all communications are over SSL. This setting is optional.
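Putting these directives together, a consolidated dfexec.cfg sketch for a plain LDAP server might look like the following (all values are placeholders drawn from the examples above):

enable ldap = yes
ldap host = ldaphost1:389 ldaphost2:389
ldap base dn = 'ou=People,dc=dataflux,dc=com'
ldap bind dn = 'uid=RAM,ou=People,dc=dataflux,dc=com'
ldap bind pwd = '[encrypted password]'
ldap search attribute = uid
ldap search scope = 2
ldap cache timeout = 0
ldap use ssl = 0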

Note: For information about LDAP server setup and configuration refer to your LDAP server documentation.


DIS Security Examples

There are two types of security available with DataFlux® Integration Server (DIS): IP-based security and DIS Security Manager. IP-based security, configured in the dfexec.cfg file, controls user access by IP address. The DIS Security Manager application is part of the Integration Server. Through the Security Manager interface, user access can be controlled based on user, group, and job level permissions. These security tools can be used separately or together. Following are some scenarios employing different types of security:

Scenario 1: Users in a small, local group use a specific range of IP addresses.

Scenario: Users have static IP addresses or draw dynamic addresses from a known range. If the group is small, or licenses are restricted to only a few machines, this may be the highest level of security needed by your organization.

Security plan: You can restrict access to DIS by specifying IP addresses of clients that are allowed or denied access. Access can be restricted by general access, post/delete access, and restrictions on requests for statuses of jobs.

Scenario 2: Your organization requires control over user and group level access.

Scenario: Different users or groups require different levels of access, or certain files may require different permissions.

Security plan: The DIS security subsystem provides this degree of control. User name and password are passed using basic HTTP authentication to DIS. Information on that user's user permissions, group permissions, and file permissions is kept in DIS security files. The DIS security subsystem can be used alone or with IP-based security. The following is an example of basic HTTP authentication:

Client request:

GET /private/index.html HTTP/1.0
Host: localhost

Server response:

HTTP/1.0 401 UNAUTHORIZED
Server: HTTPd/1.0
Date: Sat, 27 Nov 2004 10:18:15 GMT
WWW-Authenticate: Basic realm="Secure Area"
Content-Type: text/html
Content-Length: 311

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML>
<HEAD>
<TITLE>Error</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
</HEAD>
<BODY><H1>401 Unauthorised.</H1></BODY>
</HTML>

Client request:

GET /private/index.html HTTP/1.0
Host: localhost
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

Server response:

HTTP/1.0 200 OK
Server: HTTPd/1.0
Date: Sat, 27 Nov 2004 10:19:07 GMT
Content-Type: text/html
Content-Length: 10476

Scenario 3: User authentication through LDAP adds an additional layer of security.

Scenario: Your organization uses LDAP for user authentication.

Security plan: In this case, user name and password are still passed to DIS through basic HTTP authorization, but DIS passes the information on to LDAP to authenticate the user. If the user is not authenticated, the LDAP server returns an error.

Scenario 4: The DIS Security Administrator wants to remotely administer a large number of users.

Scenario: The administrator wants to perform administrative tasks from the command line.

Security plan: DIS security remote administration consists of SOAP commands to administer DIS users and groups. This remote functionality allows the administrator to: change passwords; list all users; list all groups; list a user's groups; list a group's members; add a user; set a user's permissions; add a group; delete an account; add an account to a group; and delete an account from a group. DIS must be running and security enabled. Remote administration can be used with or without LDAP. Note that error messages may change if LDAP is integrated with DIS.

Page 119: DataFlux Integration Server - SAS · Licensed under the Apache License, Versi on 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain

DataFlux Integration Server User's Guide 105

Frequently Asked Questions

General

What is an Integration Server?

An Integration Server is a service-oriented architecture (SOA) application server that allows you to execute Architect or Profile jobs created using the DataFlux® dfPower® Studio design environment on a server-based platform. This could be Microsoft® Windows®, Linux®, or nearly any other UNIX® option. (SOA enables systems to communicate with the master customer reference database to request or update information.)

By processing these jobs in Windows or UNIX, where the data resides, you can avoid network bottlenecks and can take advantage of performance features available with higher-performance computers.

In addition, existing batch jobs may be converted to real-time services that can be invoked by any application that is Web service enabled (for example: SAP®, Siebel®, Tibco®, Oracle®, and more). This provides users with the ability to reuse the business logic developed when building batch jobs for data migration or loading a data warehouse, and apply it at the point of data entry to ensure consistent, accurate, and reliable data across the enterprise.

What is the difference between DataFlux Standard Integration Server and DataFlux Enterprise Integration Server?

The DataFlux Standard Integration Server supports the ability to run batch dfPower Studio jobs in a client/server environment, as well as the ability to call discrete DataFlux data quality algorithms from numerous native programmatic interfaces (including C, COM, Java™, Perl, and more). The Standard Integration Server allows any dfPower Studio client to offload batch dfPower Profile and Architect jobs into more powerful server environments. This capability frees up the user's local desktop, while enabling higher performance processing on larger, more scalable servers.

The DataFlux Integration Server (DIS) Enterprise edition has added capability allowing you to call business services designed in the dfPower Studio client environment or to invoke batch jobs using Service-Oriented Architecture (SOA).

How can I run multiple versions of DIS in Windows or UNIX?

The following procedure shows how to run multiple versions of DIS on a UNIX machine. This procedure can also be applied to versions of DIS older than 8.0.



UNIX

Multiple versions, or multiple instances of the same version, of DIS can be installed on a UNIX server. Different versions or instances must be installed in separate directories. Instead of creating a single directory for the software (for example, /opt/dataflux), each version or instance must have a separate directory, for example, /opt/dataflux/serv1disv8.0, /opt/dataflux/serv2disv8.1, /opt/dataflux/serv2disv8.1a. The installer must run in each directory, and a different port must be designated for each server; otherwise, the installations can be configured identically.

WINDOWS

Multiple versions of DIS services can be installed on Windows systems with some modifications. When installing any version of a DataFlux Windows service, the settings of the currently installed version are overwritten, because the same names are used, preventing multiple versions from existing concurrently. The solution is to rename older versions of the services, so that installing or reinstalling the latest version does not affect the older versions.

The following procedure shows how to run both the 8.0 and 8.1 versions of the DIS service on a Windows machine. This procedure can also be applied to versions of DIS older than 8.0.

Note: Once an older version of a service is renamed, reinstalling that version or applying an update to the version will require some user intervention, such as stopping and starting the service manually, and editing the registry so the current version of the service is pointing to the correct executable file.

Make sure the older (8.0) DIS service is the active service

1. To open the Services management console, click Start > Settings > Control Panel.

2. Double-click Administrative Tools > Services.

3. Double-click the DataFlux Integration Server service entry to display the Properties dialog.

4. Notice the Path to executable property. If that property uses the 8.0 bin directory, then skip to Rename the older (8.0) Batch Scheduler service; otherwise, continue with this procedure. The remainder of this procedure assumes the Path to executable property uses the 8.1 bin directory. If not, substitute the appropriate directory name.

5. Stop the DIS service by clicking Stop, or select Action > Stop from the menu.

6. Close the Properties page.

7. Exit the Services management console.

8. Open a command prompt window and change the directory to the DIS 8.1 bin directory.

9. Type dfintgsvr.exe -u and press Enter. A message will appear, confirming the removal of the service.

10. Change to the DIS 8.0 bin directory.


11. Type dfintgsvr.exe -i and press Enter. A message will appear, confirming the installation of the service.

12. Close the command prompt window.

Rename the older (8.0) Batch Scheduler service

1. To open the Services management console, click Start > Settings > Control Panel.

2. Double-click Administrative Tools > Services.

3. If the DataFlux Integration Server service, DFIntegrationService, is started, right-click on it and select Stop. Services can also be started and stopped by opening a command prompt window and typing NET START [service name] or NET STOP [service name].

4. Exit the Services management console.

5. Open the registry editor by clicking Start > Run, typing regedit, and pressing Enter.

6. Right-click on HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DFIntegrationService and select Rename.

7. Change the key name from DFIntegrationService to DFIntegrationService80.

8. Double-click the DisplayName property, listed on the right.

9. Change the name from DataFlux Integration Server to DataFlux Integration Server 8.0.

10. Close the registry editor.

11. Reboot the computer.

Note: Windows XP and Windows Vista® have a Services Control Manager (SCM) that manages all services running on the system. The SCM cannot be easily reset or restarted while Windows is running, making a system reboot necessary.

Reinstall the latest (8.1) Batch Scheduler service

1. Open a command prompt window and change the directory to the DIS bin directory.

2. Type dfintgsvr.exe -i and press Enter. A message confirming installation of the service will appear.

3. Close the command prompt window.

4. Open the Services management console.

5. Right-click on the DataFlux Integration Server service and select Start. Both DIS services, DataFlux Integration Server and DataFlux Integration Server 8.0, should now be in the Services management console with a status of Started.

6. Exit the Services management console.


How do I move an Architect or Profile job to UNIX so it can be processed by an Integration Server?

With DIS Manager installed as part of the base dfPower Studio, connect to the desired server and select the job or service to be uploaded. You can also use DIS Manager to test real-time services on your server.

How do I save Profile reports to a repository using DIS?

In order for DIS to store Profile reports in a repository, the profrepos.cfg configuration file must exist in the \etc directory of the DIS installation. The profrepos.cfg file contains the list of available repositories and specifies one of them to be the default. The format for the profrepos.cfg file is:

$defaultrepos='Repos1 Name'
Repos1 Name='ODBC DSN' 'Table Prefix'
Repos2 Name='ODBC DSN' 'Table Prefix'
Repos3 Name='ODBC DSN' 'Table Prefix'

where:

$defaultrepos: Indicates the default repository.
Repos1 Name: User-defined name for the repository.
ODBC DSN: The Data Source Name defined for the ODBC connection.
Table Prefix: Prefix that was given for the repository when it was created.

In the following example there are three repositories configured for Profile reports: TEST, dfPower Sample Prod, and dfPower Sample Cust. The two dfPower Sample examples are stored in the same database but use table prefixes (Prod_ and Cust_) to create a unique set of tables for each repository. The default repository is dfPower Sample Prod.

$defaultrepos='dfPower Sample Prod'
TEST='TEST DATABASE' ''
dfPower Sample Prod='DataFlux Sample' 'Prod_'
dfPower Sample Cust='DataFlux Sample' 'Cust_'

Profile repositories on any supported platform can be managed by using the Profile Repository Administrator, which can be accessed from the Start menu.

For more information on creating and maintaining Profile repositories, see the dfPower Studio Online Help topic, "dfPower Profile - Profiling Repositories."


How do you enable/disable DAC logging?

A data access component (DAC) allows software to communicate with databases and manipulate data.

To enable/disable DAC logging in a Windows environment: From the Windows Registry, create one or both of the following string values:

• HKEY_CURRENT_USER\Software\Dataflux Corporation\dac\[version]\logfile

• HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\logfile

where [version] indicates the version of DIS that you have installed.

Set logfile to the filename where logging output is sent. If this entry is empty or does not exist, no logging occurs. This is also how to turn off logging.

Note: Make sure you turn off logging once the required information is captured.

To enable/disable DAC logging in a UNIX/Linux environment: Add a file named sql_log.txt (all lowercase) to the var/dis_job/io directory. The dfexec.cfg file is located in the dfpower /etc directory.

To enable/disable DAC logging from the command line: When running dfexec from the command line using [your_jobname.dmc] as input, the DAC log will be created inside the current working directory of the dfexec executable. You must create the file sql_log.txt inside that working directory to enable DAC logging.

What SOAP commands are recognized by DIS?

For a complete list of SOAP commands recognized by DIS, refer to the Soap Commands topic. (Simple Object Access Protocol (SOAP) is a Web service protocol used to encode requests and responses to be sent over a network. This XML-based protocol is platform independent and can be used with a variety of internet protocols.)

How do I add an additional driver for the data sources?

DIS is compatible with most ODBC compliant data sources. DataFlux recommends using supplied ODBC drivers instead of client-specific drivers provided by the manufacturer. Limited support will be available for implementation and problem resolution when a client implements a driver not supplied by DataFlux.

For a complete list of supported drivers, see Supported Databases.

I can't see my saved job even though it's saved in a directory I have access to. Where is it?

In Windows, a job that will be run through DIS must be saved in a location that does not use mapped drives. A Win32 service is not able to access mapped drives, even if the service is started under a user account that has those drives mapped.



Are there restrictions on which characters are allowed in job names?

Job names can include alphanumeric characters only. If any of the following characters are included in a job name, DIS will not list that job name and will not allow any operations on that file: comma (,), period (.), apostrophe ('), brackets ([ ]), braces ({ }), parentheses, plus (+), equals (=), underscore (_), hyphen (-), caret (^), percent (%), dollar sign ($), at sign (@), or exclamation point (!).

How can I be automatically notified of new releases of DataFlux products?

To arrange to receive update notification, visit the DataFlux Customer Care Portal at http://www.dataflux.com/Customer-Care/index.asp. From there you can select User Profile, then select to receive both data update notification and the DataFlux newsletter.

How do I know which log file is mine?

All DIS log file names start with the date that corresponds to when the current instance of DIS was started. The beginning of the file name will be in the following format: YYYYMMDD-HH.MM_. The date portion of the file name is followed by "00_" for the purpose of sorting the directory so that all log files related to a particular instance of DIS are grouped together. The DIS log itself will either be the first or last log file. After that there are some random characters the operating system creates to guarantee unique file names. The name ends with "_DIS.log."

Service and job log files begin with: MMDD-HH.MM.SS_ representing the date when the request to load this job or service was received by DIS. This is followed by a request number (since multiple requests can be received in the same second), followed by the job or service name.

For example:

20070215-12.13_00_0210B0_DIS.log
20070215-12.13_0215-12.13.44_4_archsvc_ram.dmc.log
20070215-12.13_0215-12.14.04_10_archjob_100Kclust.dmc.log

Can I run a UNIX shell command from an Integration Server Architect job?

Yes, the execute() function allows you to run a program or file command from the shell. For example, the following code allows you to modify the default permissions of a text file created by Architect.

To execute the command directly, type:

execute("/bin/chmod", "777", "file.txt")

or to execute from the UNIX/Linux shell, type:

execute("/bin/sh", "-c", "chmod 777 file.txt")

Why is my job failing to read SAS Data Sets on AIX?

In order to access SAS Data Sets on AIX, you must have AIX 5.3 with patch level 6 installed on your system.


Installation

What is the default temp directory, and how do I change it?

DIS uses the system temporary directory which is determined by the TMP environment variable in Windows and the TMPDIR environment variable in UNIX.

In UNIX/Linux, you can redirect temporary files by doing the following:

1. Run export TMPDIR=[your new temp path].

2. Edit the .profile file to include this new path.

3. Restart the DIS daemon.
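For example, the lines appended to .profile might look like the following sketch; the path /data/dis_tmp is a placeholder for your own location:

TMPDIR=/data/dis_tmp
export TMPDIR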

In Windows, the default setting for TEMP is C:\Windows\Temp. You may set the value of the TEMP environment variable for that user, or set the TEMP system variable to a different location. To change environment variables in Windows, do the following:

1. Right-click My Computer, and select Properties.

2. Click the Advanced tab.

3. Click Environment variables.

4. Click one of the following options, for either a user or a system variable:

• Click New to add a new variable name and value.

• Click an existing variable, and then click Edit to change its name or value.

• Click an existing variable, and then click Delete to remove it.

5. Restart the DIS service.
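As a sketch, the user-level TEMP variable can also be set from a Command Prompt with the setx utility; the path is a placeholder, and the DIS service must still be restarted afterward:

setx TEMP "D:\dis_temp"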

How do I connect to a database?

DIS connects to databases through ODBC. To add a data source, use the ODBC Data Source Administrator provided with Windows, or use the dfdbconf command in UNIX. In Windows, click Start > Programs > DataFlux dfPower Studio [version] > DataDirect Connect ODBC Help. You may require assistance from your network administrator to install this ODBC connection, as it requires site-specific information. For more information, see Configuring a Data Source.

Security

What are the requirements for user names?

A user name is case sensitive, can be up to 20 characters, and can only include alphanumeric characters and these symbols: period (.), hyphen (-), or underscore (_).
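Purely as an illustration of the rule above (this class is hypothetical and not part of DIS), the requirement can be expressed as a regular expression:

import java.util.regex.Pattern;

public class UserNameCheck {
    // 1-20 characters drawn from letters, digits, period, hyphen, underscore
    private static final Pattern VALID =
        Pattern.compile("^[A-Za-z0-9._-]{1,20}$");

    public static boolean isValid(String name) {
        return name != null && VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("etl_admin-01")); // true
        System.out.println(isValid("bad name!"));    // false: space and '!'
    }
}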


Are there any restrictions for passwords?

There are no restrictions on the characters or words that can be used for passwords. A password may be set to blank. Passwords do not expire. This is also true for passwords for which you create password hashes using HashPassword.exe. If you are using HashPasswordStrong.exe to generate password hashes, your password will need to contain a minimum of: six characters, one number, one uppercase letter, and one lowercase letter.

We do not need additional security; is it required?

No, you are not required to use the DIS security subsystem. It is available to fit your business needs. Security is disabled by default.

Is there a limit to the number of users and groups I can add using DIS security?

There is no limit when using the DIS security command line option to add users and groups. However, if you are using DIS Security Manager to create users and groups, there is a limit of 10,000 users and 10,000 groups.

Can we use OpenSSL to restrict the users in one or more domains?

Enabling LDAP access over SSL only encrypts LDAP network traffic, including all DIS LDAP traffic to all configured servers. SSL has no impact on the number of LDAP servers or domains a DIS instance can use.

How can I configure DIS to use one or more LDAP servers?

You can configure DIS to use any number of LDAP servers by entering them on the same line in the dfexec.cfg file, separated by whitespace. The LDAP servers are queried in the order in which they are specified in the dfexec.cfg file.
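A sketch of what such an entry might look like in dfexec.cfg; the directive name ldap server is an assumption used for illustration only, since this guide does not spell it out, and the host names are placeholders:

# hypothetical directive name; servers are queried left to right
ldap server = ldap1.example.com ldap2.example.com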

How do I restrict the LDAP servers that are used by DIS?

If you want to restrict DIS to use one or more specific LDAP servers, configure only those servers in the dfexec.cfg file.


Troubleshooting

I saved a job in dfPower® Studio, but now I don't see it.

Check the job name. Job names can include alphanumeric characters only. If any of the following characters are included in a job name, DataFlux® Integration Server (DIS) will not list that job name and will not allow any operations on that file: comma (,), period (.), apostrophe ('), square brackets ([ ]), curly braces ({ }), parentheses (( )), plus sign (+), equal sign (=), underscore (_), hyphen (-), caret (^), percent sign (%), dollar sign ($), at sign (@), or exclamation point (!).

The DIS service failed to start.

When attempting to start DIS from the Microsoft® Windows® services dialog, the following error appears:

DIS Windows Service Error

Alternatively, if you try to start DataFlux Integration Server Manager without starting the Windows service, you will get the following error message:

DIS Timeout Error

Check to see if DIS created a log file. If so, you would expect to see two lines similar to the following in the log:

Syntax error at line 34.
Initialization failed.

This indicates that the error is in the dfexec.cfg file itself; in this example, the license dir setting in the dfexec.cfg file was changed so that it was no longer in the proper format (some of the settings must be enclosed in single quotes). If no log file was created, look at the Windows application log for an error message containing the reason the DIS service failed to start.


The service is running, but I'm still getting the SOAP-ENV:Client: Timeout - connect failed in tcp_connect() error.

Check the bottom edge of the Integration Server Manager main screen for the server and port.

Verify that you are able to connect to the server listed. Make sure the Port value matches the value of server listen port in the dfexec.cfg file. The default value is 21036. If these values do not match, you can change server listen port in dfexec.cfg, or change the server port under Tools > Options from the Integration Server Manager main menu.
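For reference, the corresponding dfexec.cfg entry at its default value would be:

server listen port = 21036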

I can connect to the machine running the license server, but I cannot get a license.

Typically, no license server port number needs to be explicitly specified. The license server automatically selects an available port between 27000 and 27009. If a client machine can connect to the license server, but the license client process cannot connect to the license server process, an explicit port number must be specified.

In the license file on the server, a port number can be explicitly specified as the last item in the SERVER line as shown below. The port number can be added or changed in the existing license file without the need to regenerate it. The license server has to be restarted once a port number is added or changed. For example:

SERVER [servername].com 001125c43cba 27000

On the client side, a port number can be explicitly specified by prepending it to the server name, for example:

27000@[servername].com

I get one or both of the following error messages:

[time of error] (DATAFLUX) UNSUPPORTED: "DFLLDIAG2" (PORT_AT_HOST_PLUS ) phamj4@UTIL0H4GHXD1 (License server system does not support this feature. (-18,327))

[time of error]

These error messages refer to the licenses for two built-in features that are used internally by DataFlux for debugging. These licenses are not distributed, so the license checkout request process for these two licenses fails and produces the errors noted. This is normal and should occur only once, when the license checkout request is made for the first time.

(DATAFLUX) UNSUPPORTED: "DFLLDIAG1" (PORT_AT_HOST_PLUS ) phamj4@UTIL0H4GHXD1 (License server system does not support this feature. (-18,327))


When I try opening a job log from DIS Manager, I get the following error:

Error occurred while attempting to retrieve job log: SOAP-ENV:Client:UNKNOWN error [or Timeout]

This occurs on some configurations of Microsoft Windows Server® 2003 when the log file is greater than 32KB. A workaround for this problem is to set the following configuration value in the dfexec.cfg file. This should only be necessary for DIS running on Windows Server 2003, and only if you experience this problem.

server send log chunk size = 32KB


Error Messages

Installation and Configuration

No Valid Base License Found

If you do not have a valid license file on your machine, you will get an error when attempting to run a job with nodes that require that license. Following is an example of an error you might expect in this case:

dtengine :: warning :: No valid base license found
dtengine :: error :: Job contains node with unknown id 'SAMPLEFIELDNAME'. The node may not be licensed, or the plugin for the node may be unavailable.
dfexec :: error :: unable to load job
dfexec :: error :: aborted due to errors

See instructions for obtaining a license file for Windows or UNIX.

Data Source Connection Errors

When configuring a new data source, it is critical that parameters (such as DSN, host, port, and sid) match exactly those used to create the job on the client machine. (A data source name, or DSN, contains connection information, such as user name and password, used to connect to a database through an ODBC driver.) If the connection fails, DataFlux® Integration Server (DIS) displays error messages describing the reasons for the failure.

Common World Address Verification Error Codes

Error Code Description

-100 Time for testing is expired.

156 Too many results after validation. Only the first 20 results will be presented.

157 No certification for this country.

204 Country not recognized.

205 Country database not found.

206 Country database in the wrong format or data corrupt.

207 Country database access denied. License may be missing.

300 No country rule could be loaded.

-1004 Country is locked.

-9999 Call encountered an error. The reason for the error is unknown.



Security

401 Unauthorized

If the user is not authenticated, an HTTP error, 401 Unauthorized, is returned. This could mean that you entered invalid user name and password credentials, or that your user account has not been set up. Contact your DIS Security Administrator for assistance.

403 Forbidden

When a user receives the HTTP error, 403 Forbidden, the user has entered the system but does not have permission to execute a particular DIS command. Contact your administrator for assistance.

Logging

An owner of an object may view the log and status for that object, but a user must have permissions to view other users' logs.

Job with Custom Scheme Fails to Run

A job with a custom scheme that fails to run will produce an error similar to the following:

dtengine :: error :: Blue Fusion load scheme 'CUSTOM_SCHEME_1.sch' failed: Blue Fusion error -801: file i/o error
dfexec :: error :: unable to initialise step: Standardization 2
dfexec :: error :: aborted due to errors

You must ensure that: (1) the Quality Knowledge Base (QKB) you are using on the Integration Server is an exact copy of the QKB used on dfPower, and (2) the name of the scheme is typed correctly, as it is case sensitive. To copy the QKB from Microsoft® Windows® to UNIX®, use FTP or Samba mappings. You must restart the DIS service, and retry the job. On some UNIX systems, there is a case sensitivity issue with the schemes. Once you copy the QKB over to the UNIX server, make sure that the name of the scheme is modified to all lowercase. It is located in the qkb directory, under /scheme.
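One way to lowercase the scheme file names from a UNIX shell is sketched below; the QKB path is a placeholder for your installation's location:

cd /path/to/qkb/scheme
for f in *.sch; do lc=$(echo "$f" | tr 'A-Z' 'a-z'); [ "$f" != "$lc" ] && mv "$f" "$lc"; done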

Active X Control Required to View Help Files

In Internet Explorer® 6.0 and later, your network administrator can block ActiveX® controls from being downloaded. Security for ActiveX content from CDs and local files can be changed under Internet Options.

In Internet Explorer, click Tools > Internet Options. On the Advanced tab, under Security, select Allow active content from CDs to run on My Computer and Allow active content to run in files on My Computer.

Locale Not Licensed

If your job has a locale selected that you do not have listed in your license, you will get an error message similar to the following:

Error message DT engine: 2::ERROR::-105:Local English [US] not licensed


You must contact DataFlux Customer Support to update your license with the new locale. Also verify that the data file for that locale is located in the /locale folder of your dfPower installation.

Node Not Licensed

An error message similar to the following can occur when the user has more than one copy of the license file:

Failed to create step: Couldn't instantiate step 'SOURCE_ODBC'. It is not an available step. The node may not be licensed, or the plugin for the node may be unavailable.

Check the \license directory of the dfPower Studio installation. Remove any extra copies of studio.lic file from that folder.

Running Jobs and Real-Time Services

Error When Connecting to Excel Database

Because Microsoft Excel® is not a true database, you may occasionally experience problems making the ODBC connection. This will produce an error similar to the following:

Failure while connecting to data source (Excel files) : [HY000][Microsoft][ODBC Excel Driver] Cannot open database '(unknown)'. It may not be a database that your application recognizes, or the file may be corrupt. (=1028)

Naming the spreadsheet can fix this issue. To eliminate the error, highlight all of the cells in the spreadsheet, select Insert > Name, and enter the name of the spreadsheet.

Error While Connecting to Source Repository

This repository error takes the following form:

Error occurred while connecting to source repository: Repository does not exist, is corrupted, or not unified

There are two possible causes for this error. First, you may need to update your unified repository. Second, when attempting to connect to a repository on Sybase® or Oracle®, you may see this error because the driver reports an incorrect column size. To implement the workaround, add the string ColumnsAsChar to the data source and set the value to 1: on Microsoft Windows, modify the data source in the registry; on UNIX machines, modify the data source in the odbc.ini file, adding the line ColumnsAsChar=1.

For example:

On Microsoft Windows, under HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI, edit the data source in question and add the following string value, giving it a value of 1:

ColumnSizeAsCharacter


On UNIX, edit the odbc.ini file, find the entry for the data source in question, and add this line:

ColumnsAsChar=1
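A sketch of the resulting odbc.ini entry; the data source name MyRepository and the driver line are placeholders for your existing settings:

[MyRepository]
Driver=/opt/odbc/lib/your_driver.so
ColumnsAsChar=1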

Functionality Disabled Error When Creating a CASS PS Form 3553

Why did I get a "functionality disabled" error message when creating a CASS PS Form 3553?

The PS Form 3553 for CASS certification can be created only on Microsoft® Windows®, Solaris™, and AIX™ systems.

Low Disk Space on UNIX

The following error indicates that temporary disk space on the machine running DIS is low:

dfexec :: error :: unable to read row 0
dtengine :: error :: File I/O operation failed. Possibly out of disk space.
dfexec :: error :: unable to execute step: Data Joining 1
dfexec :: error :: aborted due to errors
Data source 1 contains 583,166 rows
Data source 2 contains 10,125,806 rows

You may be able to free up resources by increasing the efficiency of the job. Check the conditions of the Join and see if you are running a many-to-many Join. Also, check for a high number of NULL values in the two tables you are joining, which can also produce a very large join result. Finally, check the memory limits on the UNIX system where the DIS process is running to ensure that there is enough room for DIS to work properly.

By default, DIS uses the /tmp directory for temporary files. You can redirect temporary files by setting the TMPDIR environment variable with the export TMPDIR=/[newTempPath] command. You can check this environment variable and change the setting with the following steps:

1. Stop the DIS service.

2. From the UNIX shell, type: env | grep TMPDIR. If the result is blank, TMPDIR is not set as an environment variable.

3. Type: export TMPDIR=/[newTempPath].

4. Start the DIS service.

5. Edit the .profile file and add export TMPDIR=/[newTempPath] to .profile for the user who starts DIS.

Errors Using the Java Plugin for UNIX

The architect.cfg file contains a setting called java vm that references the location of the Java™ Virtual Machine (JVM) DLL. If this setting is not configured properly, you will receive an error that the JVM could not be loaded.


Make sure you compile your Java code using a Java Development Kit of the same version as, or earlier than, the JRE version you are running on your DIS machine.

If the java classpath setting is incomplete, Architect or DIS will report an error because the code could not be loaded. Check to make sure your code and any dependencies are accessible and specified in the classpath setting.

If the java classpath setting is empty, the only Java code accessible to the Java Plugin is the example code that ships with Architect and DIS. Refer to the DataFlux dfPower Online Help topic, "Architect - Java Plugin - Examples," for information.
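Together, the two settings might look like the following sketch in architect.cfg; both values are placeholders to adjust for your installation:

# location of the JVM shared library (jvm.dll on Windows, libjvm.so on UNIX)
java vm = [path to JVM library]
# compiled classes plus any dependencies (':' separator shown for UNIX)
java classpath = [path to your classes]:[path to dependency jars]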


Appendix A: Best Practices

Use a System Data Source Rather Than a User Data Source

Add the new data source as a System data source name (DSN), rather than a User DSN, so it will be available to all users of the machine, including Microsoft® Windows NT® services. (A DSN contains connection information, such as user name and password, used to connect to a database through an ODBC driver.)

Use Connection Manager to Configure Data Sources

When developing DataFlux® dfPower® Architect jobs or services, use the Connection Manager to set up and store connection information to any Open Database Connectivity (ODBC) data source. (ODBC is an open standard application programming interface (API) for accessing databases.) In this way, the connection information to the master reference database is made permanent and independent of the Architect job or service. The saved information does not have to be entered each time the job is run, and that information can be used by any DataFlux application.

Windows - This connection information is stored in the directory specified in one or both of the following registry entries, where [version] indicates the version of DIS that you have installed:

• HKEY_CURRENT_USER\Software\DataFlux Corporation\dac\[version]\savedconnectiondir

• HKEY_LOCAL_MACHINE\Software\DataFlux Corporation\dac\[version]\savedconnectiondir

If neither of these entries exists, connection strings are saved in the \dac folder on the job's run-time machine.

UNIX - The connection information is saved to a file in the /$HOME/.dfpower/dsn directory.

Use Global Variables Within Architect Jobs and Services

Use global variables within Architect jobs and services to accept or retrieve data. Using global variables increases the flexibility and portability of dfPower Architect jobs and services between data sources.

Plan Your Security Model Based on Business Needs

The DataFlux Integration Server (DIS) application is a network resource that is used to access and modify your data. A well-planned security model is based on usage policy, risk assessment, and response. Determining user and group usage policies prior to implementation helps you to minimize risk, maximize utilization of the technology, and expedite deployment.


For more information, see Security Policy Planning.

Consider DIS Performance when Modifying Configuration Settings

Changes to several of the configuration settings in DIS can affect performance. For example, large temporary files, log files, and memory clusters can slow down the server. Several settings that are in the dfexec.cfg file or can be added to dfexec.cfg can alter memory allocation or processing power.

For more information, see Configuration Settings.


Appendix B: Code Examples

The following content instructs you on how to create and connect to the DataFlux® Integration Server (DIS). Zip files are available with files for the examples. The integrationserversamples.zip file is located in the DIS\[version] directory for Microsoft® Windows® operating system installations.

The DataFlux Web Service Definition Language (WSDL) file contains the set of definitions to describe the Web service. You can point directly to this file using either the directory path, such as C:\Program Files\DataFlux\DIS\[version]\share\arch.wsdl, or the URL, using the following syntax:

http://[servername]:port/?wsdl

Using an XML editor, you can edit and view the arch.wsdl file that is installed on your DIS. Update the SOAP:address location to reflect the hostname and port number of the DIS. For example:

<SOAP:address location="http://localhost:21036"/>

Additionally, you can view the WSDL file via a web browser. From this view, the value of SOAP:address location will reflect your actual hostname and port number:

<SOAP:address location="http://[hostname]:21036"/>

There are coding examples of these operations in each language listed below: Get Object List, Post Object, Delete Object, Get Architect Service Params, Execute Architect Service, Run Architect Job, Run Profile Job, Get Job Status, Get Job Log, Terminate Job, Clear Log.

Java

Use wscompile, supplied with the Java™ Web Services Developer Pack, to build Java classes that wrap the DIS interface. This creates all of the classes required to interface with DIS for any application that can use these classes.

Following are examples using the Java classes constructed from the WSDL.

Examples

////////////////////////////////////////////////////////
// Imports
////////////////////////////////////////////////////////
import arch.*;

////////////////////////////////////////////////////////
// INITIALIZATION
////////////////////////////////////////////////////////
ArchitectServicePortType_Stub stub;

// get the stub
stub = (ArchitectServicePortType_Stub)(new DQISService_Impl()).getDQISService();

// optionally set to point to a different end point


stub._setProperty(javax.xml.rpc.Stub.ENDPOINT_ADDRESS_PROPERTY,
    "http://MY_SERVER:PORT");

////////////////////////////////////////////////////////
// 1) Get Object List example
////////////////////////////////////////////////////////
String[] res;
res = stub.getObjectList(ObjectType.ARCHSERVICE);

////////////////////////////////////////////////////////
// 2) Post Object example
////////////////////////////////////////////////////////
byte[] myData;
ObjectDefinition obj = new ObjectDefinition();
obj.setObjectName("NAME");
obj.setObjectType(ObjectType.fromString("ARCHSERVICE"));
// read the job file in from the h/d
myData = getBytesFromFile(new File(filename));
// post the job to the server
String res = stub.postObject(obj, myData);

////////////////////////////////////////////////////////
// 3) Delete Object
////////////////////////////////////////////////////////
ObjectDefinition obj = new ObjectDefinition();
obj.setObjectName("MYJOB.dmc");
obj.setObjectType(ObjectType.fromString("ARCHSERVICE"));
String res = stub.deleteObject(obj);

////////////////////////////////////////////////////////
// 4) Get Architect Service Params
////////////////////////////////////////////////////////
GetArchitectServiceParamResponse resp;
FieldDefinition[] defs;
resp = stub.getArchitectServiceParams("MYJOB.dmc", "");
// Get Definitions for Either Input or Output
defs = resp.getInFldDefs();
defs = resp.getOutFldDefs();
// Loop through Defs
defs[i].getFieldName();
defs[i].getFieldType();
defs[i].getFieldLength();

////////////////////////////////////////////////////////
// 5) Execute Architect Service
////////////////////////////////////////////////////////
FieldDefinition[] defs;
DataRow[] rows;
String[] row;
GetArchitectServiceResponse resp;
// Fill up the Field Definitions
defs = new FieldDefinition[1];
defs[0] = new FieldDefinition();
defs[0].setFieldName("NAME");
defs[0].setFieldType(FieldType.STRING);
defs[0].setFieldLength(15);
// Fill up Data matching the definition


rows = new DataRow[3];
row = new String[1];
row[0] = "Test Data";
rows[i] = new DataRow();
rows[i].setValue(row[0]);
resp = stub.executeArchitectService("MYJOB.dmc", defs, rows, "");
// Get the Status, Output Fields and Data returned from the Execute Call
String res = resp.getStatus();
defs = resp.getFieldDefinitions();
rows = resp.getDataRows();
// Output Field Definitions
defs[i].getFieldName();
defs[i].getFieldType();
defs[i].getFieldLength();
// Output Data
row = rows[i].getValue();
res = row[j];

////////////////////////////////////////////////////////
// 6) Run Architect Job
////////////////////////////////////////////////////////
ArchitectVarValueType[] vals;
vals = new ArchitectVarValueType[1];
vals[0] = new ArchitectVarValueType();
vals[0].setVarName("TESTVAR");
vals[0].setVarValue("TESTVAL");
// Returns JOBID
String res = stub.runArchitectJob("MYJOB.dmc", vals, "");

////////////////////////////////////////////////////////
// 7) Run Profile Job
////////////////////////////////////////////////////////
String res = stub.runProfileJob(
    "MYJOB.pfi",    /* Job Name */
    "",             /* Output file to create (not used in this case) */
    "repos",        /* Repository name to write results to */
    "New Report",   /* Report name to create */
    "Description",  /* Description of run */
    0,              /* Append to existing (false) */
    vals,           /* var/values */
    ""              /* reserved */
);

////////////////////////////////////////////////////////
// 8) Get Job Status
////////////////////////////////////////////////////////
JobStatusDefinition[] defs;
// if you wanted the status for a single job, you would
// pass the jobid returned from runArchitectJob or runProfileJob
defs = stub.getJobStatus("");
ObjectDefinition obj;
obj = defs[i].getJob();
defs[i].getJobid();


defs[i].getStatus();
obj.getObjectName();
obj.getObjectType();

////////////////////////////////////////////////////////
// 9) Get Job Log
////////////////////////////////////////////////////////
GetJobLogResponseType resp;
FileOutputStream fo;
resp = stub.getJobLog(jobId, 0);
// write it to a file
fo = new FileOutputStream(resp.getFileName());
fo.write(resp.getData());
fo.close();

////////////////////////////////////////////////////////
// 10) Terminate Job
////////////////////////////////////////////////////////
String res = stub.terminateJob(jobId);

////////////////////////////////////////////////////////
// 11) Clear Log
////////////////////////////////////////////////////////
String res = stub.deleteJobLog(jobId);

C++

The client API consists of three header files and one .lib file. The headers include all necessary type enumerations. All required DLLs are provided within the dfPower® Studio installation. A connection handle should be initialized before use and freed by the terminate function when no longer needed.

////////////////////////////////////////////////////////
// Imports
////////////////////////////////////////////////////////
#include "arscli.h"
#include "acjob.h"
#include "acrta.h"
// Also requires arscli11.lib

////////////////////////////////////////////////////////
// INITIALIZATION
////////////////////////////////////////////////////////
acj_handle_t *pHandle = acj_initialize(sServer, nPort);

////////////////////////////////////////////////////////
// DESTRUCTION OF HANDLE at end of use
////////////////////////////////////////////////////////
acj_terminate(pHandle);


////////////////////////////////////////////////////////
// ERROR MESSAGES
////////////////////////////////////////////////////////
const char *err_code, *err_text, *err_detail;
err_code = acj_get_error(pHandle, &err_text, &err_detail);

////////////////////////////////////////////////////////
// 1) Get Object List example
////////////////////////////////////////////////////////
int nNumJobs;
char **job_list;
job_list = acj_joblist(pHandle, RTARCHITECT /* ARCHITECT or PROFILE */, &nNumJobs);

////////////////////////////////////////////////////////
// 2) Post Object example
////////////////////////////////////////////////////////
rc = acj_post_job(pHandle, "JOB_NAME", "FILE", RTARCHITECT /* ARCHITECT, PROFILE */);

////////////////////////////////////////////////////////
// 3) Delete Object
////////////////////////////////////////////////////////
rc = acj_delete_job(pHandle, "JOB_NAME", RTARCHITECT /* ARCHITECT, PROFILE */);

////////////////////////////////////////////////////////
// 4) Get Architect Service Params
////////////////////////////////////////////////////////
int nNumInputs, nNumOutputs;
const char *err_code, *err_text, *err_detail;
rc = acj_rt_io_info(pHandle, mJobName, &nNumInputs, &nNumOutputs);
int i, nColSize;
rta_data_type nColType;
const char *sColName;
rc = acj_rt_input_fieldinfo(pHandle, i, &sColName, &nColType, &nColSize);
rc = acj_rt_output_fieldinfo(pHandle, i, &sColName, &nColType, &nColSize);

////////////////////////////////////////////////////////
// 5) Execute Architect Service
////////////////////////////////////////////////////////
// Set up the input columns
int i, rc = 0;
CString sColName;
// This data is set when getting the parameter info
int *mColSizes = new int[nNumInputs];
rta_data_type *mColTypes = new rta_data_type[nNumInputs];
// Loop and Load inputs FIELD info
rc = rta_set_infield_info(pHandle, i, sColName, mColTypes[i], mColSizes[i]);
// Loop and Add the input data
rc = rta_add_row(pHandle);
// For Each row add all the data for the columns/fields
rc = rta_set_data_value(pHandle, j, "VALUE");
// Run the test
rc = rta_run(pHandle);


// Get the number of output columns
int nNumCols;
nNumCols = rta_output_numfields(pHandle);
// Get the output column information
int nOutSize;
rta_data_type nOutType;
const char *sOutName;
for (i = 0; i < nNumCols; i++)
    rc = rta_output_fieldinfo(pHandle, i, &sOutName, &nOutType, &nOutSize);
// Get the number of output rows
int nNumRows;
nNumRows = rta_output_numrows(pHandle);
// Get The output
const char *sOutVal;
for (i = 0; i < nNumRows; i++)
    for (j = 0; j < nNumCols; j++)
        sOutVal = rta_output_data(pHandle, j, i);
acj_terminate(pHandle);

////////////////////////////////////////////////////////
// 6) Run Architect Job
////////////////////////////////////////////////////////
int rc;
int mVarCount = 1;
acj_arch_var_value *mVarArray;
mVarArray = new acj_arch_var_value[mVarCount];
// LOAD ARRAY
CString sTemp = "Test Data";
mVarArray[0].var_name = new char[sTemp.GetLength() + 1];
strcpy(mVarArray[0].var_name, sTemp);
sTemp = "Test Value";
mVarArray[0].var_value = new char[sTemp.GetLength() + 1];
strcpy(mVarArray[0].var_value, sTemp);
char sJobID[ACJ_JOBID_SIZE];
CString sJobName = "JOB_NAME";
acj_job_type nType = ARCHITECT;
rc = acj_run_arch_job(pHandle, sJobName, mVarArray, mVarCount, sJobID);

////////////////////////////////////////////////////////
// 7) Run Profile Job
////////////////////////////////////////////////////////
int rc;
int mVarCount = 1;
acj_arch_var_value *mVarArray;
mVarArray = new acj_arch_var_value[mVarCount];
// LOAD ARRAY
CString sTemp = "Test Data";
mVarArray[0].var_name = new char[sTemp.GetLength() + 1];
strcpy(mVarArray[0].var_name, sTemp);
sTemp = "Test Value";
mVarArray[0].var_value = new char[sTemp.GetLength() + 1];


strcpy(mVarArray[0].var_value, sTemp);
char sJobID[ACJ_JOBID_SIZE];
CString sJobName = "JOB_NAME";
// REPORT FILE
rc = acj_run_prof_job(pHandle, sJobName, "FileName", 0,
                      1 /* Append - 1, Truncate - 0 */, 0,
                      "Description", mVarArray, 1, sJobID);
// Repository
rc = acj_run_prof_job(pHandle, sJobName, 0, "ReposName",
                      1 /* Append - 1, Truncate - 0 */, "ReportName",
                      "Description", mVarArray, 1, sJobID);

////////////////////////////////////////////////////////
// 8) Get Job Status
////////////////////////////////////////////////////////
int nNumStats;
int rc = acj_get_job_status(pHandle, "" /* or "JobID" */, &nNumStats);
acj_job_type nType;
char *sName, *sJobID, *sStatus;
for (int i = 0; i < nNumStats; i++)
    rc = acj_get_job_status_item(pHandle, i, &sName, &sJobID, &nType, &sStatus);

////////////////////////////////////////////////////////
// 9) Get Job Log
////////////////////////////////////////////////////////
char sLogFile[MAX_PATH];
GetTempFileName(dfReadIniFile("Environment", "WorkingPath"), "ISM", 0, sLogFile);
int rc = acj_get_job_log(pHandle, "JOBID", sLogFile);

////////////////////////////////////////////////////////
// 10) Terminate Job
////////////////////////////////////////////////////////
int rc = acj_terminate_job(pHandle, "JOBID");

////////////////////////////////////////////////////////
// 11) Clear Log
////////////////////////////////////////////////////////

int rc = acj_delete_job_log(pHandle, "JOBID");

C#

Using the DataFlux WSDL file, import a web reference into your project. This builds the object required to interface with the DIS.

////////////////////////////////////////////////////////
// Imports
////////////////////////////////////////////////////////


// Add Web reference using the DataFlux supplied WSDL

////////////////////////////////////////////////////////
// INITIALIZATION
////////////////////////////////////////////////////////
DQISServer.DQISService mService = new DQISServer.DQISService();
mService.Url = "http://MYDISSERVER" + ":" + "PORT";

////////////////////////////////////////////////////////
// 1) Get Object List example
////////////////////////////////////////////////////////
string[] jobs;
jobs = mService.GetObjectList(DQISServer.ObjectType.ARCHSERVICE);

////////////////////////////////////////////////////////
// 2) Post Object example
////////////////////////////////////////////////////////
DQISServer.ObjectDefinition def = new DQISServer.ObjectDefinition();
def.objectName = "MYJOB";
def.objectType = DQISServer.ObjectType.ARCHSERVICE;
// Grab Bytes from a job file
byte[] data = new byte[short.MaxValue];
FileStream fs = File.Open(@"c:\Develop\SoapUser\DISTESTRT.DMC",
    FileMode.Open, FileAccess.Read, FileShare.None);
fs.Read(data, 0, data.Length);
DQISServer.SendPostObjectRequestType req = new DQISServer.SendPostObjectRequestType();
req.@object = def;
req.data = data;
mService.PostObject(req);

////////////////////////////////////////////////////////
// 3) Delete Object
////////////////////////////////////////////////////////
DQISServer.SendDeleteObjectRequestType req = new DQISServer.SendDeleteObjectRequestType();
DQISServer.ObjectDefinition def = new DQISServer.ObjectDefinition();
def.objectName = "MYJOB";
def.objectType = DQISServer.ObjectType.ARCHSERVICE;
req.job = def;
mService.DeleteObject(req);

////////////////////////////////////////////////////////
// 4) Get Architect Service Params
////////////////////////////////////////////////////////
DQISServer.GetArchitectServiceParamResponseType resp;
DQISServer.SendArchitectServiceParamRequestType req;
req = new DQISServer.SendArchitectServiceParamRequestType();
req.serviceName = "MYJOB";


resp = mService.GetArchitectServiceParams(req);
string val;
int i;
DQISServer.FieldType field;
// loop through this data
val = resp.inFldDefs[0].fieldName;
i = resp.inFldDefs[0].fieldLength;
field = resp.inFldDefs[0].fieldType;
val = resp.outFldDefs[0].fieldName;
i = resp.outFldDefs[0].fieldLength;
field = resp.outFldDefs[0].fieldType;

////////////////////////////////////////////////////////
// 5) Execute Architect Service
////////////////////////////////////////////////////////
DQISServer.SendArchitectServiceRequestType req = new DQISServer.SendArchitectServiceRequestType();
DQISServer.GetArchitectServiceResponseType resp;
////////////////////////////////////////////////////////
DQISServer.GetArchitectServiceParamResponseType respParam;
DQISServer.SendArchitectServiceParamRequestType reqParam;
reqParam = new DQISServer.SendArchitectServiceParamRequestType();
reqParam.serviceName = "ServiceName";
respParam = mService.GetArchitectServiceParams(reqParam);
////////////////////////////////////////////////////////
DQISServer.FieldDefinition[] defs;
DQISServer.DataRow[] data_rows;
string[] row;
defs = new DQISServer.FieldDefinition[respParam.inFldDefs.Length];
for (int i = 0; i < respParam.inFldDefs.Length; i++)
{
    // Fill up the Field Definitions
    defs[i] = new DQISServer.FieldDefinition();
    defs[i].fieldName = respParam.inFldDefs[i].fieldName;
    defs[i].fieldType = respParam.inFldDefs[i].fieldType;
    defs[i].fieldLength = respParam.inFldDefs[i].fieldLength;
}
DataTable table = m_InputDataSet.Tables["Data"]; // externally provided data
// Fill up Data matching the definition
data_rows = new DQISServer.DataRow[table.Rows.Count];
for (int i = 0; i < table.Rows.Count; i++)
{
    System.Data.DataRow myRow = table.Rows[i];
    row = new String[table.Columns.Count];
    for (int c = 0; c < table.Columns.Count; c++)
    {
        row[c] = myRow[c].ToString();
    }
    // Loop and create rows of data to send to the service
    data_rows[i] = new DQISServer.DataRow();
    data_rows[i].value = new string[table.Columns.Count];


    data_rows[i].value = row;
}
req.serviceName = "ServiceName";
req.fieldDefinitions = defs;
req.dataRows = data_rows;
resp = mService.ExecuteArchitectService(req);

////////////////////////////////////////////////////////
// 6) Run Architect Job
////////////////////////////////////////////////////////
DQISServer.SendRunArchitectJobRequest req = new DQISServer.SendRunArchitectJobRequest();
DQISServer.GetRunArchitectJobResponse resp;
DQISServer.ArchitectVarValueType[] varVal = new DQISServer.ArchitectVarValueType[1];
varVal[0] = new DQISServer.ArchitectVarValueType();
varVal[0].varName = "TESTVAR";
varVal[0].varValue = "TESTVAL";
req.job = "JOB_NAME";
req.varValue = varVal;
resp = mService.RunArchitectJob(req);
string jobid = resp.jobId;

////////////////////////////////////////////////////////
// 7) Run Profile Job
////////////////////////////////////////////////////////
DQISServer.SendRunProfileJobRequestType req = new DQISServer.SendRunProfileJobRequestType();
DQISServer.GetRunProfileJobResponseType resp;
req.jobName = "JOB_NAME";
req.reportName = "REPORT_NAME";
// use this: req.repositoryName = "REPOSNAME";
// or this:  req.fileName = "FILE_NAME";
req.description = "DESCRIPTION";
req.append = 0; // No - 0; Yes - 1
resp = mService.RunProfileJob(req);
string jobid = resp.jobId;

////////////////////////////////////////////////////////
// 8) Get Job Status
////////////////////////////////////////////////////////
DQISServer.SendJobStatusRequestType req = new DQISServer.SendJobStatusRequestType();
DQISServer.JobStatusDefinition[] resp;


req.jobId = ""; resp = mService.GetJobStatus(req); DQISServer.ObjectDefinition def = resp[0].job; string jobid = resp[0].jobid; string jobstatus = resp[0].status; //////////////////////////////////////////////////////// // 9) Get Job Log //////////////////////////////////////////////////////// DQISServer.SendJobLogRequestType req = new DQISServer.SendJobLogRequestType(); DQISServer.GetJobLogResponseType resp; req.jobId = "SOMEJOBID"; resp = mService.GetJobLog(req); string fileName = resp.fileName; byte []data = resp.data; //////////////////////////////////////////////////////// // 10) Terminate Job //////////////////////////////////////////////////////// DQISServer.SendTerminateJobRequestType req = new DQISServer.SendTerminateJobRequestType(); DQISServer.GetTerminateJobResponseType resp; req.jobId = "SOMEJOBID"; resp = mService.TerminateJob(req); string fileName = resp.status; //////////////////////////////////////////////////////// // 11) Clear Log //////////////////////////////////////////////////////// DQISServer.SendDeleteJobLogRequestType req = new DQISServer.SendDeleteJobLogRequestType(); DQISServer.GetDeleteJobLogResponseType resp; req.jobId = "SOMEJOBID"; resp = mService.DeleteJobLog(req); string fileName = resp.status;


Appendix C: Saving Profile Reports to a Repository

In order for DataFlux® Integration Server (DIS) to store Profile reports in a repository, the profrepos.cfg configuration file must exist in the \etc directory of the DIS installation. If you have already configured a Profile repository on a Microsoft® Windows® machine running dfPower® Studio, you can copy the profrepos.cfg file from the \etc directory of dfPower® Studio to the \etc directory of your DIS installation.

The profrepos.cfg file contains the list of available repositories and specifies one of them to be the default. The format for the profrepos.cfg file is:

$defaultrepos='Repos1 Name'
Repos1 Name='ODBC DSN' 'Table Prefix'
Repos2 Name='ODBC DSN' 'Table Prefix'
Repos3 Name='ODBC DSN' 'Table Prefix'

where:

$defaultrepos = the default repository.
Repos1 Name = the user-defined name for the repository.
ODBC DSN = the data source name (DSN) defined for the ODBC connection. (A DSN contains connection information, such as user name and password, used to connect to a database through an ODBC driver.)
Table Prefix = the prefix that was given for the repository when it was created.

The following example shows three repositories configured for Profile reports: TEST, dfPower Sample Prod, and dfPower Sample Cust. The two dfPower Sample examples are stored in the same database and use table prefixes, Prod_ and Cust_, to create a unique set of tables for each repository. The default repository is dfPower Sample Prod.

The following is an example of the profrepos.cfg file:

$defaultrepos='dfPower Sample Prod'
TEST='TEST DATABASE' ''
dfPower Sample Prod='DataFlux Sample' 'Prod_'
dfPower Sample Cust='DataFlux Sample' 'Cust_'

Manage Profile repositories on any supported platform using the Profile Repository Administrator, which can be accessed from the Start menu of any DataFlux dfPower Studio for Windows installation. For more information on creating and maintaining Profile repositories, see the dfPower Studio Online Help topic, "dfPower Profile - Profiling Repositories."


Appendix D: SOAP Commands

SOAP Commands

DataFlux® Integration Server (DIS) supports commands using SOAP, the Simple Object Access Protocol, a Web service protocol used to encode requests and responses sent over a network; this XML-based protocol is platform independent and can be used with a variety of internet protocols. Many of these commands have pre-defined enumeration values that are used in conjunction with various commands. These enumeration values are listed below, and their association with the various SOAP commands is shown in the commands table.

Enumeration Values

These are the options that are pre-defined for many of the SOAP commands listed in the SOAP Commands Table.

ObjectType

• ARCHSERVICE

• ARCHITECT

• PROFILE

FieldType

• UNKNOWN

• BOOLEAN

• INTEGER

• REAL

• STRING

• DATE

AccountType

• USER

• GROUP

Other commands require string input, and are so identified.



Following are the commands recognized by DIS:

Command Description

AddToGroup Add an account (user or group) to a group.

DeleteFromGroup Remove an account (user or group) from a group.

DeleteJobLog Deletes a job log and status. This command essentially removes any reference DIS has for this job, so the client cannot make any queries for this job after making this call. This command works only for jobs that have completed.

DeleteObject Deletes an existing object of a particular type from the server.

ExecuteArchitectService Runs an Architect service. The client stays connected until DIS sends back a status response and result data.

GetArchitectServiceParams Gets the required input fields and the produced output fields of an Architect service.

GetArchitectServicePreload Preload a service or list of services.

GetArchitectServiceUnload Unload a service or list of services.

GetJobLog Returns the log file for a Profile or Architect job that was previously started. The job does not have to be finished for this command to work.

GetJobStatus Returns the status of a Profile or Architect job that was previously started, or can be used to return status for all jobs if a Job ID is not passed in.

GetLoadedObjectList Retrieve a list of currently loaded services.

GetObjectsList Returns a list of objects of a particular object type. The types are: Architect real-time services, Architect jobs, and Profile jobs.

GetObjFile Get an object file (a job or a service).

ListAccounts Retrieve a list of user or group accounts.

ListGroupMembers Retrieve a list of accounts (user or group) in a group.

ListUserGroups Retrieve a list of groups that a user belongs to.

MaxNumJobs Dynamically set the max number of services allowed to run concurrently.

PostObject Posts a new object to the server of an object type: service/arch-job/prof-job. If such a service/job already exists, the client will get an error.

RepositoryList Get a list of Profile repositories.

RunArchitectJob Starts running an Architect job. The client has to make new connections to check on its status.

RunProfileJob Starts running a Profile job. The client has to make new connections to check on its status.

ServerVersion Get the server version.

TerminateJob Terminates a running job. The client can still get the status and log after the job has been terminated.


Appendix E: DIS Service

DataFlux® Integration Server (DIS) runs as a Microsoft® Windows® service (called DataFlux Integration Server). You can start and stop the service using the Microsoft Management Console (MMC). DIS uses the DataFlux dfPower® Studio execution environment for Windows to execute real-time services, as well as Architect and Profile jobs.

In UNIX, DIS runs as a daemon administered from a command line. The disadmin application is used to start and stop the daemon. Real-time services, Architect, and Profile jobs associated with DIS are administered through the dfPower environment.

Windows

Starting and Stopping DIS in Windows

When installed in a Microsoft Windows environment, DIS runs as a Windows service (named DataFlux Integration Server).

Start and stop the service using the MMC. The MMC hosts administrative tools that you can use to administer networks, computers, services, and other system components.

1. Click Start > Settings > Control Panel.

2. Double-click Administrative Tools > Computer Management. This brings up the MMC. (The MMC is an interface, introduced with the Microsoft Windows 2000 platform, that combines several administrative tools into one configurable interface.)

3. Expand the Services and Applications folder.

4. Click Services.

5. Click DataFlux Integration Server.

6. Click either Stop the service or Restart the service.

Modifying DIS Windows Service Log On

When DIS is installed, it creates a service named DataFlux Integration Server. By default, this service is started using the local system account.

Note: Because this account may have some restrictions (such as accessing network drives), we suggest that you modify the service properties to have the service log on using a user account with the appropriate privileges, such as access to required network drives and files. For security reasons, you should assign administrative privileges only if necessary.


To modify the DIS log on:

1. Select Control Panel > Administrative Tools.

2. Double-click Services, and select the DataFlux Integration Server service.

3. Select the Log On tab, select This account, and enter Account and Password credentials for a user with administrative privileges.

UNIX

Starting and Stopping DIS Daemon in UNIX/Linux

Start and stop the daemon using the disadmin application included in the installation. This application can be run from the installation root directory using the command ./bin/disadmin [yourcommand], where [yourcommand] is one of the following:

Command Description

start Starts the Integration Server.

stop Stops the Integration Server.

status Checks that the Integration Server is running.

help Displays this message.

version Displays the version information.

For example:

./bin/disadmin start — Starts the server

./bin/disadmin stop — Stops the server


Appendix F: Configuration Settings

The following is a list of DataFlux® Integration Server (DIS) and dfArchitect configuration settings or directives. These may need to be modified prior to running the server. Some examples of the settings can be found in the dfexec.cfg configuration file.

• General DIS Configuration Directives

• DIS Security Related Configuration Directives

• Architect Configuration Directives

• Data Access Component Directives

Best Practice: Refer to Appendix A: Best Practices - Consider DIS Performance when Modifying Configuration Settings for additional information about Configuration Settings.

General DIS Configuration Directives

The following table lists the configuration settings for DIS:

Setting Description

arch job path Refers to the location of the dfPower Architect batch job files. If a value is not set here, it will default to a new directory, arch_job, created under the working directory (Integration Server). All values containing special characters or spaces must be enclosed in single quotes.

# Windows Example
arch job path = 'C:\Program Files\DataFlux\DIS\[version]\arch_job'
# UNIX Example
arch job path = '/opt/dataflux/aix/[version]/dfpower/etc/arch_job'


arch svc path Sets the path to Architect real-time services. If not configured, it will default to a new directory, svc_job, created under the working directory (Integration Server). All values containing special characters or spaces must be enclosed in single quotes.

# Windows Example
arch svc path = 'C:\Program Files\DataFlux\DIS\[version]\svc_job'
# UNIX Example
arch svc path = '/opt/dataflux/aix/DIS/[version]/svc_job'

dfexec max num Maximum number of dfexec processes that can run simultaneously. The default is 10 (Integration Server).
# Windows or UNIX Example
dfexec max num = 5

dfexec exe path Path to the dfexec executable. It defaults to the bin directory (Integration Server).
# Windows Example
dfexec exe path = C:\Program Files\DataFlux\dis\[version]\bin
# UNIX Example
dfexec exe path = /opt/dataflux/aix/dis/[version]/bin

dfsvc max errs Sets the maximum number of errors that a dfwsvc process is allowed to encounter before DIS terminates it. The default is -1, which disables the function and sets no limit on the number of errors. Any number less than one defaults to -1.
# Windows or UNIX Example
dfsvc max errs = 10

dfsvc max requests Sets the number of requests a dfwsvc process can handle before DIS terminates it. The default is -1, meaning no limit is set. If a number less than one is entered, the value defaults to -1.
# Windows or UNIX Example
dfsvc max requests = 5

dfsvc preload Designates specific services, and a count for each service, that DIS preloads during startup. This can be used in conjunction with dfsvc preload all. For more information and formatting guidelines, see Pre-loading Services.
# Windows or UNIX Example
dfsvc preload = 2:svc_1.dmc -1:subdir1\svc_2.dmc -3:svc_3.dmc

dfsvc preload all Causes DIS to find and preload all services a specified number of times. This includes services found in subdirectories. The number of instances specified must be an integer greater than zero, or the directive is ignored. This can be used in conjunction with dfsvc preload. For more information, see Pre-loading Services.
# Windows or UNIX Example
dfsvc preload all = 2

dfwsvc debug Specifies whether dfwsvc should run in debug mode. If set to yes, dfwsvc always creates a log file regardless of the dfwsvc log setting. The default is no.

# Windows or UNIX Example
dfwsvc debug = yes

dfwsvc exe path Specifies the path to the dfwsvc executable. If not set, it defaults to the directory containing the DIS executable. The dfwsvc executable is used when WLP DIS client access is required. The server child listen and server wlp listen options must be specified as well.
# Windows Example
dfwsvc exe path = C:\Program Files\DataFlux\DIS\[version]\bin
# UNIX Example
dfwsvc exe path = /opt/dataflux/aix/dis/[version]/bin

dfwsvc log Specifies whether dfwsvc should create a log file. The default is no.
# Windows or UNIX Example
dfwsvc log = yes

dfwsvc max num Specifies the maximum number of dfwsvc instances (Architect services) that may be running at any given time.
# Windows or UNIX Example
dfwsvc max num = 3

job io path Directory containing Architect and Profile job input and output files. If a job does not specify a path to input or output files, DIS uses the directory specified here. If not set, the default location is a new directory created under the working directory. All values containing special characters or spaces must be enclosed in single quotes.
# Windows Example
job io path = 'C:\Program Files\DataFlux\DIS\[version]\temp'
# UNIX Example
job io path = '/opt/dataflux/linux/dis/[version]/temp'


log dir This is the log path for the DIS, dfsvc, and dfexec.cfg log files. If not set, it defaults to the directory containing the DIS executable in Microsoft® Windows® or to $DFEXEC_HOME/etc/ in UNIX® (Integration Server). All values containing special characters or spaces must be enclosed in single quotes.

# Windows Example
log dir = 'C:\Program Files\DataFlux\DIS\[version]\log'
# UNIX Example
log dir = '/opt/dataflux/dis/[version]/log'

log get_objects_list requests Controls whether or not to log object list requests (Integration Server). The default is 0. Some clients may automatically send certain types of requests, which can result in an unnecessarily long and difficult-to-read log file; this directive can instruct the server not to log those requests.
Note: Setting this option to 1 (yes) may cause large log files to be generated.
# Windows or UNIX Example
log get_objects_list requests = 1 [1=yes/0=no]

log get_status requests Controls whether or not to log status requests (Integration Server). The default is 0. Some clients may automatically send certain types of requests, which can result in an unnecessarily long and difficult-to-read log file; this directive can instruct the server not to log those requests.
Note: Setting this option to 1 (yes) may cause large log files to be generated.
# Windows or UNIX Example
log get_status requests = 1 [1=yes/0=no]

odbc ini Where the odbc.ini file is stored (Architect batch jobs, Profile jobs, Integration Server).

# Windows Example
odbc ini = C:\Windows
# UNIX Example
odbc ini = /opt/dataflux/solaris

priv log packets Controls whether DIS generates a log file of all SOAP packets that are sent and received. The generated file will have the word packets in its name and will reside in the same directory as the DIS log file. Because this setting decreases system performance while enabled, it should remain disabled except when debugging. It is disabled by default.
# Windows or UNIX Example
priv log packets = no


prof job path Path for Profile batch jobs (Integration Server). If not specified, a new directory will be created under the working directory by default. All values containing special characters or spaces must be enclosed in single quotes.

# Windows Example
prof job path = 'C:\Program Files\DataFlux\DIS\[version]\prof_job'
# UNIX Example
prof job path = '/opt/dataflux/solaris/dis/[version]/profjob'

server child listen Defines the communication method between dfwsvc and DIS. This option takes precedence over the server child shm dir and server child tcp port options. If you do not configure this option, it defaults to the shared memory provider, and the shared memory files are created in the work directory.
# Windows Example
server child listen = 'type=shm;path=C:\Program Files\DataFlux\DIS\[version]\var'
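The example above shows the shared memory (shm) provider. By analogy with the type/host/port syntax documented for server wlp listen, a TCP form might look like the sketch below; the exact keys for this directive's TCP provider are an assumption, and the port shown is the server child tcp port default of 21035.

# Windows or UNIX Example (hedged sketch; assumes the same syntax as server wlp listen)
server child listen = 'type=tcp;host=localhost;port=21035'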

server child shm dir Directory in which to create SHM files. This is used when server child listen is not set and the default method is SHM. If the default method is not SHM, do not specify this option, as it will select the SHM method. This option takes precedence over the server child tcp port option. The default directory is the DIS work directory.
# Windows Example
server child shm dir = 'C:\Program Files\DataFlux\DIS\[version]\var'

server child tcp port Port number for TCP on localhost. This is used when server child listen is not set and the default method is TCP. If the default method is not TCP, do not specify this option, as it will select the TCP method. The default port number is 21035.
# Windows or UNIX Example
server child tcp port = 21035

server listen port Selects the port that the server is to use. The default is port 21036 (Integration Server).
# Windows or UNIX Example
server listen port = 20125

server log file max size Maximum size at which a log file is rotated out and saved with an index included in its name. An empty log file with the original name is then created and used for further logging. The default is 0, meaning no log file rotation.
# Windows or UNIX Example
server log file max size = 512MB

server read timeout Indicates the time to wait for server-client read/write operations to complete. A positive value indicates seconds; a negative value indicates microseconds. The default value is 0.5 seconds (Integration Server).
Note: If errors are encountered while uploading jobs or services, try increasing this value.
# Windows or UNIX Example
server read timeout = -300000
The example above sets a limit of 0.3 seconds (300,000 microseconds).

server send log chunk size Indicates the size of the log file chunks to send when a client requests a log file from DIS. The default is 512KB (Integration Server).
Note: By breaking the log file into chunks of a specified size, DIS is able to respond more quickly to other requests while sending the log file.
# Windows or UNIX Example
server send log chunk size = 256KB

server wlp listen Identifies the connection information that clients use to contact the WLP listen server when WLP DIS client or SAS WLP Batch client access is required. The svr run wlp parameter must be enabled for this option to take effect.
# Windows or UNIX Example
server wlp listen = 'type=tcp;host=myhost.mycompany.com;port=21037'

server wlp listen port Port number for TCP on the host machine. This is used when server wlp listen is not set and svr run wlp is enabled. The default is 21037.
# Windows or UNIX Example
server wlp listen port = 21037

soap return nulls Determines whether DIS returns an empty string or null when a real-time service output field has a null value. The default value is no, meaning DIS returns an empty string. If this parameter is set to yes, DIS returns null instead of an empty string.
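The original table gives no example for this directive; assuming the same yes/no syntax used by the surrounding directives, enabling null returns would look like:

# Windows or UNIX Example (assumed yes/no syntax)
soap return nulls = yes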

svc requests queue Enables service requests to be queued in the order they were received and processed as soon as a dfwsvc process becomes available. If not set, an error stating that the request cannot be handled occurs when the maximum number of dfwsvc processes (the processes that handle real-time service requests) is reached.
# Windows or UNIX Example
svc requests queue = yes


svr idle thread timeout Determines the length of time an existing thread remains idle before it is terminated. Defaults to five seconds if not set, or if set to less than one microsecond.
Note: This directive should be treated as an advanced configuration and should be used only when troubleshooting performance problems.
# Windows or UNIX Example
svr idle thread timeout = 5

svr max idle threads Determines the maximum number of idle threads that are allowed to exist. The value is always at least one.
Note: This directive should be treated as an advanced configuration, used only when troubleshooting performance problems.
# Windows or UNIX Example
svr max idle threads = 1

svr max path len Sets the maximum number of bytes allowed when entering a path as part of a configuration directive. If a longer path is entered, DIS will not initialize. The default is 8KB.
Note: This directive should be treated as an advanced configuration, used only when troubleshooting performance problems.
# Windows or UNIX Example
svr max path len = 8KB

svr max threads Determines the maximum number of threads that can exist. If a WLP server is to run, at least two threads are used. If a SOAP server is to run, at least four threads are used. DIS automatically adjusts the value to the required minimum if the configured value is too low.
# Windows or UNIX Example
svr max threads = 2

svr run dis Determines whether the SOAP (DIS) server runs. The default is one.
# Windows or UNIX Example
svr run dis = 1 [1=yes/0=no]

svr run wlp Determines whether the WLP server runs. The default is zero. When set to one, DIS starts the WLP listener specified in the server wlp listen parameter.
# Windows or UNIX Example
svr run wlp = 0 [1=yes/0=no]

working path Path where the server creates temporary input/output files to transfer to the real-time service. The default is the directory containing the DIS executable in Windows and $DFEXEC_HOME/var/ in UNIX (Integration Server). The value must be enclosed in single quotes. The location of the working path can affect system performance.
# Windows Example
working path = 'C:\Program Files\DataFlux\DIS\[version]\work'
# UNIX Example
working path = '/opt/dataflux/solaris/dis/[version]/work'

DIS Security Related Configuration Directives

The following table lists the security settings for DIS:

Setting Description

allow everyone Allows all users, present and future, to have access to all objects. This setting is part of the DataFlux Integration Server security subsystem (Integration Server) and defaults to zero.
# Windows or UNIX Example
allow everyone = 0 [1=yes/0=no]

default commands permissions Users who do not exist on DIS are automatically added with the default command permissions the first time they access the server. The DIS administrator can later change these permissions using the following directive in the dfexec.cfg file.
# Windows or UNIX Example
# This example grants all permissions (all 1s); configure the permissions as needed.
default commands permissions = 1111111111111

default umask Changes the user file-creation mode mask, or umask, which is the set of default permissions for created files. If not specified, the umask defaults to the shell's umask (the output of the umask command). Users who need different default file permissions can specify a different umask value.
# Windows or UNIX Example
# remove write bit for others (create 0664 files)
default umask = 002
# remove all bits for others (create 0660 files)
default umask = 007
# remove write bit for group and others (create 0644 files)
default umask = 022
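As a worked example of the arithmetic behind these values (standard umask behavior, not specific to DIS): a newly created file starts from base mode 0666, and each bit set in the umask removes the corresponding permission, so 0666 AND NOT 022 = 0644, and 0666 AND NOT 007 = 0660.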

enable ldap Configuration setting in the dfexec.cfg file that enables LDAP in DIS.
# Windows or UNIX Example
enable ldap = yes

enable ownership Enables ownership rights for objects posted by users. The default is yes (Integration Server). Ownership implicitly allows a user to execute or delete an object, regardless of explicitly declared permissions.
# Windows or UNIX Example
enable ownership = 1 [1=yes/0=no]

enable security Enables the DIS security subsystem. The default is zero (Integration Server). Three related settings are ignored unless this directive is enabled: enable ownership, allow everyone, and security path.
# Windows or UNIX Example
enable security = 1 [1=yes/0=no]

restrict general access Restricts general access by IP address. Access to DIS is restricted by specifying the IP addresses of clients that are allowed or denied access. The default is allow all. Other options include deny all, allow none, and deny none (Integration Server). When configuring each restriction group, either allow or deny must be specified with the directive; the two cannot be used together. The directive can then be followed by lists of specific IP addresses and ranges of IP addresses. IP address ranges must be indicated with a dash and no spaces, each individual address or range must be separated with a space, and each group must be entered on a single line. If the keyword all or none is used, explicitly defined IP addresses or ranges are ignored. A client that is denied general access is implicitly denied access to post and delete commands. Only IPv4 addresses are supported.
# Windows or UNIX Example
restrict general access = deny all
# Windows or UNIX Example
# The line below allows access for address 127.0.0.1 and the range 192.168.1.1
# through 192.168.1.255
restrict general access = allow 127.0.0.1 192.168.1.1-192.168.1.255
# Windows or UNIX Example
# The line below allows access for addresses 127.0.0.1 and 192.168.1.190
restrict general access = allow 127.0.0.1 192.168.1.190


restrict get_all_stats access This security setting allows control over which clients can request the status of all jobs. The default is allow all (Integration Server). See the restrict general access configuration setting for more information.
Note: When the status of all jobs is requested, the user receives job IDs. It is recommended that access be limited to administrators.
# Windows or UNIX Example
restrict get_all_stats access = deny none

restrict post/delete access This configuration setting allows control over which clients can post and delete jobs. The default is allow all (Integration Server). See the restrict general access configuration setting for more information.
# Windows or UNIX Example
restrict post/delete access = allow none

security path Directory containing DIS security files. If not set, it defaults to the dis_security directory that must exist under the directory containing the server's configuration file (Integration Server).

# Windows Example
security path = C:\Program Files\DataFlux\DIS\[version]\etc\dis_security
# UNIX Example
security path = /opt/dataflux/aix/dis/[version]/etc/security

soap over ssl Configuration setting in the dfexec.cfg file that enables SSL to be used with DIS. Servers will need to be configured for SSL. Clients can then communicate over SSL when the server's address is entered as https:// instead of http://.
# Windows or UNIX Example
soap over ssl = yes
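As a client-side illustration, assuming a hypothetical host name and the default SOAP port of 21036 from server listen port, a client would then address the server as https://myhost.mycompany.com:21036 rather than http://myhost.mycompany.com:21036.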

soap ssl key file Path to the key file that is required when the SOAP server must authenticate to clients. If this configuration directive is not used, comment it out.
# Windows or UNIX Example
soap ssl key file = 'path to file'

soap ssl key passwd Password for the soap ssl key file. If the key file is not password protected, this configuration directive should be commented out.
# Windows or UNIX Example
soap ssl key passwd = 'encrypted password'

soap ssl CA cert file File where the Certificate Authority stores trusted certificates. If this configuration directive is not needed, comment it out.
# Windows or UNIX Example
soap ssl CA cert file = 'path to file'

soap ssl CA cert path Path to the directory where trusted certificates are stored. If this configuration directive is not needed, comment it out.
# Windows or UNIX Example
soap ssl CA cert path = 'path to directory'

strong passwords Setting used by the disadmin application in UNIX to enforce the following rules for passwords:

• minimum length of six characters

• requires at least one number

• requires at least one uppercase letter

• requires at least one lowercase letter

This setting affects the following disadmin commands: adduser, moduser, and passwd. See Security Commands for UNIX for more information about using the disadmin application and use of strong passwords.

# UNIX Example
strong passwords = yes
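As an illustration of these rules (the passwords shown are hypothetical): a password such as Example7 satisfies all four requirements, while example7 would be rejected because it lacks an uppercase letter.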

Architect Configuration Directives

The following table lists the Architect configuration settings:

Setting Description

arch config This path indicates the location of the Architect macro definitions file. If not set, this value defaults to \etc\architect.cfg (Architect batch jobs and real-time services).

# Windows Example
arch config = C:\Program Files\DataFlux\dfPower Studio\[version]\etc\architect.cfg
# UNIX Example
arch config = /opt/dataflux/aix/[version]/dfpower/etc/architect.cfg

canada post db This setting indicates the path to the Canada Post database for Canadian address verification (Architect batch jobs and real-time services).
# Windows Example
canada post db = C:\Program Files\DataFlux\dfPower Studio\[version]\mgmtrsrc\RefSrc\SERPData
# UNIX Example
canada post db = /opt/dataflux/aix/dfpower/[version]/mgmtrsrc/refsrc/serpdata

checkpoint Sets the minimum time between log checkpoints, allowing control of how often the log file is updated. Add one of the following to indicate the unit of time: h, min, s (Architect batch jobs and Profile jobs).

# Windows or UNIX Example
checkpoint = 15min

cluster memory The amount of memory to use per cluster of match-coded data. Use this setting if you are using clustering nodes in dfPower (Architect batch jobs and real-time services). This setting may affect memory allocation.
Note: This setting must be entered in megabytes; for example, 1 GB should be set to 1024 MB.
# Windows or UNIX Example
cluster memory = 64MB

copy qas files When set to yes, the QAS config address verification files are copied to the current directory if they are new. The setting defaults to no (Architect batch jobs).
# Windows or UNIX Example
copy qas files = yes

datalib path This is the path to the verify data libraries, excluding USPS data (Architect batch jobs and real-time services). All values containing special characters or spaces must be enclosed in single quotes.
# Windows Example
datalib path = 'C:\Program Files\DataFlux\DIS\[version]\data'
# UNIX Example
datalib path = '/opt/dataflux/hpux/dis/[version]/data'

dfclient config Sets the path for the dfIntelliServer® client configuration file, if you are using dfIntelliServer software. The client can be local or loaded on another machine (Integration Server, dfIntelliServer). This setting is necessary if you are using distributed nodes in an Architect job.
# Windows Example
dfclient config = C:\Program Files\DataFlux\dfIntelliServer\etc\dfclient.cfg
# UNIX Example
dfclient config = /opt/dataflux/solaris/dfintelliserver/etc/dfclient.cfg

enable dpv To enable Delivery Point Validation (DPV) processing for US Address Verification, set to yes. It is disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable dpv = yes

enable elot To enable USPS eLOT processing for US Address Verification, set to yes. It is disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable elot = yes

enable lacs To enable Locatable Address Conversion System (LACS) processing, set to yes. It is disabled by default (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable lacs = yes

enable rdi Enables Residential Delivery Indicator (RDI) processing for US Address Verification. The default is no (Architect batch jobs and real-time services).
# Windows or UNIX Example
enable rdi = yes

fd table memory Sets the memory size for calculating frequency distribution. If not set, a default of 262,144 bytes is used on 32-bit systems and 524,288 bytes on 64-bit systems. This memory refers to the number of bytes used per field while processing a table. When processing tables with many fields, this number may be reduced to alleviate memory issues. The larger the value, the more efficient the calculation. The minimum value is 4,096 bytes (8,192 on 64-bit systems).
Note: This is a separate parameter from the frequency distribution memory cache size, which is specified on a per-job basis.
# Windows or UNIX Example
fd table memory = 65536


ftp get command Used to receive files by FTP. During the DIS installation, the operating system is scanned for the following FTP utilities: NcFTP, Perl LWP Modules, cURL, and Wget. If multiple utilities are found, NcFTP and Perl LWP Modules are given precedence, and FTP get/put commands are written to the dfexec.cfg file.
# Windows or UNIX Example
ftp get command = '"C:\Program Files\NcFTP\ncftpget.exe" -d %L -u %U -p %P %S %T %F'

ftp put command Used to send files by FTP. During the DIS installation, the operating system is scanned for the following FTP utilities: NcFTP, Perl LWP Modules, cURL, and Wget. If multiple utilities are found, NcFTP and Perl LWP Modules are given precedence, and FTP get/put commands are written to the dfexec.cfg file.
# Windows or UNIX Example
ftp put command = '"C:\Program Files\NcFTP\ncftpput.exe" -d %L -u %U -p %P %S %T %F'

geo db Sets the path to the database used for geocoding and coding telephone information (Architect batch jobs and real-time services).
# Windows Example
geo db = C:\Program Files\DataFlux\dfPower Studio\[version]\mgmtrsrc\RefSrc\GeoPhoneData
# UNIX Example
geo db = /opt/dataflux/hpux/dfpower/[version]/mgmtrsrc/refsrc/geophonedata

java classpath Setting used for the Java™ Plugin that indicates the location of compiled Java code.
# Windows Example
java classpath = \usr\java14_64\jre\bin
# UNIX Example
java classpath = /usr/java14_64/jre/bin

java debug Optional Java Plugin setting that enables debugging in the Java Virtual Machine (JVM™) used by Architect or Integration Server. The default setting is no.

# Windows or UNIX Example
java debug = yes

java debug port Optional Java Plugin setting that indicates the port number where the JVM listens for debugger connect requests. This can be any free port on the machine.
# Windows or UNIX Example
java debug port = 23017


java vm This Java Plugin setting references the location of the JVM DLL (or shared library on UNIX variants).

# Windows Example
java vm = [JRE install directory]\bin\server\jvm.dll
# UNIX Example
java vm = /[JRE install directory]/bin/server/jvm.dll

license location This is the license directory containing the license file (Architect batch jobs, real-time services, and Profile jobs). It was labeled license dir in previous versions. All values containing special characters or spaces must be enclosed in single quotes.
Caution: license location is only valid for UNIX. In Windows, set or change the license location using the License Manager. To access the License Manager application, click Start > Programs > DataFlux Integration Server > License Manager.
# UNIX Example
license location = '/opt/dataflux/dis/[version]/etc'

mail command This command is used for sending alerts by email (Profile jobs). The command may contain the substitutions %T (To) and %B (Body); %T is replaced with the destination email address, and %B with the path of a temporary file containing the message body. If %T and %B are left blank, these fields default to what was specified in the job. The -s mail server parameter specifies the mail server and is not necessary on UNIX systems. All values containing special characters or spaces must be enclosed in single quotes. sendmail is the open-source program used for sending mail on UNIX. In Windows, mail is sent by the VBScript script mail.vbs.
# Windows Example (where the mail server is named mailhost)
mail command = 'cscript -nologo "%DFEXEC_HOME%\bin\mail.vbs" -s mailhost "%T" < "%B"'
# UNIX Example
mail command = '/usr/lib/sendmail %T < %B'

odbc ini Where the odbc.ini file is stored (Architect batch jobs, Profile jobs, Integration Server).

# Windows Example
odbc ini = C:\Windows
# UNIX Example
odbc ini = /opt/dataflux/solaris


plugin dir Where Architect plug-ins are located (Architect batch jobs and real-time services, Profile jobs).

# Windows Example
plugin dir = C:\Program Files\DataFlux\dis\[version]\bin
# UNIX Example
plugin dir = /opt/dataflux/aix/dis/[version]/bin

qkb root Location of the Quality Knowledge Base (QKB) files. This location must be set if you are using steps that depend on algorithms and reference data in the QKB, such as matching or parsing (Architect batch jobs, real-time services, and Profile jobs).
Note: If changes are made to the QKB, make sure the server copy is updated as well.
# Windows Example
qkb root = C:\Program Files\DataFlux\qkb
# UNIX Example
qkb root = /opt/dataflux/qkb

repository config Location of the Profile repository configuration file (Profile jobs and Integration Server). All values containing special characters or spaces must be enclosed in single quotes.
# Windows Example
repository config = 'C:\Program Files\DataFlux\DIS\[version]\etc\profrepos.cfg'
# UNIX Example
repository config = '/opt/dataflux/linux/dis/[version]/etc/profrepos.cfg'

sort chunk Allows you to specify the amount of memory to use while performing sorting operations. The amount may be given in KB or MB, but not GB (Architect batch jobs and real-time services).

# Windows or UNIX Example
sort chunk = 128MB

usps db This is the path to the USPS database required for US address verification (Architect batch jobs and real-time services).

# Windows Example
usps db = C:\Program Files\DataFlux\verify\uspsdata
# UNIX Example
usps db = /opt/dataflux/aix/verify/uspsdata


verify cache Indicates an approximate percentage (0-100) of the USPS reference data set to cache in memory before an address verification procedure (Architect batch jobs and real-time services). This setting can affect memory allocation.
# Windows or UNIX Example
verify cache = 30

verify preload Allows you to specify a list of states whose address data will be preloaded. Preloading increases memory usage but significantly decreases the time required to verify addresses in a state (Architect batch jobs and real-time services).
# Windows or UNIX Examples
verify preload = NY TX CA FL
verify preload = ALL

world address db Sets the path where AddressDoctor data is stored.
# Windows Example
world address db = 'C:\world_data\'
# UNIX Example
world address db = '/opt/dataflux/linux/worlddata'

world address license The license key provided by DataFlux used to unlock AddressDoctor country data. The value must be enclosed in single quotes (Architect batch jobs and real-time services).
# Windows or UNIX Example
world address license = 'abcdefghijklmnop123456789'

Data Access Component Directives

The following table lists the settings in the app.cfg file, which are used by the DAC to determine the operation it will perform:


Setting Description

DAC Logging Specifies whether or not to create a log file for DAC operations.
The DAC checks the following values and locations, based on your operating system:
Windows — The USER\logfile configuration value. Next, the DAC checks SYSTEM\logfile for a string representing a log file name.
UNIX — The sql_log.txt file in the current working directory.

User saved connection Specifies where to find user saved connections.
The DAC checks the following values and locations, based on your operating system:
Windows — The USER\savedconnectiondir configuration value. Next, the DAC checks the application settings directory for the user, which is usually under \Documents and Settings, in the DataFlux\dac\[version] subdirectory.
UNIX — The $HOME/.dfpower/dsn directory.

System saved connection Specifies where to find system saved connections.
The DAC checks the following values and locations, based on your operating system:
Windows — The DAC/SAVEDCONNSYSTEM configuration value. Next, the DAC checks the DFEXEC_HOME environment variable, in the $DFEXEC_HOME\etc\dsn directory.
UNIX — The $DFEXEC_HOME/etc/dsn directory.

Use braces Specifies whether to enclose DSN items with braces when they contain reserved characters.
The DAC checks the following values and locations, based on your operating system:
Windows — Whether USER\[dsn_name]\usebraces has a double word value of 1, where [dsn_name] is the name of the DSN. Next, the DAC checks the SYSTEM\[dsn_name]\usebraces value.
UNIX — The $HOME/.dfpower/dsn.cfg file for [dsn_name] = usebraces.
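Based on the UNIX file format described above, a minimal $HOME/.dfpower/dsn.cfg sketch for a hypothetical DSN named mydsn might look like the following; the same [dsn_name] = value form also applies to the oranum38real entry covered in the next setting.

# UNIX Example ($HOME/.dfpower/dsn.cfg; 'mydsn' is a hypothetical DSN name)
mydsn = usebraces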


Oracle NUMBER(38) handling Specifies whether to treat NUMBER(38) columns as INTEGER (the default) or REAL values. This setting applies only if you are connecting through the Oracle driver.
The DAC checks the following values and locations, based on your operating system:
Windows — Whether USER\[dsn_name]\oranum38real has a double word value of 1. Next, the DAC checks whether SYSTEM\[dsn_name]\oranum38real has a double word value of 1.
UNIX — The $HOME/.dfpower/dsn.cfg file for [dsn_name] = oranum38real.

Suffix for CREATE TABLE statements Specifies a string that is appended to every CREATE TABLE statement. If you include %t in this string, it is substituted with the table name.
The DAC checks the following values and locations, based on your operating system:
Windows — The USER\[dsn_name]\postcreate value specifies a string. Next, the DAC checks whether SYSTEM\[dsn_name]\postcreate specifies a string.
UNIX — This setting is not supported.

Table type filter Limits the list of tables to several preset types. The default is 'TABLE','VIEW','ALIAS','SYNONYM'. If you set this value to * (asterisk), the list is not filtered.
The DAC checks the following values and locations, based on your operating system:
Windows — The USER\[dsn_name]\tablelistfilter value specifies a comma-delimited string of single-quoted values that indicate table types. Next, the DAC checks whether SYSTEM\[dsn_name]\tablelistfilter specifies a comma-delimited string.
UNIX — This setting is not supported.


TKTS DSN directory Specifies the path where TKTS DSNs are stored in XML files.
$DFTKDSN may specify the path to the TKTS DSN directory. If it does not specify the value, the DAC checks the following values and locations, based on your operating system:
Windows — The $DFEXEC_HOME\etc\dftkdsn\ directory.
UNIX — The $DFEXEC_HOME/etc/dftkdsn/ directory.

TK Path Specifies where TK files are located. The dftksrv path and core directory should be specified.
$DFTKPATH may specify the TK path. If it does not, the DAC checks the following values and locations, based on your operating system:
Windows:

1. The USER\tkpath value.

2. The SYSTEM\tkpath value.

3. The $DFEXEC_HOME\bin;$DFEXEC_HOME\bin\core\sasext location.

UNIX — Check $TKPATH. Next, check $DFEXEC_HOME/lib/tkts.

DFTK log file Specifies the log file that records interactions with the DFTKSRV layer; it is only useful for debugging issues specific to dftksrv.
The DAC checks the following values and locations, based on your operating system:
Windows:

1. The USER\dftklogfile value.

2. The SYSTEM\dftklogfile value.

3. The $DFTKLOGFILE value.

UNIX — The $DFTKLOGFILE value.


TKTS log file Specifies the log file that is produced by the TKTS layer and is useful for debugging tkts issues.

The DAC checks the following values and locations, based on your operating system:

Windows:

1. The USER\tktslogfile configuration value.

2. The SYSTEM\tktslogfile value.

3. The $TKTSLOGFILE value.

UNIX — The $TKTSLOGFILE value.

Disable CEDA Specifies whether to disable CEDA. This setting is only applicable to tkts connections.

The DAC checks the following values and locations, based on your operating system:

Windows:

1. The USER\dftkdisableceda configuration value, which should specify any non-null value, for example, yes.

2. The SYSTEM\dftkdisableceda value.

3. The $DFTKDISABLECEDA value.

UNIX — The $DFTKDISABLECEDA value.

TKTS startup sleep Specifies how much time, in seconds, to delay between the start of the dftksrv program and the booting of TK.

The DAC checks the following values and locations, based on your operating system:

Windows — The USER\tktssleep configuration value. Next, the DAC checks the SYSTEM\tktssleep value.

UNIX — This setting is not supported.


Command file execution Specifies a text file with SQL commands (one per line). These commands run, in turn, on any new connection that is made; for example, they can be used to set session settings. This is only implemented for the ODBC driver.
The USER\savedconnectiondir configuration value may specify the path to the saved connections. The DAC checks for files with the same filename as the DSN and a .sql extension.
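As a hedged sketch of such a command file: for a hypothetical DSN named mydsn, the DAC would look for mydsn.sql in the saved connections directory, containing one SQL command per line. The statements below are illustrative only and depend on your database.

# Example contents of mydsn.sql (hypothetical; SQL Server-style session settings)
SET LOCK_TIMEOUT 10000
SET LANGUAGE us_english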

Note: Environment variables are specified as $[variable_name]. Typically, DIS will set environment variables to appropriate locations. For example, $DFEXEC_HOME is set to the DIS home directory.


Glossary

A

ACE

An access control entry (ACE) is an item in an access control list used to administer object and user privileges such as read, write, and execute.

ACL

Access control lists (ACLs) are used to secure access to individual DIS objects.

API

An application programming interface (API) is a set of routines, data structures, object classes and/or protocols provided by libraries and/or operating system services in order to support the building of applications.

D

DAC

A data access component (DAC) allows software to communicate with databases and manipulate data.

DPV

Delivery Point Validation (DPV) is a USPS database that checks the validity of residential and commercial addresses.

DSN

A data source name (DSN) contains connection information, such as user name and password, used to connect to a database through an ODBC driver.

L

LACS

Locatable Address Conversion System (LACS) is used to update mailing addresses when a street is renamed or an address is updated for 911, usually by changing a rural route format to an urban/city format.

M

MMC

The Microsoft Management Console (MMC) is an interface new to the Microsoft Windows 2000 platform which combines several administrative tools into one configurable interface.

O

ODBC

Open Database Connectivity (ODBC) is an open standard application programming interface (API) for accessing databases.


Q

QAS

QuickAddress Software (QAS) is used to verify and standardize US addresses at the point of entry. Verification is based on the latest USPS address data file.

QKB

The Quality Knowledge Base (QKB) is a collection of files and configuration settings that contain all DataFlux data management algorithms. The QKB is directly editable using dfPower Studio.

R

RDI

Residential Delivery Indicator (RDI) identifies addresses as residential or commercial.

S

SERP

The Software Evaluation and Recognition Program (SERP) is a program administered by Canada Post to certify address verification software.

SOA

Service Oriented Architecture (SOA) enables systems to communicate with the master customer reference database to request or update information.

SOAP

Simple Object Access Protocol (SOAP) is a Web service protocol used to encode requests and responses to be sent over a network. This XML-based protocol is platform independent and can be used with a variety of internet protocols.

U

USPS

The United States Postal Service (USPS) provides postal services in the United States. The USPS offers address verification and standardization tools.