Documentum Connector Field Guide
Background, Setup, Troubleshooting & Debugging

Copyright © Vivísimo, Inc. All rights reserved worldwide

Documentum Background

Design

The Velocity Documentum connector allows you to crawl Documentum Docbases. For each Documentum document, one document is created for the metadata and one for the contents; these two documents are then merged to create a single document. Crawling Documentum requires two collections: one to crawl the actual documents, the other to crawl the users table. A form is then added to the document collection's source to pass the user's rights to the search engine.

Prerequisites

EMC Documentum Content Server governs the EMC Documentum content repository. Content Server provides administrators with a set of content management services and a comprehensive infrastructure to manage all content applications.

In order to crawl a Docbase, the Documentum Foundation Classes (DFC) must be installed on the same machine as Vivísimo Velocity, and must be accessible by whatever web service is running. The path to this DFC set must be specified as the Documentum Install Dir. If the Documentum server is on the same machine as Velocity, the Documentum Foundation Classes should already be installed; otherwise, you may have to install them. See the Documentum documentation for more information. (Your customer should be able to obtain the DFC through their customer account on EMC's web site.)

You must install the version of the DFC intended for the operating system that Vivísimo Velocity is running on. The Windows, Linux, and UNIX versions of the DFC are not interchangeable. Currently, only a 32-bit version of the DFC is available.

What is DFC?

EMC Documentum Foundation Classes is a unified application programming interface (API) that applications can call to leverage any content service provided by the EMC Documentum platform. It comprises a Documentum-specific API, also called Documentum Foundation Classes (DFC), and a set of standards-based APIs including WebDAV, SMB, FTP, ADO.NET, ODBC, JDBC, and ECI. Developers can write any type of content-rich application, including web or portal applications and custom user interfaces for the desktop. Vivisimo currently uses the JDBC API and will be evaluating the WebDAV protocol.

Interfaces

EMC Documentum provides multiple end-user interfaces, including integrated applications. Vivisimo’s customers primarily use one of the following:

• Webtop, the primary interface, which provides access to the EMC Documentum repository and content management services within a standard browser application (Figure 1)
• Documentum Portlets (Figure 2)
• Lotus Notes
• Microsoft Outlook (DCO)

Note: In July of 2008 EMC announced Documentum Enterprise Content Management (ECM) Suite version 6.5, a family of products that marries the user experience of Web 2.0 with enhanced XML capabilities. This release introduced repository architecture updates as well as end-user interface updates, including a new lightweight client that is completely integrated with the desktop. Content Management Interoperability Services (CMIS) is a new set of standards and web services intended to ensure interoperability among disparate content repositories. EMC, IBM, and Microsoft have jointly drafted the CMIS specification and submitted it to the Organization for the Advancement of Structured Information Standards (OASIS), in an effort to allow interoperability with and between disparate, multi-vendor ECM solutions.

Figure 1: Documentum Webtop

Figure 2: Documentum Portlets

Setup

Prerequisites

As stated in the Velocity documentation, the Documentum Foundation Classes (DFC) must be installed on the same machine as Vivísimo Velocity and must be accessible by whatever web service is running. The path to this DFC set must be specified as the ‘Documentum Install Dir’ seed element. If the Documentum server is on the same machine as Velocity, the DFC should already be installed; otherwise, you will have to install them.

You must install the version of the DFC intended for the operating system that Vivísimo Velocity is running on. The Windows, Linux, and UNIX versions of the DFC are not interchangeable. Currently, only a 32-bit version of the DFC is available. Crawling from a 64-bit Velocity environment requires modifying the DFC, jre, and lib environment, as described under "Documentum Connector on a 64-bit Server". EMC Documentum Content Server and Documentum Foundation Classes (DFC) are available from the official EMC web site and require the customer’s registered logon to access: https://emc.subscribenet.com/control/dctm-eval/login

The current versions of the DFC are also available on the Office shared drive, under Y:\connectors\Software\Documentum or /office/connectors/Software/Documentum:

• DFC6
• DFC6.5
• DFC6.5-SP1

Note: Many EMC customers still run Documentum 5.3, and some are in the process of migrating from 5.x to 6.x and have both versions supported and maintained. Documentum supports this environment with features in the 6.x product, including Docobject and metadata schemes, that ease the migration and provide backward compatibility.

DFC Install

If you need to install the DFC, create a directory named DFC that is accessible by Velocity through your web server. You can either add a separate directory to the web server configuration or add the DFC directory under the installed Velocity web directory. Extract the contents of the DFC archive into this directory. All files must be readable by the Velocity application. You can then set the Documentum Install Dir in the seed to <install dir>/DFC, and set the Shared Directory in the seed to <install dir>/DFC/dfc. You will then need to modify two property files, config/dfc.properties and config/log4j.properties, to identify the location where you have installed the DFC.

Multiple Version Documentum Environments

In the field you may find customers that maintain multiple versions of Documentum (e.g., versions 5.3 and 6.0). Even though the versions run on different host servers, Velocity may be required to crawl and index docbases from both. The DFC installer allows only one version to be installed on a machine at a time, so to make an additional version available on the Velocity server, install it on a separate server and copy the install directory to the Velocity server. Then set the appropriate OS environment variables.

To set the OS specific variables:

• Windows: The installation program sets the environment variables.
  o The only additional setting you need to make is to add jars to the classpath if you need to refer to DFC classes and interfaces in your Java programs.
  o The installation program uses the shared subdirectory of the program root directory. It attaches the full path of this directory (followed by a separator character) in front of the value of the PATH system environment variable.
  o The installation program asks you for the information that it uses to set these variables. See Table 1 below.

• UNIX/Linux: You set the environment variables.
  o The installation program does not set environment variables. If it does not find the needed environment variables, it aborts the installation. How you set environment variables depends on the shell that you use. Be sure to set the variables in such a way that a process launched in a different shell has the same values defined. This means using setenv or export (depending on the shell). Do not use set, which defines variables only for the current shell, but not for any child shell.
  o To run more than one version of DFC on a UNIX system, you must run the different DFC versions in different processes and install them in locations that you can distinguish from one another by setting the environment variables.
  o The installation program uses the dfc subdirectory of the program root directory. You must place the full path of this directory on the library path. The library path environment variable has different names in different versions of UNIX.
  o You must set these variables before you run the installation program. Table 1 below lists these environment variables and summarizes the ways that DFC uses them. The variables can be set using the environment script found at $<install Dir>/dfc/set_dctm_env.sh (.csh); source this file to set the environment variables listed in the table.

| Variable | How DFC uses it | Windows value (installation program sets) | UNIX value (you set) |
| --- | --- | --- | --- |
| DOCUMENTUM_SHARED | Determine the full path to the program root directory for UNIX | Not used by Windows systems | Specify a value before installing DFC |
| PATH | Find the directory containing DFC shared libraries (DLLs) on Windows | Attach the full path of the shared subdirectory of the Documentum program root (followed by a separator character) in front of the existing value | Not used by UNIX systems |
| Library path (the appropriate installation guide lists the different names for this variable on different UNIX systems) | Find the directory containing DFC shared libraries on UNIX | Not used by Windows systems | Add $DOCUMENTUM_SHARED/dfc |
| DFC_DATA | Directory for DFC configuration. Documentum has deprecated this variable. | The appropriate installation guide provides information about what you should do instead of using this variable. | The appropriate installation guide provides information about what you should do instead of using this variable. |
| DOCUMENTUM | Determine the full path to the user root directory | Not used by Windows systems | Specify a value before installing DFC |
| CLASSPATH | Allow the Java runtime to find dctm.jar and the DFC config directory. See the appropriate installation guide for information about making DFC classes available to the javac compiler. | Attach (with appropriate separator characters) the full paths of dctm.jar and the config directory (for example, C:\Program Files\Documentum\Shared\dctm.jar and C:\Documentum\config) | Add $DOCUMENTUM_SHARED/dctm.jar and $DOCUMENTUM_SHARED/config |

Table 1 Environment Variables that DFC Uses
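The UNIX column of Table 1 can be checked before running the installer or a crawl. The following is a minimal sketch, not part of the DFC or Velocity. The function name check_dfc_env is hypothetical, and the use of LD_LIBRARY_PATH as the library path variable is an assumption (the variable has different names on different UNIX systems).

```python
import os


def check_dfc_env(env, lib_path_var="LD_LIBRARY_PATH"):
    """Return a list of problems found in the DFC-related environment.

    `env` is a mapping such as os.environ. `lib_path_var` is an assumption:
    the library path variable is named differently on different UNIX systems.
    """
    problems = []

    # DOCUMENTUM and DOCUMENTUM_SHARED must be set before installing DFC.
    for name in ("DOCUMENTUM", "DOCUMENTUM_SHARED"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    shared = env.get("DOCUMENTUM_SHARED")
    if shared:
        # The library path must contain $DOCUMENTUM_SHARED/dfc.
        lib_entries = env.get(lib_path_var, "").split(":")
        if f"{shared}/dfc" not in lib_entries:
            problems.append(f"{lib_path_var} does not contain {shared}/dfc")

        # CLASSPATH must contain dctm.jar and the config directory.
        cp_entries = env.get("CLASSPATH", "").split(":")
        for required in (f"{shared}/dctm.jar", f"{shared}/config"):
            if required not in cp_entries:
                problems.append(f"CLASSPATH does not contain {required}")

    return problems


if __name__ == "__main__":
    for problem in check_dfc_env(os.environ):
        print("WARNING:", problem)
```

Run the check as the same user, and from the same kind of shell, that launches Velocity, so that the values seen match those the crawler process will inherit.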

DFC Install on Linux

There are some known issues when installing the current 6.0 DFC on Linux. The following steps have been documented on the Documentum Connector wiki page and may change in the near future.

• Choose an installation directory (Warning: not an NFS mount!)
• Add the following environment variables pointing to the install directory:
  o export DOCUMENTUM_SHARED=/opt/DFC
  o export DOCUMENTUM=/opt/DFC
• Untar the DFC file in $DOCUMENTUM
• Run the installer and set the following configuration:
  o connection broker: <IP of the Documentum Content Server>
  o port: 1489 (default)
  o username: dm_bof_registry (default)
  o password: <password>
• The last installer screen should specify that the installation was successful
• Read the install log
• If there is an error “Publication of DFC instance with global registry failed”, add the following lines to the dfc.properties file:
  o dfc.bof.registry.repository=<docbase name>
  o dfc.bof.registry.repository.username=dm_bof_registry (default)
  o dfc.bof.registry.repository.password=<password> (encrypted version)
• You should find a log file, log4j.log. If it contains an error such as:
  o "IO Exception attempting to acquire interprocess lock.../opt/DFC/config/dbor.properties.lck [...] FileNotFoundException .../opt/DFC/cache/[...]/content.lck"
  o add read and write permissions to dbor.properties.lck and to content.lck for everybody, for example with "chmod -R 777 cache"
• Run the installer again
• Now the install.log and log4j.log should no longer report exceptions and DFC should function properly.

DFC Troubleshooting

Troubleshooting is straightforward if you crawl your Documentum repository and get the following error:

• Could not get object: [DFC_BOF_CLASS_CACHE_INIT_ERROR] Failed to initialize class cache
• If you see this, you will probably also see the following error in the connector logging file ‘log4j.log’ in the DFC/logs directory (also see Documentum Connector Debugging below):
  o com.documentum.fc.common.DfNewInterprocessLockImpl - IO Exception attempting to acquire interprocess lock java.io.FileNotFoundException: C:\Documentum\cache\6.5.0.038\bof\inpex_dctm\content.lck (Access is denied)
• Add write permissions to content.lck to solve this problem.

The installation program maintains an error log, which it writes to a file called setupError.log in the current working directory. If it cannot write into the working directory, it writes to the home directory of the user who initiated the installation. Reading this file may help you see what went wrong.
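The lock-file permission problem described above can be detected before a crawl fails. The following is a minimal sketch, assuming the DFC install directory is /opt/DFC as in the Linux example; the function name find_unwritable_lock_files and its is_writable parameter are hypothetical. Run it as the user that the crawler runs under, since the check reflects the permissions of the current process.

```python
import os


def find_unwritable_lock_files(root, is_writable=None):
    """Return sorted paths of *.lck files under `root` that cannot be written.

    `is_writable` is a hypothetical hook for testing; by default the check
    uses os.access, which reflects the permissions of the current user.
    """
    if is_writable is None:
        is_writable = lambda path: os.access(path, os.W_OK)

    unwritable = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".lck"):
                path = os.path.join(dirpath, filename)
                if not is_writable(path):
                    unwritable.append(path)
    return sorted(unwritable)


if __name__ == "__main__":
    # /opt/DFC is the example install directory; adjust it for your system.
    for path in find_unwritable_lock_files("/opt/DFC"):
        print("not writable:", path)
```

Any path this reports, such as content.lck or dbor.properties.lck, is a candidate for the permission fix described above.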

Documentum Connector on a 64-bit Server

Currently, only a 32-bit version of the DFC is available. If we are crawling from a 64-bit Velocity environment, we currently have two options:

1. Modify the DFC, jre, and lib environment by copying the following from a 32-bit installation to the 64-bit installation:
   a. Crawling from 64-bit Linux/UNIX: copy INSTALL_DIR/jre and INSTALL_DIR/lib/libmisc.so
   b. Crawling from 64-bit Windows: copy INSTALL_DIR/jre and INSTALL_DIR/lib/misc.dll
2. Install a 32-bit instance of Velocity on the Documentum host server and maintain the collection and source from that instance. You can then create a source on the 64-bit Velocity server that points to the 32-bit Velocity source.
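After copying files for option 1 on Linux, it can be useful to confirm that the copied library really is the 32-bit build. The following is a minimal sketch that reads the class byte of an ELF header (1 means 32-bit, 2 means 64-bit). It applies only to ELF files such as libmisc.so, not to Windows DLLs, and the function name elf_word_size is hypothetical.

```python
def elf_word_size(path):
    """Return 32 or 64 for an ELF file, based on the EI_CLASS byte of its header."""
    with open(path, "rb") as f:
        header = f.read(5)

    # Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")

    ei_class = header[4]
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError(f"{path} has an unknown ELF class: {ei_class}")
```

For example, calling elf_word_size on INSTALL_DIR/lib/libmisc.so (with INSTALL_DIR replaced by the actual path) should return 32 once the copy from the 32-bit installation is in place.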

Configuring Documentum Seeds (from the online documentation)

Crawling Documentum requires two collections, one to crawl the documents, the other to crawl the users table. The Documentum seed is used to crawl documents within a docbase and consists of the following fields:

• Host - Host to connect to.
• Port - Port on which Documentum is running.
• Username - Username used to connect to the Documentum server.
• Password - Password used to connect to the Documentum server.
• Docbase - Docbase from which to retrieve documents. The name of the Docbase is case-sensitive.
• DQL Statement (optional) - The DQL statement used to query the Documentum Docbase. When doing a partial refresh, the last crawl time must be passed in the DQL statement. To accomplish this, edit the DQL statement in xml mode by clicking the [xml] link. Once in xml mode, declare and set the two variables date-time and new-date. After those two variables are set, add the condition r_modify_date > date('<value-of select="$new-date" />') to the where clause. The example below enables partial refreshing for the default DQL query:

    <declare name="date-time" />
    <declare name="new-date" />
    <process-xsl>
      <![CDATA[
        <xsl:template match="/">
          <set-var name="date-time">
            <value-of select="viv:seconds-to-local-date-time($live-crawl-date)" />
          </set-var>
        </xsl:template>
      ]]>
    </process-xsl>
    <set-var name="new-date"><value-of select="date:month-in-year($date-time)" />-<value-of select="date:day-in-month($date-time)" />-<value-of select="date:year($date-time)" /> <value-of select="date:time($date-time)" /></set-var>

    select r_object_id, r_modify_date from dm_document
    <if test="$live-crawl-date > 0">
      where r_modify_date > date('<value-of select="$new-date" />')
    </if>

  o Additional custom Documentum fields may be added to the DQL, and contents nodes will be created for them. Use the Documentum converter to map them to specific contents by modifying the XPath:
    ▪ viv:choose(@name = 'title', 'title', @name = 'subject', 'description', @name = 'r_modified_date', 'last-modified', @name = 'r_modifier', 'author')

• All Versions (optional) - Crawl all versions of a document. By default just the current version is crawled.
• Virtual Documents (optional) - Crawl documents as virtual documents.
• Documentum Version (optional) - The Documentum version to crawl.
• Documentum Install Dir - The Documentum installation directory. This directory should contain both the Documentum Shared and config directories.
• Shared Directory (optional) - Location of the Documentum Shared directory. If no path is specified, the Shared directory is assumed to be in Documentum Install Dir/Shared.
• URL Root - Root URL of the Documentum web interface. r_object_id will be appended to the URL provided.
• Group/User Prefix (optional) - Prefix added to groups and users to make their names unique.
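To make the partial-refresh DQL described under DQL Statement easier to reason about, the following minimal sketch reproduces in Python the string that the XML example assembles: a month-day-year date followed by a time, placed in the where clause only when a previous crawl time exists. The function name build_partial_refresh_dql is hypothetical, and the HH:MM:SS time format is an assumption about what date:time returns in Velocity.

```python
from datetime import datetime

DEFAULT_QUERY = "select r_object_id, r_modify_date from dm_document"


def build_partial_refresh_dql(last_crawl, base_query=DEFAULT_QUERY):
    """Return the DQL for a crawl, restricted to changes since `last_crawl`.

    `last_crawl` is a datetime, or None when there was no previous crawl
    (the case the XML example tests with $live-crawl-date > 0).
    """
    if last_crawl is None:
        return base_query

    # Month, day and year are not zero-padded, matching the numeric values
    # that the XML example joins with hyphens.
    new_date = (
        f"{last_crawl.month}-{last_crawl.day}-{last_crawl.year} "
        f"{last_crawl:%H:%M:%S}"
    )
    return f"{base_query} where r_modify_date > date('{new_date}')"


if __name__ == "__main__":
    print(build_partial_refresh_dql(datetime(2009, 3, 7, 14, 5, 0)))
```

Comparing the printed statement with the DQL that Velocity actually sends (visible in the connector log at debug level) is a quick way to confirm that the date variables are being set as intended.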

Figure 3: Documentum Seed

The Documentum User seed is used to crawl user rights within a Documentum server and consists of the following fields:

• Host - Host to connect to.
• Port - Port on which Documentum is running.
• Username - Username used to connect to the Documentum server.
• Password - Password used to connect to the Documentum server.
• Docbase - Docbase from which to retrieve documents. The name of the Docbase is case-sensitive.
• All Versions (optional) - Crawl all versions of a document. By default just the current version is crawled.
• Documentum Install Dir - The Documentum installation directory. This directory should contain both the Documentum Shared and config directories.
• Shared Directory (optional) - Location of the Documentum Shared directory. If no path is specified, the Shared directory is assumed to be in Documentum Install Dir/Shared.
• Group/User Prefix (optional) - Prefix added to groups and users to make their names unique.

In the Search Configuration for the Docbase collection (Search Tab) you must set the ‘Rights Required’ to true.

In the Live Source of the docbase collection add the form component ‘Documentum Rights’ with the following info:

• Documentum Users Collection - This is the name of the user collection that was created.

• User OS Name - For testing purposes you can pass a known username to return the specific documents that the user is known to have rights to access.

• User OS Domain - Optional

Restricting Documentum Crawls

The Documentum seed used to crawl your Docbase allows you to change the DQL Statement to restrict the crawl. Examples:

• Return documents from specific authors:
  o SELECT r_object_id, r_modify_date from dm_document where ANY authors = '<author-name>' and ANY authors = '<author-name>'
• Crawl a specific cabinet or folder and recurse through all sub-folders:
  o SELECT r_object_id, r_modify_date from dm_document WHERE FOLDER ('/<Cabinet name>',DESCEND)
• Get all files (and versions) under a particular cabinet:
  o SELECT r_object_id, object_name from dm_document(all) where folder('/<Cabinet name>', DESCEND)
• Get only current versions in a cabinet:
  o SELECT * from dm_document where folder('/<Cabinet name>', DESCEND)
• Find whether a document is part of a virtual document:
  o SELECT object_name,r_object_id FROM dm_sysobject WHERE r_object_id IN (SELECT parent_id FROM dmr_containment WHERE component_id = (SELECT i_chronicle_id FROM dm_sysobject WHERE r_object_id = '<child-object-id>'))

The following DQL can be used to debug content issues:

• DQL to find the object type of a document
  o SELECT r_object_type from dm_document where object_name='ObjectName'
• DQL to list objects having duplicate names
  o SELECT object_name, count(*) FROM dm_document GROUP BY object_name HAVING count(*) > 1 ORDER BY object_name
• DQL to get the total number of documents and folders under a cabinet
  o SELECT count(*) as cnt, 'Docs' as category FROM dm_document(all) WHERE FOLDER ('/Cabinet Name',DESCEND) UNION SELECT count(*) as cnt, 'Folders' as category FROM dm_folder WHERE FOLDER ('/Cabinet Name',DESCEND)
• DQL to retrieve all required attributes of a particular type
  o SELECT attr_name FROM dmi_dd_attr_info WHERE type_name='dm_document' AND is_required <> 0
• DQL to limit the number of documents to return
  o SELECT object_name FROM dm_document ENABLE (RETURN_TOP 10)
• DQL to find the file system path location of a document
  o SELECT doc.r_object_id, doc.object_name, MFILE_URL('',-1,'') as mypath,doc.i_folder_id from dm_document doc where <condition>
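When the cabinet-restricted statements above are generated for many collections, a small helper can keep the quoting consistent. The following is a minimal sketch; the function names quote_dql_string and build_folder_dql are hypothetical, and escaping an embedded single quote by doubling it is an assumption based on the usual SQL-style convention, so verify the result in a DQL tool before using it in a seed.

```python
def quote_dql_string(value):
    """Quote a value as a DQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_folder_dql(folder_path, all_versions=False,
                     attributes=("r_object_id", "r_modify_date")):
    """Build a DQL statement that crawls a cabinet or folder recursively."""
    if not folder_path.startswith("/"):
        raise ValueError("folder path must start with '/'")

    # dm_document(all) returns every version; dm_document only the current one.
    source = "dm_document(all)" if all_versions else "dm_document"
    return (
        f"SELECT {', '.join(attributes)} FROM {source} "
        f"WHERE FOLDER({quote_dql_string(folder_path)}, DESCEND)"
    )


if __name__ == "__main__":
    print(build_folder_dql("/Engineering"))
    print(build_folder_dql("/Engineering", all_versions=True))
```

Keeping r_object_id and r_modify_date in the attribute list mirrors the default seed query shown earlier in this guide.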

Documentum Connector Debugging

Connector logging should be used to diagnose connector errors and unexpected results. Here are the instructions for adding connector logging:

1. Open the collection and click Configuration -> Crawling tab
2. Scroll down until you see a button called "Add a new condition"
3. Click it and add "Connector Logging"
4. In the Log4j configuration box, copy and paste the default configuration, then set the priority value to "debug" by entering the following, and click OK:
     <category name="com.vivisimo.connector">
       <priority value="debug" />
     </category>
     <category name="httpclient.wire">
       <priority value="error" />
     </category>
     <category name="com.interwoven">
       <priority value="info" />
     </category>
     <root>
       <priority value="error" />
       <appender-ref ref="FILE" />
     </root>
5. Start the crawl, or do a "test it", and then find the connector log file, which should be in: $VIV_INSTALL/tmp/viv_connector-{COLLECTION_NAME}-{PROTOCOL}-log
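Locating the log from step 5 can be scripted when several collections are being debugged. The following is a minimal sketch that searches the tmp directory for files matching the naming pattern given above and lists the most recently modified first. The function name find_connector_logs is hypothetical, and the sketch assumes the log is a single file per collection and protocol.

```python
import glob
import os


def find_connector_logs(viv_install, collection_name=None):
    """Return connector log paths under <viv_install>/tmp, newest first.

    With `collection_name` set, only logs whose name starts with
    viv_connector-<collection_name>- are returned. Note that this also
    matches collections whose names begin with that name plus a hyphen.
    """
    name_part = "*" if collection_name is None else glob.escape(collection_name)
    pattern = os.path.join(
        glob.escape(viv_install), "tmp", f"viv_connector-{name_part}-*-log"
    )
    return sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)


if __name__ == "__main__":
    install_dir = os.environ.get("VIV_INSTALL", ".")
    for path in find_connector_logs(install_dir):
        print(path)
```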

The Documentum User seed is used to crawl user rights within a Documentum server, and there are some compatibility issues between Documentum 5.x and 6.x. With Documentum 6.x a user is uniquely identified either by its 5.3 fields ("user_os_name" and "user_os_domain") or by the new 6.0 fields ("user_login_name" and "user_login_domain"). Prior to 6.x, the only field was "user_os_name", and it has been maintained for backward compatibility. After Velocity 7.03, the Documentum User connector should detect this change, check which fields have values, and authenticate with the proper user credentials.
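The field selection described above can be illustrated with a minimal sketch. This is not the connector's actual code: the function name pick_user_identity is hypothetical, and preferring the 6.0 login fields over the 5.3 OS fields when both are populated is an assumption made for illustration.

```python
def pick_user_identity(user):
    """Return (name, domain) from whichever pair of user fields is populated.

    `user` is a mapping of Documentum user attributes. Preferring the
    login fields when they have a value is an assumption of this sketch.
    """
    login_name = user.get("user_login_name")
    if login_name:
        return login_name, user.get("user_login_domain") or ""

    # Fall back to the fields kept for backward compatibility with 5.x.
    return user.get("user_os_name") or "", user.get("user_os_domain") or ""
```

When checking a user in the Documentum Users Collection, comparing both pairs of fields in this way shows quickly which name the rights check is likely to use.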

If your source/collection is not returning results, you can test a specific Documentum user that you know has access to documents and field data. To verify the user:

• For testing purposes you can pass a known username to return the specific documents that the user is known to have rights to access.

• Search the Documentum Users Collection and verify that the specific user is actually returned in the results and that the user_login_name uses the proper scheme and case.

• In the ‘Documentum Rights’ form component of your docbase collection source, enter the literal string for that user or, if logged in as that user, force the session value by editing the ‘User OS Name’ and entering the following in xml mode:

o <value-of select="$user.name" />

Documentum Query Tools

Samson is a desktop application that comes packaged with the Content Server installation. It can be found in the $DOCUMENTUM/unsupported/win32 folder and comes with a small instruction document.

Delilah is a client application for Documentum, written by Rob de Leeuw. It can be seen as an alternative to Documentum’s Desktop Client or the unsupported Samson tool. Delilah is recognized by many Documentum power users and administrators for its performance, search and navigation features, and for the easy way of sending query results to Excel. Using MDI technology, you can open as many windows (e.g., Explorer, DQL, API and Search windows) within Delilah as you need. Delilah is a "lightweight" fat client, just a few MB in size. It is available from http://canservices.nl/cms/ (registration is required to download).