EMC® InfoArchive Version 4.1
Configuration & Administration User Guide

EMC Corporation
Corporate Headquarters
Hopkinton, MA 01748-9103
1-508-435-1000
www.EMC.com

EMC InfoArchive 4.1 Configuration & Administration … EndUser ITOwner ... *IncludesTalend,Powercenter,Datastage(InfosphereInformationServer),Pentaho,ABInitio, Clover,DataIntegratorandBODataIntegrator

Legal Notice

Copyright © 2016 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Adobe and Adobe PDF Library are trademarks or registered trademarks of Adobe Systems Inc. in the U.S. and other countries. All other trademarks used herein are the property of their respective owners.

Documentation Feedback

Your opinion matters. We want to hear from you regarding our product documentation. If you have feedback about how we can make our documentation better or easier to use, please send us your feedback directly to [email protected]

Table of Contents

Revision History

Chapter 1 Overview
Backup Versus Archive
Modes of Archiving: When to Use Table Archiving or SIP Archiving
Step 1: Setting Archiving Goals
Step 2: Understanding the Source Application
Step 3: Selecting the Appropriate Archiving Method
Table Archiving
Data Archiving
File Archiving
Compound-Record Archiving
A Solution for Structured and Unstructured Content
Submission Information Packages
InfoArchive Information Model
Generating SIPs
Use Cases
Cost Take-Out: Application Decommissioning
Optimize: Active Application Archiving
Information Transformation and Reuse
User Roles
Inline Users
Troubleshooting
Branding Customization
Configuration to Add Sample Branding Customization
Configuration to Add New Branding Customization
Setting the Customization Location when Deploying to Tomcat or Other Containers
Verifying and Viewing the Branding Customization
Troubleshooting

Chapter 2 Connectors
Application and Platform Examples

Chapter 3 Architecture
Ingestion
What is ETL?
How to Extract Data
Storage
How Data is Searched
Forms
Search Results

Chapter 4 Core Configuration
System and Audit Database
xDB Federations and xDB Databases
File System Root
Space Root xDB Library and Space Root Folder
Language Support
Adding New Language Support
Backing Up and Restoring of the Managed Item Database
Configuring the Managed Items Database
Backing Up the Managed Items Database
Setting the Managed Item Store
XdbLibrary
Restoring the Managed Database
Using the xDbLibrary
Selected Managed Items
Using the Managed Items

Chapter 5 Creating a Search
Creating a Search for a SIP Archive Application
Updating a Search Template Status to Ready
Creating a Search for a Table Archive Application
Editing a Search
Deleting a Search
Configuring Search Form Fields
Configuring a Result Column
Creating a Duplicate Search
Table-Based Search – XForms, XQueries and Query Results
Example: First Name and Last Name with External Variables
Incomplete Forms
Nested Searches
Authorization
Multiple XQueries/XForms per Search
Using ANT Tasks to View, Create, Delete or Update XQuery Modules
Background Searches
Order
Using the Background Results Tab
Deleting a Background Task
Exporting Search Results
Configuring Export Functionality in the InfoArchive Web Application
Composition (Edit) Mode
Installing the Export Functionality
Backward Compatibility
Adding the Ability to Export Search Results
Configuring Export Objects
Configuring Application-Specific Export Objects using ANT Scripts
Configuring EP, EC and ET Objects using IAShell
Configuration Export Through ANT

Chapter 6 Compliance
Terminology
Retention Policies
Retention Policy Types
Determining the Best Method to Apply Retention
Applying Multiple Retention Policies
Record-Based Retention
Hold Management
The Purge Process
What Happens if a Purge Candidate List is Rejected?
Disposition of a SIP-Based Application
Disposition of a Table-Based Application
Retention Rules on the Disposition of AIPs, AIUs, Applications and Rows
Table Archiving – Application
SIP Archiving
Using the Retention Sets Tab
Viewing Items in a Retained Set
Using the Hold Sets Tab
Viewing Items in a Hold Set
Removing an Item from a Hold Set
Troubleshooting
Using the Purge Lists Tab
Performing Actions to a Purge List
Using the Application Info Tab
Applying a Retention Policy to an Application
Removing Retention from an Application
Applying a Hold to an Application
Removing a Hold Set from an Application
Using the Packages Tab
Retention and ECS Storage
Applying Actions to a Package
Applying a Retention Policy to an AIP
Applying a Hold to an AIP
Rejecting or Invalidating an AIP
Using the Retention Policies Tab
Creating a Retention Policy
Editing a Retention Policy
Deleting a Retention Policy
Event-Based Retention
Using the Holds Tab
Creating a Hold
Editing a Hold
Deleting a Hold

Chapter 7 Authentication and Authorization
Authentication
Active Directory Integration
User Roles and User Groups
Authorization
Using the Groups Tab
Using the Permissions Tab

Chapter 8 Administration
Using the Application Page
Creating an Application
Editing an Application
Deleting an Application
Deleting Data from an Application
Working with Jobs: List of Available Jobs
Using the Apply Retention Policy to Records Job
Using the Archive Audits Job
Using the Clean Up Purge Candidate List Job
Using the Requalification Job
Using the Refresh Metrics Job
Using the Remove Policy Job
Using the Trigger Event Policy Job
Populating Event Dates for the Trigger Event Policy Job
Close
Clean
Table Data Volume Update
Using the Jobs Tab
Viewing a Job’s Run History
Creating a Job
Editing a Job
Running a Job
Suspending a Job
Using the Storage Tab
Federations
Registering a Federation
Databases
Creating a Database
Adding a Storage System
Using the Spaces Tab
Creating a Space
Using the Stores Tab
Editing Stores Configuration
Configuring ECS Storage
Installing Centera SDK on Linux
Installing Centera SDK on Windows
Configuring Centera Storage
Using the Audit Tab
Audit Application
Load Balancing Testing of InfoArchive Servers
Load Balancing with Apache
Parallel SIP ingestion
Parallel Table Ingestion
Load Balancing Testing of the InfoArchive Web Application
HTTP with Sticky Sessions
How to Run Gateway/InfoArchive Web Application
InfoArchive Server and Gateway/InfoArchive Web Application Communication Setup
Self-Signed Certificates-Based Setup

Revision History

The following changes have been made to this document.

Revision Date      Description

October 2016       The following sections were added:
                   • Installing Centera SDK on Linux
                   • Installing Centera SDK on Windows
                   • Load Balancing Testing of InfoArchive Servers

September 2016     4.1 release.

June 2016          InfoArchive version 4.0 initial publication.

Chapter 1 Overview

InfoArchive is an integrated product suite designed for managing and archiving application information.

It is important to understand fundamental InfoArchive concepts before performing configuration and administration tasks. This chapter grounds you in the essentials of InfoArchive terminology and the concepts that underlie the complete data archival and retrieval processes.

InfoArchive is designed using a unified OAIS-compliant data model that dictates the format in which information is ingested into, stored and archived in, and retrieved from InfoArchive throughout its lifecycle.

InfoArchive allows the customer to:

• Ingest and store data,

• Protect ingested data,

• Execute queries against ingested data, and

• Dispose of obsolete data in a controlled fashion.

InfoArchive allows the ingestion of data from:

• Live systems that currently generate data. The customer plans to continue supporting and maintaining the system until the end of its active service.

• Retired systems that no longer generate data. The customer has been maintaining and supporting the system, for instance for compliance reasons. InfoArchive allows the customer to load the retired application’s data into a less expensive system, thereby decreasing the total cost of ownership. The customer can then execute any required queries against the ingested data.

Customers can ingest table-based data as well as OAIS-based data; ingesting OAIS-based data is also referred to as SIP archiving.

The customer uses a command line interface (CLI) to ingest table and SIP data. Both structured and unstructured data can be ingested into InfoArchive.

Backup Versus Archive

Backup:

• Protection – frequent snapshots of data to protect against data loss because of system failure
• Copies data for protection
• Supports operations and recovery
• Supports availability
• Point-in-time only
• Poor solution for regulatory compliance
• Not easily searched
• Often, old backups cannot be restored

Archive:

• Preservation – movement of data to a low-cost platform to ensure compliance and reduce costs
• Moves data off production disk
• Supports business and compliance
• Supports operational efficiencies
• Comprehensive in nature
• Ideal solution for regulatory compliance
• Easily searched
• Provides historic reference

Modes of Archiving: When to Use Table Archiving or SIP Archiving

It is important to determine the best method to archive structured and unstructured data.

InfoArchive is an application-agnostic solution for information management and archiving that supports different enterprise needs for ingesting different types of application data. It can help reduce the cost of managing application assets, improve information governance, and add value to business processes through the reuse of information. Four methods of ingestion are provided to meet differing project requirements and to optimize the environment for the source application. With InfoArchive, there is no need to take a one-size-fits-all approach.

This section provides a step-by-step guide to help you choose which of the four InfoArchive methods for preserving and reusing information is the right one for each of your applications.

Step 1: Setting Archiving Goals

InfoArchive supports three core use cases based on your short- and long-term archiving goals:

• Cost Take-Out. InfoArchive can provide a repository for data from legacy and redundant applications that might have been superseded by an ERP system, replicated during an acquisition, or must be decommissioned as part of a business sale, closure, or industry mandate. Data from applications that a company migrates to InfoArchive remains accessible for business reporting, audits, or compliance with data-retention regulations. Meanwhile, the organization can shut down the applications and save all the costs associated with supporting and maintaining them.

• Optimize. Companies can also use InfoArchive for periodic archiving of data and content from “live” business applications to reduce costs for production environments, enable compliant data retention, and optimize application and infrastructure performance. In addition to savings on storage, companies can reduce the costs of backups, system administration, servers, and database licensing.

• IT Transformation & Reuse. Increasingly, organizations require access to information for new and innovative uses. Advanced and predictive analytics, as well as application modernization programs, top the list of major programs that demand fresh approaches to information access. InfoArchive supports these requirements by serving as a platform for data aggregation and management that offers access to business records in bulk via the Hadoop Distributed File System (HDFS) or individually to other business applications via web services.

InfoArchive uses Extensible Markup Language (XML) as the format for preserving data and metadata for long-term, platform-independent retention. Data and metadata from multiple sources can be aggregated and represented as XML files, and such a representation can correspond directly to a business object.

The technology archives the native XML data and structure, allowing users to query content efficiently and accurately, at any level of detail. Users can also transform that data into views formatted for print, web sites, mobile devices, and other channels. In addition, user interface development tools use declarative XML syntax, which greatly reduces the need for, and cost of, custom programming to deliver interactive content.
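The following minimal Python sketch illustrates the idea of aggregating data from two sources into one XML business object and then querying it at different levels of detail. It uses only the standard library's xml.etree.ElementTree; the source systems, the element names (Customer, Invoices, Invoice, Amount) and the values are hypothetical and do not reflect an InfoArchive schema. InfoArchive itself uses XQuery for searches, of which ElementTree's limited XPath subset is only a rough stand-in.

```python
import xml.etree.ElementTree as ET

# Hypothetical records from two separate source systems (not an InfoArchive schema).
crm_record = {"id": "C042", "name": "Jane Doe", "segment": "Retail"}
billing_rows = [
    {"invoice": "INV-1", "amount": "120.00"},
    {"invoice": "INV-2", "amount": "75.50"},
]


def build_business_object(crm, invoices):
    """Aggregate CRM data and billing rows into one XML 'Customer' element."""
    customer = ET.Element("Customer", id=crm["id"])
    ET.SubElement(customer, "Name").text = crm["name"]
    ET.SubElement(customer, "Segment").text = crm["segment"]
    invoices_el = ET.SubElement(customer, "Invoices")
    for row in invoices:
        inv = ET.SubElement(invoices_el, "Invoice", number=row["invoice"])
        ET.SubElement(inv, "Amount").text = row["amount"]
    return customer


customer = build_business_object(crm_record, billing_rows)

# Query at the level of the whole business object...
print(customer.get("id"))  # C042

# ...or at the level of individual nested values.
amounts = [float(a.text) for a in customer.findall("./Invoices/Invoice/Amount")]
print(sum(amounts))  # 195.5

# Attribute predicates are part of ElementTree's XPath subset.
print(customer.find("./Invoices/Invoice[@number='INV-2']/Amount").text)  # 75.50

# The aggregated object serialized as XML text.
print(ET.tostring(customer, encoding="unicode"))
```

Because every level of the hierarchy stays addressable, the same structure can answer a whole-record question (which customer?) as well as a field-level one (what was the amount on one invoice?).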

With InfoArchive, companies can manage an unlimited number of different data structures in a single repository. They can store both structured data and unstructured content in a single record and access all the information they need for their business processes and reporting from a single query. InfoArchive is the only solution that delivers all of the following methods to ingest information for archiving:

• Table archiving

• Data record archiving

• File archiving

• Compound record archiving

Organizations can use any of these options to decommission applications and archive active applications. Such flexibility is particularly valuable in decommissioning programs that involve many different applications and information formats. Active archiving may use all of the options except table archiving. When information aggregation and reuse are key, the data record, file, and compound record archiving options should be considered.

Step 2: Understanding the Source Application

When choosing the best archiving method for a particular application, a business must consider two key questions:

• What type of information is being archived?

• How will users access that information going forward?

Most information is managed in one of the following systems:

• Transaction systems. Transaction systems have databases that hold details of past business events, such as those related to processes for accounting, ERP, enterprise asset management, or supply chain management. They are used to maintain reference data in master files, record activities in transaction files, and store old records in transaction-history tables. They may include cloud-based systems and allow many people to add small bits of detail over time.

• Print stream systems. Traditionally referred to as COLD (computer output to laser disk) systems, these systems store print-stream information for long-term preservation. Most of this information involves customer communications. Other examples include green-bar reporting systems.

• Content and image repositories. Content and image repositories store unstructured information and metadata, typically in their native formats. Examples include traditional enterprise content management systems as well as storage-based systems.

• Interaction systems. Interaction systems connect users with an organization for quick access to complete information. Examples include systems that support customer relationship management (CRM) and collaborative tasks. These systems include data as well as transaction, grouping, and unstructured files.

• Collaborative systems. Collaborative systems address the needs of groups of individuals to share information and communicate with each other around specific topics. These systems have all the characteristics of interaction systems but generally cater to a less-structured approach. Notable examples include eRoom, Microsoft SharePoint, and Lotus Notes.

Step 3: Selecting the Appropriate Archiving Method

The four archiving methods offered by InfoArchive are optimized based on the format of the data or content being archived, the ease of extraction and up-front analysis, and how the information is to be used after it is moved to the InfoArchive repository. Having this choice is a critical success factor for large-scale information management programs that involve a wide range of applications.

Table Archiving

Table archiving is a method that models structured information in the source application as XML in InfoArchive: table for table, column for column, and row for row. This method lets organizations quickly decommission applications, providing a fast time to value, while preserving all data relationships for future queries and reports.
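The table-for-table, row-for-row mapping can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not InfoArchive's extraction tooling: an in-memory SQLite table stands in for the source database, and the element names TABLE and ROW and the null attribute are hypothetical. Column names are assumed to be valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Model one table as XML: one element per table, per row, and per column.

    TABLE, ROW, and the null attribute are hypothetical names for illustration.
    """
    # The table name is assumed to come from a trusted catalog, not user input.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [desc[0] for desc in cursor.description]
    table_el = ET.Element("TABLE", name=table)
    for row in cursor:
        row_el = ET.SubElement(table_el, "ROW")
        for column, value in zip(columns, row):
            col_el = ET.SubElement(row_el, column)
            if value is None:
                col_el.set("null", "true")  # keep SQL NULL distinct from ""
            else:
                col_el.text = str(value)
    return table_el


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, None)])
    print(ET.tostring(table_to_xml(conn, "customer"), encoding="unicode"))
```

Because nothing is remodeled, the XML mirrors the source schema exactly, which is what keeps up-front analysis low.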

Organizations can use table archiving to migrate structured data in application tables and linked files from transaction systems to InfoArchive with few, if any, transformations. Table archiving can reduce the up-front analysis involved in decommissioning an application and virtually eliminate the data-integrity risks associated with other archiving methods. Because information is stored in an aggregated manner, access is less flexible than with the record-based methods described below. Access is traditionally limited to query-based (list) reports.

Table archiving is used primarily with transaction systems that contain structured data and linked files, as well as with some interaction systems, for the purpose of application decommissioning. Information preserved using the table-archiving method may be reprocessed at a later date into record formats for reuse scenarios.

Data Archiving

Data archiving involves identifying entities within the target application and extracting and aggregating the associated data into a single XML-based record. Archiving structured data in XML files provides a future-proof format that is ideally suited to long-term retention, access, and comprehension. Any data structure can be modeled as XML, and InfoArchive does not impose its own schema. As a result, one record may contain multiple XML packets, and information from multiple systems may be drawn together according to the requirements of the project, all while preserving a complex multi-system chain of custody.
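As a rough illustration of aggregating one entity into a single record, the Python sketch below combines a customer held in one system with that customer's orders held in another. The record layout, element names, and source labels are hypothetical; because InfoArchive does not impose a schema, each project would define its own.

```python
import xml.etree.ElementTree as ET


def build_customer_record(customer: dict, orders: list, crm: str, erp: str) -> ET.Element:
    """Aggregate one business entity from two systems into a single XML record."""
    record = ET.Element("CustomerRecord", id=str(customer["id"]))
    # Data drawn from the CRM system, tagged with its origin for chain of custody.
    profile = ET.SubElement(record, "Profile", source=crm)
    ET.SubElement(profile, "Name").text = customer["name"]
    # Only the orders that belong to this customer are pulled in from the ERP system.
    history = ET.SubElement(record, "Orders", source=erp)
    for order in orders:
        if order["customer_id"] == customer["id"]:
            order_el = ET.SubElement(history, "Order", id=str(order["id"]))
            ET.SubElement(order_el, "Amount").text = order["amount"]
    return record


if __name__ == "__main__":
    customer = {"id": 7, "name": "Acme Corp"}
    orders = [
        {"id": 1, "customer_id": 7, "amount": "120.00"},
        {"id": 2, "customer_id": 9, "amount": "75.50"},
        {"id": 3, "customer_id": 7, "amount": "30.25"},
    ]
    record = build_customer_record(customer, orders, crm="CRM", erp="ERP")
    print(ET.tostring(record, encoding="unicode"))
```

Tagging each section with its source system is one simple way to keep the multi-system origin visible inside a single record.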

Data archiving is especially useful when companies want to reuse the information, while reducing costs, in a context that is different from the source application. Data archiving is also well suited to active archiving of live systems. It typically requires more extensive, business-oriented analysis of application data than table archiving does. Examples of appropriate data archiving use are SWIFT transactions, sales histories, and patient histories.

There are two important advantages to this method. First, the complex data model of the source application is transformed into a simple data model in the archive. This can reduce costs and simplify future access. Second, because there is no direct link between the source application and the data in the archive, a change in the source application does not force a change in the archive. When a change in the source application results in an update to the archive data model, InfoArchive ensures that search results across data sets include all records across all the changed data.

Data archiving also helps organizations create new value for their data. By extending or recording metadata, they can harmonize records and support searching and filtering across data sets.

Data archiving is used with transaction systems for active archiving of individual structured data records (for example, transaction history tables). It is also used with interaction systems (for decommissioning data and queries, optimizing searches, and advanced analytics) and with content systems. Because it presents data as single records, it is ideal for archiving information according to government requirements and legal mandates.

File Archiving

File archiving involves archiving unstructured data and its associated metadata into a single record. The information can be preserved in its original format, transformed into a more future-proof format such as PDF/A, or both. One record may contain multiple files to create sets of related information.

Metadata is particularly important when working with the file-archiving method. Attributes may be derived from the content itself or associated with other systems.
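A minimal Python sketch of deriving attributes from the content itself: for each file in a record it computes the size, a SHA-256 digest, and a MIME type guessed from the file name. The attribute names and the dictionary layout are assumptions for illustration, not an InfoArchive format; attributes coming from other systems are passed in separately.

```python
import hashlib
import mimetypes


def describe_content(name: str, data: bytes) -> dict:
    """Derive technical attributes directly from a file's name and bytes."""
    mime_type, _ = mimetypes.guess_type(name)
    return {
        "name": name,
        "size": len(data),
        "mimeType": mime_type or "application/octet-stream",
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def file_record(files: dict, business_attributes: dict) -> dict:
    """Group several related files and their metadata into one record."""
    return {
        "attributes": dict(business_attributes),  # e.g. taken from the source application
        "files": [describe_content(name, data) for name, data in files.items()],
    }


if __name__ == "__main__":
    rec = file_record(
        {"statement.pdf": b"%PDF-1.4 ...", "statement.afp": b"\x5a\x00"},
        {"customerId": "C-1001", "period": "2016-03"},
    )
    for f in rec["files"]:
        print(f["name"], f["mimeType"], f["size"])
```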

File archiving is especially useful when organizations want to reuse the information in a context that is different from the source application. Print streams and media archives are excellent examples of file archiving. Transforming very large print streams (such as customer statements or explanations of benefits) from print-oriented formats (such as AFP or Metacode) into PDF for presentation on company web platforms can improve ROI and customer satisfaction. Image archives (typically multi-page TIFF files with very limited attributes) can be reprocessed to add the document type and even full-text OCR (optical character recognition).

File archiving can be used with transaction, content, or reporting systems to decommission content, optimize data search and retrieval, and simplify user access to information across systems. Businesses can use this method to archive any kind of unstructured content files and metadata, including print streams. It extracts value from archived content by addressing files through the associated application metadata, rather than directly from the infrastructure.

Pushing files into InfoArchive the moment they become "inactive" can reduce costs and increase system performance. Having all of an organization's "inactive" files in the same archiving platform can also improve data discovery and enhance the options for reusing content and data through new applications or web browsers. The information can still be accessed through end-user portals with high availability.

Compound-Record Archiving

As the name implies, compound-record archiving preserves structured and unstructured data in a single record. The structured elements are modeled as XML, and the unstructured elements may be preserved in their original format or transformed into a more future-proof format, such as PDF/A. One record will contain multiple files of related information.
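The Python sketch below shows, with hypothetical element names, how a compound record could pair structured fields modeled as XML with references to attachment files kept in their original format. Each reference carries the file's size and SHA-256 digest so that the XML and the files can be checked against each other later. It is an illustration, not InfoArchive's record format.

```python
import hashlib
import xml.etree.ElementTree as ET


def compound_record(fields: dict, attachments: dict) -> ET.Element:
    """Build one record: structured fields as XML plus references to unstructured files."""
    record = ET.Element("Record")
    data = ET.SubElement(record, "Data")
    for key, value in fields.items():
        ET.SubElement(data, key).text = value
    files = ET.SubElement(record, "Attachments")
    for name, content in attachments.items():
        ET.SubElement(files, "File", name=name, size=str(len(content)),
                      sha256=hashlib.sha256(content).hexdigest())
    return record


def attachments_match(record: ET.Element, attachments: dict) -> bool:
    """Check that every referenced file is present and unchanged."""
    for ref in record.iter("File"):
        content = attachments.get(ref.get("name"))
        if content is None or hashlib.sha256(content).hexdigest() != ref.get("sha256"):
            return False
    return True


if __name__ == "__main__":
    files = {"minutes.docx": b"doc-bytes", "slides.pdf": b"%PDF-1.4"}
    rec = compound_record({"Title": "Q1 review", "Author": "jdoe"}, files)
    print(ET.tostring(rec, encoding="unicode"))
    print(attachments_match(rec, files))
```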

Compound-record archiving provides the only mechanism that can compliantly archive systems that blend structured information (such as wikis, blogs, and comments) with unstructured information (primarily attachments). It is an appropriate format both for decommissioning and for active archiving of interactive systems that involve such a mix of structured data and unstructured content. Microsoft SharePoint and Lotus Notes applications are notable examples.

As business processes become more complex and regulations more demanding, there is a growing need to archive complex business records that may contain multiple structured data and unstructured content elements, which must be brought together to create the final business record. Examples include financial trades, cases, and laboratory reports.

Organizations can retain business events as single records that they can reuse for analytics or regulatory audits. Users can search the records using pure business logic, without switching from one application to another.

A Solution for Structured and Unstructured Content

Any ERP implementation involves the creation and management of various records or business objects. Some of these records are composed of structured data that users typically enter and access through form fields. As much as 85% of managed information can be unstructured content, according to some estimates. This content may include digital images, video, digitally rendered faxes, e-mail messages, and text documents.

While both structured and unstructured information are usually needed to drive efficient, ERP-enabled business processes, most ERP applications do not have the robust functionality required for handling the indexing, searching, storage, and security of huge volumes of unstructured information in multiple formats. A content management solution, such as Documentum, is often needed to provide such support.

Submission Information Packages

With InfoArchive, information for data record, file, and compound record archiving is extracted by an appropriate connector and pushed to InfoArchive in what the Open Archival Information System (OAIS) reference model calls a submission information package (SIP). The SIP is compressed as a .zip file and can be transported using any file transfer technology. When the SIP is ingested into the archive, it becomes an archival information package (AIP). Based on its classification, the content is stored in an archive holding. When an end user requests data from the archive, the data is delivered as a dissemination information package (DIP).

SIPs are compressed for transfer between the source application and InfoArchive to reduce network traffic. Each SIP includes:

1. The SIP Metadata File: A small XML file containing data that describes the SIP and provides data that InfoArchive uses to set retention dates, access rights, and other archive policies.

2. Archive Content Metadata: Another XML file that holds the metadata associated with the content to be archived.

3. Content File: The unstructured content that is to be archived

When structured data is archived to InfoArchive, the SIP does not contain any content file. The structured data to be archived is held as XML in the archive content metadata file, and InfoArchive processes the SIP in the same way it would ingest a SIP with unstructured content. Using one standard process for both types of content improves efficiency and lowers the total cost of ownership for InfoArchive.
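A minimal Python sketch of packaging these parts into a compressed .zip SIP. The file names sip.xml and content_metadata.xml, the SipDescriptor element, and its attributes are hypothetical placeholders; a real SIP descriptor must follow InfoArchive's own schema, including the retention and access information mentioned above. A structured-only SIP is simply one without content files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def build_sip(sip_id: str, holding: str, content_metadata: ET.Element,
              contents: Optional[Dict[str, bytes]] = None) -> bytes:
    """Package a SIP as an in-memory .zip file.

    File names and descriptor attributes are hypothetical placeholders.
    Each child of content_metadata is treated as one archived record.
    """
    contents = contents or {}  # structured-only SIPs carry no content files
    descriptor = ET.Element("SipDescriptor", id=sip_id, holding=holding,
                            recordCount=str(len(content_metadata)))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # 1. SIP metadata file: describes the package itself.
        zf.writestr("sip.xml", ET.tostring(descriptor, encoding="utf-8", xml_declaration=True))
        # 2. Archive content metadata: structured data and metadata as XML.
        zf.writestr("content_metadata.xml",
                    ET.tostring(content_metadata, encoding="utf-8", xml_declaration=True))
        # 3. Content files: unstructured content, if any.
        for name, data in contents.items():
            zf.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    metadata = ET.Element("Records")
    ET.SubElement(metadata, "Record", id="1")
    sip = build_sip("sip-0001", "statements", metadata, {"stmt-1.pdf": b"%PDF-1.4"})
    print(len(sip), "bytes")
```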

InfoArchive Information Model

AIPs are discrete packages of information that may contain zero or more structured data elements represented as XML and/or zero or more unstructured data elements.

Layering metadata over an AIP adds further power, especially in reuse scenarios. The metadata and data elements may be extracted directly from the source application, derived from other systems, or programmatically constructed. The ability to maintain separate data elements in an AIP allows InfoArchive to balance the requirement to maintain exacting chain-of-custody standards with the desire to build richer data sets than existed in the original applications. For example, the raw transaction history data may be extracted, modeled as XML, and verified as 100% accurate and complete for chain-of-custody purposes, while additional information from extended systems is added to the AIU in another data element, making the AIU more usable without compromising the verified data.
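To illustrate keeping the verified extract separate from enrichment, the Python sketch below stores the raw transaction XML in one data element together with a SHA-256 digest of its canonical (C14N) form, and puts added attributes in a second element. The element names are hypothetical. Because the digest covers only the raw element, the enrichment can change without affecting the chain-of-custody check.

```python
import hashlib
import xml.etree.ElementTree as ET


def _digest(xml_text: str) -> str:
    # Canonicalize first so attribute order or quoting style does not change the hash.
    return hashlib.sha256(ET.canonicalize(xml_text).encode("utf-8")).hexdigest()


def make_aiu(raw_xml: str, enrichment: dict) -> ET.Element:
    """Build an AIU with a verified raw data element and a separate enrichment element."""
    aiu = ET.Element("AIU")
    raw = ET.SubElement(aiu, "RawData", sha256=_digest(raw_xml))
    raw.append(ET.fromstring(raw_xml))
    extra = ET.SubElement(aiu, "Enrichment")
    for key, value in enrichment.items():
        ET.SubElement(extra, key).text = value
    return aiu


def raw_is_intact(aiu: ET.Element) -> bool:
    """Recompute the digest of the raw data element and compare it with the stored one."""
    raw = aiu.find("RawData")
    payload = ET.tostring(raw[0], encoding="unicode")
    return _digest(payload) == raw.get("sha256")


if __name__ == "__main__":
    aiu = make_aiu('<Txn id="42"><Amount>100.00</Amount></Txn>', {"Region": "EMEA"})
    print(raw_is_intact(aiu))
```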

Generating SIPs

To generate a SIP, information is extracted from the source application and written to the standard SIP format for InfoArchive as an XML file. A SIP can contain one or more information records, called archive information units (AIUs). The AIUs that are extracted to the SIP are defined by rules that are part of the SIP-generation program.
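How the rules and the split into SIPs work depends on the SIP-generation program. As a hedged Python sketch, assuming a rule is simply a predicate over extracted records and each SIP holds at most a fixed number of AIUs, the batching could look like this (the record fields and the batch size are hypothetical):

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def generate_sip_batches(records: Iterable[Record], rule: Callable[[Record], bool],
                         max_aius_per_sip: int) -> Iterator[List[Record]]:
    """Yield lists of AIUs; each list would become one SIP."""
    if max_aius_per_sip < 1:
        raise ValueError("max_aius_per_sip must be at least 1")
    batch: List[Record] = []
    for record in records:
        if not rule(record):  # the rule decides which records become AIUs
            continue
        batch.append(record)
        if len(batch) == max_aius_per_sip:
            yield batch
            batch = []
    if batch:  # final, partially filled SIP
        yield batch


if __name__ == "__main__":
    rows = [{"id": i, "status": "closed" if i % 2 == 0 else "open"} for i in range(10)]
    for sip in generate_sip_batches(rows, lambda r: r["status"] == "closed", 2):
        print([r["id"] for r in sip])
```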

When a SIP is ingested into the repository, it becomes an AIP. The AIP is stored in a logical archive-holding folder. The folder has a number of configuration options for setting management parameters that are applied to AIPs, including data retention, access control, and search SLAs. AIPs extracted from different systems can be stored in the same archive holding.

Use Cases

This section illustrates how InfoArchive can help reduce costs and increase the value of information:

• Cost Take-Out: Drive cost out of your existing IT environment

• Optimize: Make production applications and infrastructure more efficient

• Information Transformation: Remove information silos and put your information to work in new ways

Cost Take-Out: Application Decommissioning

When you use InfoArchive to decommission an application, you:

• Reduce the IT costs and complexity of running legacy or duplicate systems.

• Reduce the risk of failure of existing systems.

• Retain information for future business mining or usage.

• Retain information to ensure compliance requirements are fulfilled.

• Support e-discovery initiatives.

With application decommissioning:

• The Developer controls all aspects of the XQuery; nothing is predefined.

• The search form designer must understand XQuery and how it relates to search design. The designer of a search needs to ensure that the query is valid.

• Form widgets, result columns, and data binding require careful consideration. The Developer has to handle the values and results in the XQuery.

• Developers have to test the query in a tool such as xDB Admin, Eclipse, or a similar setup.

When you use table archiving, up-front analysis is not required, because you only have to ingest the source application's data. Typically, the Developer then has to create search screens similar to the source application's search screens.

Table archiving also requires an understanding of the SQL queries used to create specific reports.

Content can be stored as blobs in xDB or in the storage system; the storage system is the recommended alternative in production.

Retention of the source application’s data can be applied at the application, table or record level.

Table archiving handles one table, including all of the table's rows, as one or more XML files. Therefore, the schema of the XML is the schema of the table.

InfoArchive is "schema-agnostic", meaning that it can work with any valid XML file.

InfoArchive requires one file, metadata.xml, per database that describes the extracted XML files. Data can be extracted into multiple files. The ANT-based ingestion script requires all files to be ingested in one session.

Indexing must be enabled at the column level in the metadata.xml file. Multi-path indexes are created at the end of the ingestion process. This task can be configured in the tools\build-table.xml file.

Path Value Indexing

• B-tree index.

• Can be used for indexing multiple elements, but every single element must be listed explicitly in the index definition.

• Smaller index size, which makes ingestion faster.

• Better performance if you know the query ahead of time, along with the number of predicates.

Multi-Path Indexing

• Lucene inverted index.

• Lets you specify sub-paths with wildcards that match more than one element path, so not every element has to be listed explicitly. This makes multi-path indexes much more flexible and easier to use.

• Larger index size.

• The only option for table archiving.
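The practical difference in how the two index definitions are written can be illustrated with a small Python sketch. It uses shell-style wildcards from fnmatch purely as a stand-in; xDB's actual index-definition syntax is not shown here, and the element paths are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical element paths produced by table archiving.
ELEMENT_PATHS = [
    "/CUSTOMER/ROW/ID",
    "/CUSTOMER/ROW/NAME",
    "/CUSTOMER/ROW/CITY",
    "/ORDERS/ROW/ID",
]


def covered_by_path_value(paths: Iterable[str], listed: Iterable[str]) -> List[str]:
    """Path value index: only the element paths listed explicitly are indexed."""
    wanted = set(listed)
    return [p for p in paths if p in wanted]


def covered_by_multi_path(paths: Iterable[str], sub_path_pattern: str) -> List[str]:
    """Multi-path index: one wildcard sub-path can match many element paths."""
    return [p for p in paths if fnmatchcase(p, sub_path_pattern)]


if __name__ == "__main__":
    print(covered_by_path_value(ELEMENT_PATHS, ["/CUSTOMER/ROW/ID"]))
    print(covered_by_multi_path(ELEMENT_PATHS, "/CUSTOMER/ROW/*"))
```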

Optimize: Active Application Archiving

Active archiving allows an organization to:

• Manage data growth and reduce costs

• Ensure that compliance requirements are fulfilled

• Ensure long term accessibility of information

• Support e-discovery initiatives

• Optimize production systems

• Improve application performance

• Reduce upgrade downtime

• Improve overall manageability

Active archiving allows you to archive and retain large volumes of transaction records, fixed content (for example, checks and statements), and inactive application data. In regulated industries, information from completed projects, such as pharmaceutical studies, cases, and construction projects, can be archived together, which ensures compliance, long-term accessibility, and optimization of production systems.

Information Transformation and Reuse

InfoArchive allows an organization to:

• Extend the value of information

• Remove application information silos

• Provide new context for users

User Roles

For more information about administering user roles, see Using the Groups Tab and Using the Permissions Tab.

Business Owner

• Drives business decisions to archive legacy applications and application data into InfoArchive.

• Monitors high-level application archival statuses.

• Defines corporate information preservation requirements.

IT Owner

• Reviews and responds to archive requests.

• Plans for data archiving and application archiving.

• Reviews both high-level and detailed application archival statuses.

IT Administrator

• Sets up the InfoArchive system.

• Performs day-to-day administrative tasks on InfoArchive.

• Implements information security policies.

Retention Manager

• Collects corporate, regulatory, and legal requirements for information preservation.

• Establishes and implements retention policies and prevents inadvertent disposal of data.

• Promptly responds to new requests for holds and retentions.

• Keeps data custodians informed about upcoming purges or changes to retention policies.

Developer

• Designs the InfoArchive Web Application UI based on business requirements.

End User

• Accesses archived application data.

Inline Users

To simplify demos and get up and running quickly for proof-of-concept (POC) work, a set of inline users with preconfigured roles is available when the server runs with the infoarchive.ias.profile.HTTP_BASIC profile:

User: adam@iacustomer.com
Authorization: Basic YWRhbUBpYWN1c3RvbWVyLmNvbTpwYXNzd29yZA==
Role: Administrator

User: bob@iacustomer.com
Authorization: Basic Ym9iQGlhY3VzdG9tZXIuY29tOnBhc3N3b3Jk
Role: Business Owner

User: connie@iacustomer.com
Authorization: Basic Y29ubmllQGlhY3VzdG9tZXIuY29tOnBhc3N3b3Jk
Role: Developer

User: emma@iacustomer.com
Authorization: Basic ZW1tYUBpYWN1c3RvbWVyLmNvbTpwYXNzd29yZA==
Role: End User

User: imran@iacustomer.com
Authorization: Basic aW1yYW5AaWFjdXN0b21lci5jb206cGFzc3dvcmQ=
Role: IT Owner

User: rita@iacustomer.com
Authorization: Basic cml0YUBpYWN1c3RvbWVyLmNvbTpwYXNzd29yZA==
Role: Retention Manager

User: sue@iacustomer.com
Authorization: Basic c3VlQGlhY3VzdG9tZXIuY29tOnBhc3N3b3Jk
Role: Administrator, Business Owner, Developer, End User, IT Owner, Retention Manager

The password for each inline user is 'password'.
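Each Authorization value uses the standard HTTP Basic scheme: the word Basic followed by the Base64 encoding of user:password. The short Python sketch below reproduces these values; for example, the Administrator token decodes to adam@iacustomer.com:password. Base64 is an encoding, not encryption, so these credentials are only suitable for demo and POC environments. The helper names are illustrative.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value: str) -> tuple:
    """Split a Basic header value back into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


if __name__ == "__main__":
    print(basic_auth_header("adam@iacustomer.com", "password"))
```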

Troubleshooting

If you have difficulty logging in or encounter unexpected behavior, clear the browser cache.

Branding Customization

InfoArchive supports limited, drop-in branding customization. Customers can define the fonts, colors, images, and styling of the InfoArchive web application.

Configuration to Add Sample Branding Customization

A sample branding customization is provided to the customer in the following directory: <install_directory>\tools\tenants\infoarchive\customization.

To review the sample branding, copy the customization folder and paste it into the following directory: <install_directory>\config\webapp.
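The copy step can be scripted. This is a minimal Python sketch that assumes the directory layout named above; install_dir stands for <install_directory>, the function name is hypothetical, and existing files in the target folder are overwritten.

```python
import shutil
import tempfile
from pathlib import Path


def install_sample_branding(install_dir: Path) -> Path:
    """Copy tools/tenants/infoarchive/customization into config/webapp."""
    source = install_dir / "tools" / "tenants" / "infoarchive" / "customization"
    target = install_dir / "config" / "webapp" / "customization"
    if not source.is_dir():
        raise FileNotFoundError(f"sample customization not found: {source}")
    # dirs_exist_ok (Python 3.8+) lets the copy overwrite an earlier installation.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target


if __name__ == "__main__":
    # Demo against a throwaway directory tree instead of a real installation.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        css_dir = root / "tools" / "tenants" / "infoarchive" / "customization" / "branding" / "css"
        css_dir.mkdir(parents=True)
        (css_dir / "custom.css").write_text("/* sample */\n")
        print(install_sample_branding(root))
```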

Configuration to Add New Branding Customization

To replace the image file of the InfoArchive web application, complete the following steps:

1. Open the following directory in Windows Explorer: <install_directory>\config\webapp\customization\branding\images.

2. Review the image requirements and directions in the README file.

3. Copy the new image file into the folder, replacing the existing file. Do not change the name of the image file.

To replace the styling and CSS rules of the InfoArchive web application, complete the following steps:

1. Open the following directory in Windows Explorer: <install_directory>\config\webapp\customization\branding\css.

2. Review the information about current CSS rules in the README file.

3. Open the custom.css file with a text editor and edit the styling rules using CSS syntax.

Setting the Customization Location when Deploying to Tomcat or Other Containers

The previously described steps work when the InfoArchive web application runs as a standalone Spring Boot application using the infoarchive\bin\infoarchive-webapp command. When it is deployed to an external Tomcat container, however, the situation is different: the customization location has to be set explicitly.

For example, assume that the external Tomcat container is installed at:

C:\apache-tomcat-8.0.32

and that the InfoArchive web application .war file is deployed at:

C:\apache-tomcat-8.0.32\webapps\infoarchive-webapp.war

When the application is deployed, Tomcat expands the .war file into the following folder:

C:\apache-tomcat-8.0.32\webapps\infoarchive-webapp

Now, assume that you copied the customization folder to the following directory:

C:\apache-tomcat-8.0.32\webapps\infoarchive-webapp\config\webapp\customization

Then, one way to set the customization location is to edit the application.yml file located at:

C:\apache-tomcat-8.0.32\webapps\infoarchive-webapp\WEB-INF\classes\application.yml

and change the following key (other keys in the file are omitted here):

infoarchive:
  gateway:
    # ...
    customization:
      location: "file:///C:/apache-tomcat-8.0.32/webapps/infoarchive-webapp/config/webapp/customization"
    # ...

Alternatively, you can set the customization location by creating a web.xml file at:

C:\apache-tomcat-8.0.32\webapps\infoarchive-webapp\WEB-INF\web.xml

Set the contents to:

<web-app version="3.0"
         xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd">
  <context-param>
    <description>InfoArchive customization location</description>
    <param-name>infoarchive.webapp.customization.location</param-name>
    <param-value>file:///C:/apache-tomcat-8.0.32/webapps/infoarchive-webapp/config/webapp/customization/</param-value>
  </context-param>
</web-app>
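Both methods take the location as a file:/// URL rather than a Windows path. As a small Python sketch (the helper name is hypothetical), the conversion for an absolute Windows directory could look like this. The application.yml example above has no trailing slash while the web.xml example has one, so the sketch makes that configurable.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def customization_location(windows_dir: str, trailing_slash: bool = False) -> str:
    """Convert an absolute Windows directory path into a file:/// URL."""
    path = PureWindowsPath(windows_dir)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path with a drive letter: {windows_dir}")
    # as_posix() turns backslashes into slashes; quote() escapes spaces and similar
    # characters while leaving '/' and the drive colon intact.
    url = "file:///" + quote(path.as_posix(), safe="/:")
    return url + "/" if trailing_slash else url


if __name__ == "__main__":
    print(customization_location(
        r"C:\apache-tomcat-8.0.32\webapps\infoarchive-webapp\config\webapp\customization"))
```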

Verifying and Viewing the Branding Customization

To verify the branding customization, complete the following steps:

1. Open the InfoArchive web application in an Internet browser.

2. Refresh and reload the web page.

3. The customized branding style should be displayed.

Troubleshooting

If the new customization is not displayed, clear the browser cache.

Chapter 2 Connectors

InfoArchive's XML-based ingestion template and integration framework provide a simple, open integration interface.

Connector options for the four archiving methods (table, data record, file, and compound):

• Any ETL tool*: table, data record, file, and compound archiving

EMC InfoArchive Connectors

• Oracle, DB2, SQL

• InfoArchive Documentum Connector

• Asset Suite for InfoArchive

• Oracle, SQL

• InfoArchive SharePoint Connector

EMC Partner Connectors

• EMC Kazeon: file shares

• Crawford: print streams, reports

• FME Migration Centre: FileNet, Notes, SharePoint, OpenText, Alfresco, file shares

* Includes Talend, PowerCenter, DataStage (InfoSphere Information Server), Pentaho, Ab Initio, Clover, Data Integrator, and BO Data Integrator.

Application and Platform Examples

InfoArchive preserves application data. Data is typically extracted through the application API, so the underlying database and hardware platform are not limiting factors.

Applications: Lotus Notes, SharePoint, Documentum, PeopleSoft, Baan, BASE T24, ASG Mobius, ERPs, financial applications, HR systems (multiple), core banking applications, customer statement applications, healthcare applications, and life sciences applications

Platforms: Mainframe, AS400, Windows, Unix, Solaris, VMS, and Linux

Databases: Oracle, DB2, SQL, XML databases, and ADABAS

Chapter 3 Architecture

Table archiving focuses on the power of xDB as an XML database. SIP archiving focuses on using the library functionality of xDB and the power of XML as a standard, long-lived format.

InfoArchive uses a Gateway pattern as a single point of access and authentication, which offers:

• A highly scalable architecture. InfoArchive servers can be scaled vertically, but they can also be "specialized" for the different functions that the InfoArchive REST services offer (for example, ingestion, search, JDBC, and administration). This enables scaling at any level, so the system can grow with the SLA and ROI needs of the business.

• A modern pattern for emerging microservice-based architectures.

• A cloud-friendly environment.

OAuth2-based stateless authentication uses JSON Web Tokens (JWTs). These are self-contained, signed, secure tokens that support clustering.
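The statelessness comes from the token being self-contained: any server node that holds the verification key can check the signature and expiry without a shared session store. The Python sketch below shows this with the HS256 (HMAC-SHA256) algorithm defined for JWTs in RFC 7519 and RFC 7518, using a hypothetical shared secret. Which algorithm, keys, and claims InfoArchive actually uses is not stated here, and a production system would rely on a maintained JWT library rather than hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, key: bytes) -> bytes:
    return hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(claims: dict, key: bytes) -> str:
    """Create a compact HS256 JWT: header.payload.signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode("utf-8"))
    payload = _b64url(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.{_b64url(_sign(header + '.' + payload, key))}"


def verify_token(token: str, key: bytes) -> dict:
    """Verify signature and expiry without any server-side session lookup."""
    try:
        header, payload, signature = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if not hmac.compare_digest(_sign(f"{header}.{payload}", key), _b64url_decode(signature)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


if __name__ == "__main__":
    secret = b"hypothetical-shared-secret"
    token = issue_token({"sub": "adam@iacustomer.com", "exp": time.time() + 300}, secret)
    print(verify_token(token, secret)["sub"])
```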

InfoArchive's role-sensitive user interface relies on the AngularJS and Bootstrap frameworks, along with a Spring Boot tier.

The InfoArchive server uses Spring Data, and its REST services rely on OAuth2/JWT authentication.

Distribution is performed with HTTP load balancing.

xDB allows the customer to:

• Ingest data

• Manage federations, databases and detachable libraries

• Manage transactions

• Perform searches

• Perform transformations and serialization

The InfoArchive database can also be scaled vertically and separated into:

• The system repository that maintains the consistency of the application but not the business data.

• One or multiple metadata repositories that contain the libraries/segments used to search structured content.

A holding must reside in an xDB metadata instance.

The storage systems include metadata, content, and system data. A cross-holding search is possible within one xDB instance.

This platform allows for bulk ingestion and querying, as well as configuration and administration services. It also sets up the framework for the main areas of InfoArchive's functionality:

• Login and Logout

• Dashboard

• Reports

• Activity and Notification

• Preferences

• Search Design:

— Query

— Form

— Results

• Search Execution:

— Forms

— Results (ability to export)

• Configuration and Administration

• Retention Management:

— Retention Policies

— Legal Holds

Ingestion

The ingestion process is one of the areas in which the differences between SIP and table archiving are evident. The common base is one or more modules that enable InfoArchive to connect to the following external systems:

• Local/shared file system

• The ETL process

These external systems assist in bulk loading data.

After that, the code lines diverge up to the point where data is stored and indexed in XML/xDB documents, in libraries, databases, or federations.

xDB is used to manage federations, databases, and libraries. This is an important part of InfoArchive's architecture because it plays a vital role in:

• Improving scalability

• Organizing the retention process

• Lowering a customer’s total cost of ownership by moving data to cheaper storage

• Mapping xDB data to data in HDFS

• Optimizing queries (by knowing which data partition is in which library)
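Knowing which partition lives in which library allows the search to skip libraries entirely. As a hedged Python sketch, assuming each library holds records for a known date range (the partitioning key, library names, and ranges are hypothetical), a search only needs to open the libraries whose range overlaps the query:

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class LibraryPartition:
    """One xDB library holding records whose dates fall in [start, end]."""
    name: str
    start: date
    end: date


def libraries_to_search(partitions: Iterable[LibraryPartition], query_start: date,
                        query_end: date) -> List[str]:
    """Return only the libraries whose date range overlaps the query range."""
    return [p.name for p in partitions
            if p.start <= query_end and query_start <= p.end]


if __name__ == "__main__":
    yearly = [LibraryPartition(f"lib-{y}", date(y, 1, 1), date(y, 12, 31))
              for y in (2014, 2015, 2016)]
    print(libraries_to_search(yearly, date(2015, 6, 1), date(2016, 2, 1)))
```

The same partition map would also support retention: a library whose whole range has expired can be handled as one unit.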

What is ETL?

ETL stands for Extract, Transform, and Load, the three combined functions that are needed to take data from an active legacy application to another platform.

Extract: Reads data from an application system. Data is extracted in its "natural" form, whether it is in tables or file format.

Transform: Applies rules or functions to convert the source data extract to XML. Some systems can natively export XML.

Load: Ingests the transformed XML data into xDB. Digital data storage (DDS) provides a toolset for loading data with assigned retention and metadata.

An ETL process is needed to move legacy data into InfoArchive so that the source application can be turned off or archived. The EMC solution is open to any ETL tool that can transform data to XML or submission information packages (SIPs). Some applications have a built-in facility for exporting to XML.

How to Extract Data

Applications and data cannot be archived without some means of extracting and loading data intoInfoArchive.

• Data validation is part of the ETL process.

• Audit points and chain of custody metadata stored with the legacy data are critical for governanceand compliance.

• Only after validation and audit requirements have been met can a legacy system be shut down and removed, or its data purged from the original application.
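Validation and audit points can be as simple as reconciling row counts and control totals between the source extract and the XML to be loaded. The Python sketch below is one such check, assuming a ROW-per-record XML layout and a monetary field to total; the field names and the result keys are hypothetical. The resulting figures, including a digest of the exact XML that was checked, are the kind of audit-point metadata the list above describes keeping with the legacy data.

```python
import hashlib
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import Dict, List


def audit_point(source_rows: List[Dict[str, str]], xml_text: str, amount_field: str) -> dict:
    """Reconcile extracted XML against the source: row count and control total."""
    loaded = ET.fromstring(xml_text).findall("ROW")
    # Decimal avoids floating-point rounding in monetary control totals.
    source_total = sum((Decimal(row[amount_field]) for row in source_rows), Decimal("0"))
    loaded_total = sum((Decimal(el.findtext(amount_field, default="0")) for el in loaded),
                       Decimal("0"))
    return {
        "sourceRows": len(source_rows),
        "loadedRows": len(loaded),
        "sourceTotal": source_total,
        "loadedTotal": loaded_total,
        # Digest of the exact XML that was validated, for the chain-of-custody record.
        "xmlSha256": hashlib.sha256(xml_text.encode("utf-8")).hexdigest(),
        "passed": len(source_rows) == len(loaded) and source_total == loaded_total,
    }


if __name__ == "__main__":
    rows = [{"id": "1", "amount": "10.50"}, {"id": "2", "amount": "4.25"}]
    xml_text = ("<TABLE><ROW><id>1</id><amount>10.50</amount></ROW>"
                "<ROW><id>2</id><amount>4.25</amount></ROW></TABLE>")
    print(audit_point(rows, xml_text, "amount")["passed"])
```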

Storage

What happens to data upon ingestion depends on whether the data is structured or unstructured: structured data goes into an xDB database, whereas unstructured data goes into a file system. Before ingestion, some necessary and some optional configuration is required for the storage of structured and unstructured data.

• Structured data, such as AIUs or records, goes into an xDB database. At least one XdbFederation and one XdbDatabase must be configured. The XdbDatabase must not, however, be the same database that is configured for Spring Data.

• Unstructured data goes into a file system and then the InfoArchive database. At least one XdbFederation, one XdbDatabase, and one FileSystemRoot must be configured.

Because ingestion is always in the context of an application, an application has to be configured.

XdbLibraries reside in XdbDatabases, and FileSystemFolders reside under FileSystemRoots. When data is ingested into an application, it is ingested into the application’s space, and the space determines which XdbLibraries and FileSystemFolders are used for ingestion.

The following list illustrates the hierarchy for storage configuration, with the most important attributes per item:

• XdbFederation (name, bootstrapUrl, superUserPassword)
    — XdbDatabase (name, adminPassword)
• FileSystemRoot (name, path)
• Tenant (name)
    — Application (name, ...)
        — Space (name)
            — SpaceRootXdbLibrary* (name, XdbDatabase)
                — XdbLibrary (subPath, ...)
                    — ...
                        — XML documents with records or AIUs
                        — Content/Blobs
            — SpaceRootFolder* (path, FileSystemRoot)
                — FileSystemFolder (subPath, ...)
                    — ...
                        — Content/Blobs
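To make the storage hierarchy concrete, the following Python sketch models it with dataclasses. It also includes a check that every SpaceRootXdbLibrary and SpaceRootFolder points at an XdbDatabase or FileSystemRoot that has actually been configured. The class names follow the configuration items listed in this section. The Python field names, the `unresolved_references` helper, and the sample values are hypothetical, and passwords are omitted. This is not InfoArchive's object model.

```python
from dataclasses import dataclass, field


@dataclass
class XdbDatabase:
    name: str


@dataclass
class XdbFederation:
    name: str
    bootstrap_url: str
    databases: list[XdbDatabase] = field(default_factory=list)


@dataclass
class FileSystemRoot:
    name: str
    path: str


@dataclass
class SpaceRootXdbLibrary:
    name: str
    database: XdbDatabase  # must be a configured XdbDatabase


@dataclass
class SpaceRootFolder:
    path: str
    root: FileSystemRoot  # must be a configured FileSystemRoot


@dataclass
class Space:
    name: str
    libraries: list[SpaceRootXdbLibrary] = field(default_factory=list)
    folders: list[SpaceRootFolder] = field(default_factory=list)


@dataclass
class Application:
    name: str
    spaces: list[Space] = field(default_factory=list)


@dataclass
class Tenant:
    name: str
    applications: list[Application] = field(default_factory=list)


def unresolved_references(tenant: Tenant, federations: list[XdbFederation],
                          roots: list[FileSystemRoot]) -> list[str]:
    """List space roots that refer to storage which has not been configured."""
    known_dbs = {db.name for fed in federations for db in fed.databases}
    known_roots = {root.name for root in roots}
    problems = []
    for app in tenant.applications:
        for space in app.spaces:
            for lib in space.libraries:
                if lib.database.name not in known_dbs:
                    problems.append(f"{app.name}/{space.name}: unknown XdbDatabase {lib.database.name}")
            for folder in space.folders:
                if folder.root.name not in known_roots:
                    problems.append(f"{app.name}/{space.name}: unknown FileSystemRoot {folder.root.name}")
    return problems
```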

A space is a collection of libraries in XdbDatabases (called SpaceRootXdbLibraries) and folders in FileSystemRoots (called SpaceRootFolders). Typically, customers will not require more than one of each.

If table data is ingested, a space is constructed and configured automatically.

The following list illustrates the configuration flow:

1. The Administrator configures an XdbDatabase.

2. The Administrator configures a FileSystemRoot.


3. The Developer configures an application.

4. The Developer configures a space for the application with a SpaceRootXdbLibrary in the XdbDatabase and a SpaceRootFolder under the FileSystemRoot.

5. Ingestion starts.

The FileSystemFolders and XdbLibraries (essentially, anything under a SpaceRootFolder or SpaceRootXdbLibrary) are generated by the server as part of the ingestion process. The administrator does not have to configure these entities.

In the Admin UI, a space can be configured through the forms for configuring SpaceRootXdbLibraries and SpaceRootFolders. In those forms, the customer can select the XdbDatabase or FileSystemRoot of the SpaceRootXdbLibrary or SpaceRootFolder from a list.

How Data is Searched

Using the InfoArchive interface, it is possible to create a search from scratch or to import a search. Search composition involves:

• The creation of a search, search form, result list and result detail

• Using the Query Editor to configure a search

• Setting the permissions to allow specific user groups to access the search

Search composition is the entry point of the search. From search composition, it is possible to create a search, run a search, and retrieve elements for the search form and the search result page.

Two types of searches can be executed:

• Synchronous search

• Asynchronous search (also named the ’background search’)

A search can be executed directly as a background search or as a synchronous search. If a synchronous search takes too much time, it can be switched to a background search.
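The following sketch illustrates the general pattern of switching a slow synchronous search to a background search. It waits a bounded time for the result and otherwise hands back a future that the caller can collect later. It is only an illustration of the concept using Python's `concurrent.futures`. InfoArchive performs this switch on the server, and `run_search`, the thread pool, and the timeout value are hypothetical stand-ins.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from concurrent.futures import TimeoutError as SearchTimeout

# Shared pool standing in for background search workers (hypothetical).
executor = ThreadPoolExecutor(max_workers=4)


def run_search(criteria: dict) -> list[str]:
    """Stand-in for a real query; the optional 'delay' key simulates a slow search."""
    time.sleep(criteria.get("delay", 0))
    return [f"result for {criteria['name']}"]


def search(criteria: dict, sync_timeout: float = 1.0):
    """Run a search synchronously, switching to a background search on timeout.

    Returns ("sync", results) when the query finishes within sync_timeout seconds,
    otherwise ("background", future) so the caller can collect results later.
    """
    future: Future = executor.submit(run_search, criteria)
    try:
        return "sync", future.result(timeout=sync_timeout)
    except SearchTimeout:
        # The query keeps running in the pool; only the caller stops waiting.
        return "background", future
```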

A search is associated with a single dataset based on the archiving type. For SIP archiving, a search is associated with an AIC and a query configuration. For table archiving, a search is associated with a schema or table.

Search composition involves various components.

Every search form is contained within an application and has a name, a description, and a state that indicates whether the search form is a draft or ready to be accessed by the End User. Each search form is also associated with a single dataset, an object that describes the target of the search (e.g., the database and tables associated with a table-archiving application). SIP-based datasets also contain a reference to a QueryConfiguration.

Table-based search forms require a query (built upon XQueryTemplate) as well as the schema or table the search will be executed upon.

SIP-based search forms require the name of the AIC and the QueryConfiguration, which determines the criteria and results for the search.


The form designer defines the dataset for a given search. When creating a table-based search form, the Developer has to select either a schema or a table. For SIP-based searches, the Configuration Developer has to select an Archive Collection (an AIC) and a query configuration.

A search form can be associated with a user group or role to limit its availability to specific users. It can contain any number of search composition structures that configure the queries, forms, and so on. The details of the objects comprising a search composition can be specific to the application type. For SIP archiving, most of these components can be synthesized from special configuration objects. For table archiving, the Developer has to complete the work manually.

The SIP search uses two queries.

• The first one returns the suitable AIPs, based on the partition key. This information appears in the wizard when selecting the fields for the search criteria.

• The second query filters the suitable AIUs in the selected AIP list. Search performance can be improved by using (and filling in) at least one partition key in the search form.
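The two-query approach can be sketched as follows. The first function narrows the candidate AIPs using partition-key metadata, and the second filters AIUs only inside those AIPs, which is why filling in the partition key speeds up the search. The sketch assumes, hypothetically, that the partition key is a date and that each AIP records the minimum and maximum dates it contains. The real AIP metadata and query mechanics in InfoArchive differ.

```python
from dataclasses import dataclass


@dataclass
class Aip:
    aip_id: str
    min_date: str  # partition-key range per AIP; ISO dates compare correctly as strings
    max_date: str
    aius: list


def candidate_aips(aips: list[Aip], date_from: str, date_to: str) -> list[Aip]:
    """Query 1: keep only AIPs whose partition-key range overlaps the criteria."""
    return [a for a in aips if a.max_date >= date_from and a.min_date <= date_to]


def matching_aius(aips: list[Aip], date_from: str, date_to: str, predicate) -> list[dict]:
    """Query 2: filter AIUs, looking only inside the AIPs selected by query 1."""
    results = []
    for aip in candidate_aips(aips, date_from, date_to):
        results.extend(u for u in aip.aius if date_from <= u["date"] <= date_to and predicate(u))
    return results
```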

Forms

For table-based searches, the search form designer must understand XQuery and how it relates to search design. The designer of a search needs to ensure that:

• The query is valid.

• The parameters used are accurate.

• The correct binding for elements is used in the result set.
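As an aid for the parameter check above, the following sketch compares the external variables that an XQuery declares with the parameters a search form supplies. It reports declared parameters that have no value and supplied parameters that the query never declares. It is a hypothetical text scan using a regular expression, not an XQuery parser. It does not validate query syntax or result bindings.

```python
import re

# Matches e.g. "declare variable $customer external" or "declare variable $from as xs:date external".
DECLARED_PARAM = re.compile(r"declare\s+variable\s+\$([\w-]+)(?:\s+as\s+[^\s;]+)?\s+external")


def check_parameters(xquery: str, supplied: dict) -> dict:
    """Compare the external variables a query declares with the parameters supplied."""
    declared = set(DECLARED_PARAM.findall(xquery))
    provided = set(supplied)
    return {
        "missing": sorted(declared - provided),  # declared but no value supplied
        "unused": sorted(provided - declared),   # supplied but never declared
    }
```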

To manipulate the data bindings, an API was created and added to the XForms engine.

For SIP- and table-based searches, XForms is hosted within an XHTML template. The header of the template contains the XForms instance and the bindings. The body of the XHTML document contains the element directives, decorated with classes, inline styling, and other attributes related to the various elements.

Search Results

During the search form composition stage, the form designer also defines the search results for the search. For table-based searches, just as search fields are selected manually, search results are created by providing labels for columns that must match the associated XQuery. For a SIP-based search, the list of available columns is provided by the search configuration (i.e., the selection of the AIC and the query configuration).

The form designer has to configure not only the search result list (i.e., the columns of the result page) but also the detail panel. The detail panel of the SIP search contains two sub-parts:

• The side panel (that appears on the side of the result grid)

• The ’Inline Panel’, which includes detail data in the result grid (useful for inserting repeating information, such as ’authors’, in a search result).


This is why the list of fields is longer when selecting from a schema in the inline panel than when selecting from a schema in the side panel. A repeating field can only be added in the ’inline panel’.

InfoArchive expects a "return token" service: a back-end service that, when given an XQuery, returns the response attributes. This is similar to the XQuery analyzer and XQuery validation services. The service consumes the XQuery before providing the proper response.

A nested search yields a special kind of result column that includes a name and a result list sorting index. Binding attributes and result detail properties do not apply to nested searches.

For a SIP-based search, there are two methods to search:

• From the user interface with the search composition, search, search form, result master resources.

• From the REST API, with the AIC resource: it is possible to create a search with the AIC and a result schema (see the dip link on the AIC resource). This search returns raw search results in an XML format based on the result schema provided.


Chapter 4 Core Configuration

Before you can start ingesting and searching data, an initial system configuration must be set up:

• System Data

• Structured Data Storage

• Unstructured Data Storage

• Initial System Configuration

Storage is configured using the following:

• System Database (xDB)

• Audit Database (xDB)

• Structured Database (xDB)

• Content Database (typically not xDB but FileSystem or EMC Isilon, ECS, Centera, etc.)

A tenant and an application must also be configured. For an application, specifically, the following must be configured:

• Holding

• SQL Database, schema, table layout

• Stored searches

System and Audit Database

The system database is where all system objects live (tenants, applications, spaces, searches, holdings, tables, etc.).

The audit database is where audit records are stored until they are packaged into SIPs and archived.

The system database and the audit database are both configured in the application.yml file.

The following is from the application.yml for a system database:

system:
  userName: system
  password: system
  pageSize: 4096
  cacheSizeTotal: 50
  cacheSizeSystem: 50
  xdb:
    dataNode:
      name: mainFederation
      bootstrap: xhive://localhost:8080
      superuser:
        password: test
    database:
      name: mainDatabase
      admin:
        password: secret

The following is from the application.yml for audit data:

auditData:
  xdb:
    dataNode:
      name: mainFederation
      bootstrap: xhive://localhost:8080
      superuser:
        password: test
    database:
      name: auditDatabase
      admin:
        password: secret

xDB Federations and xDB Databases

Every xDB federation used by InfoArchive must already exist and be running. InfoArchive can connect to a federation, but it cannot start it up.

Every xDB federation and xDB database, with the exception of the System and Audit databases, must be registered in the system database. This can be done using a REST call, the IAShell, or the InfoArchive web application.

Structured data in InfoArchive, including AIPs/AIUs and table data, is stored in xDB databases.

There are also two important repositories:

• A potentially large repository where PolicyApplications and HoldApplications live. It is recommended to keep this repository in a federation separate from the one under "system".

managedItemData:
  xdb:
    dataNode:
      name: mainFederation
      bootstrap: xhive://localhost:8080
      superuser:
        password: test
    database:
      name: managedItemDatabase
      admin:
        password: secret


• A repository for batch-related objects. This can be in the same federation as the SystemData, but not in the same database.

batchData:
  xdb:
    dataNode:
      name: mainFederation
      bootstrap: xhive://localhost:8080
      superuser:
        password: test
    database:
      name: batchDatabase
      admin:
        password: secret
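The placement rules for these repositories can be checked mechanically. The following sketch holds the repository settings from the application.yml examples in this section as a Python dict, under the assumption that each repository has an `xdb` section with `dataNode` and `database` names. Loading the real YAML would need a parser such as PyYAML, which is not shown. The sketch then flags repositories that share a database, and reports when managedItemData shares a federation with the system repository, where a separate federation is recommended. The helper names are hypothetical.

```python
# Repository locations mirroring the application.yml examples (passwords omitted).
CONFIG = {
    "system": {"xdb": {"dataNode": {"name": "mainFederation"}, "database": {"name": "mainDatabase"}}},
    "auditData": {"xdb": {"dataNode": {"name": "mainFederation"}, "database": {"name": "auditDatabase"}}},
    "managedItemData": {"xdb": {"dataNode": {"name": "mainFederation"}, "database": {"name": "managedItemDatabase"}}},
    "batchData": {"xdb": {"dataNode": {"name": "mainFederation"}, "database": {"name": "batchDatabase"}}},
}


def location(settings: dict) -> tuple[str, str]:
    """Return (federation name, database name) for one repository."""
    xdb = settings["xdb"]
    return xdb["dataNode"]["name"], xdb["database"]["name"]


def check_repositories(config: dict) -> list[str]:
    """Flag repositories sharing a database, plus the managed item federation advice."""
    problems = []
    owners: dict[tuple[str, str], str] = {}
    for repo, settings in config.items():
        loc = location(settings)
        if loc in owners:
            problems.append(f"{repo} uses the same database as {owners[loc]}: {loc[1]}")
        else:
            owners[loc] = repo
    if location(config["managedItemData"])[0] == location(config["system"])[0]:
        problems.append("managedItemData shares a federation with system; a separate one is recommended")
    return problems
```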

File System Root

Any unstructured content is stored on a file system. A file system root is a folder under a mount point of a shared file system. It must be shared by all InfoArchive servers and be available under the same path on each of them. Every file system root used by InfoArchive must be registered in the system database, either through a REST call, the IAShell, or the InfoArchive web application.

Space Root xDB Library and Space Root Folder

The space root xDB library and space root folder are the configuration items that uniquely allocate a slice of storage to a space or application.

• The space root xDB library is an xDB library reserved for a particular space. Anything stored in the library belongs to the same space.

• The space root folder is a file system folder reserved for a particular space. Anything stored in the folder belongs to the same space.

Language Support

InfoArchive can dynamically add a new language translation for the InfoArchive web application.

The InfoArchive web application login page allows users to select a language from a drop-down list with seven language choices. The supplemental language pack installs all primary languages. For a supported language to work, a corresponding <locale_name>.json file must be present in the classpath. The supplemental language pack installation modifies the classpath automatically.

InfoArchive makes it easy to configure a new language translation.


Language support has been made more dynamic. In the new approach:

1. The values in the languages drop-down list are determined at run time, based on the presence of supporting language files.

2. The user can configure more languages by following the configuration steps.

Also, the language drop-down list on the login page shows only those languages for which there is a corresponding <locale_name>.json file, either in the classpath or in the configurable location. This gives a precise picture of the supported languages and avoids confusion when a user selects a language for which no corresponding <locale_name>.json file is present.
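The drop-down rule described above can be sketched as follows. A custom language declared in languages.json is listed only if its <locale_name>.json file is actually present. The `builtin` argument stands for the primary languages, whose files are assumed to be on the classpath. The function name and the directory layout are illustrative assumptions. This is not InfoArchive's implementation.

```python
import json
import tempfile
from pathlib import Path


def available_languages(custom_dir: Path, builtin: dict[str, str]) -> dict[str, str]:
    """Return {locale: label} for the login drop-down.

    builtin: languages assumed to ship with a <locale>.json on the classpath.
    custom_dir: the configurable location holding languages.json and extra <locale>.json files.
    A custom locale is listed only if its <locale>.json file is actually present.
    """
    languages = dict(builtin)
    catalog = custom_dir / "languages.json"
    if catalog.is_file():
        declared = json.loads(catalog.read_text(encoding="utf-8"))
        for locale, label in declared.items():
            if (custom_dir / f"{locale}.json").is_file():
                languages[locale] = label
    return languages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "languages.json").write_text(json.dumps({"nl": "Dutch", "ru": "Russian"}), encoding="utf-8")
        (folder / "nl.json").write_text("{}", encoding="utf-8")  # ru.json deliberately missing
        print(available_languages(folder, {"en": "English"}))
```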

Adding New Language Support

Complete the following steps to add support for a new language translation other than the seven primary languages.

To add support for a new language, the corresponding <locale_name>.json file must be generated first. Use the en.json file in the /WEB-INF/classes/static/languages folder of <<Installed_Dir>>/lib/infoarchive-webapp.jar as a reference for translation into the new language.

The following example illustrates how to add support for the Dutch language.

1. Create a folder named customization under the <<Installed_Dir>>/config/web app folder in the distribution.

2. Create a folder named languages in the customization folder that you created in the previous step.

3. Create a file named languages.json in the languages folder that you created in the previous step.

4. Edit the languages.json file and add the following content in the file:

{
  "nl": "Dutch"
}

5. Drop the nl.json file (the Dutch equivalent of en.json or zh.json) into the customization > languages folder.

6. Refresh the login page (using Ctrl + R or F5). The Dutch language is now available in the languages drop-down menu.

If you want to add support for the Russian language, complete the following steps:

1. Add an entry for the Russian language in the languages.json file:

{
  "nl": "Dutch",
  "ru": "Russian"
}

2. Drop the ru.json file into the customization > languages folder.


3. Refresh the login page (using Ctrl + R or F5). The Russian language is now available in the languages drop-down menu.

For more information, see Setting the Customization Location when Deploying to Tomcat or Other Containers.
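The steps for Dutch and Russian can be scripted. The following Python sketch creates the customization > languages folder if necessary and adds one entry to languages.json. Because it writes the file with `json.dumps`, the result always uses straight double quotes and stays valid JSON. The function name is hypothetical, and the caller passes in the <<Installed_Dir>>/config/web app folder. The translated <locale_name>.json file itself must still be copied into the folder separately.

```python
import json
import tempfile
from pathlib import Path


def register_language(webapp_config_dir: Path, locale: str, label: str) -> dict[str, str]:
    """Add one entry to customization/languages/languages.json, creating it if needed.

    webapp_config_dir: the <<Installed_Dir>>/config/web app folder (passed in by the caller).
    The translated <locale>.json file must still be copied in separately.
    """
    languages_dir = webapp_config_dir / "customization" / "languages"
    languages_dir.mkdir(parents=True, exist_ok=True)
    catalog = languages_dir / "languages.json"
    entries = json.loads(catalog.read_text(encoding="utf-8")) if catalog.exists() else {}
    entries[locale] = label
    # json.dumps writes straight ASCII double quotes, so the file stays valid JSON.
    catalog.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")
    return entries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_dir = Path(tmp)
        register_language(config_dir, "nl", "Dutch")
        print(register_language(config_dir, "ru", "Russian"))
```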

Backing Up and Restoring the Managed Item Database

Managed items, policy applications, and hold applications are stored in a separate database. As part of a disaster recovery scenario, they need to be backed up and kept in sync with the system data (retention policies, holds, etc.).

Unlike AIPs, managed items can change as policies are applied and holds are added and removed, so a different disaster recovery strategy is required.

If configured, InfoArchive automatically backs up changes to managed items to a configured store as they occur.

Configuring the Managed Items Database

The managed items (HoldApplication, PolicyApplication, and ManagedItem objects) are partitioned over multiple xDB libraries, with the AIP/table as the partition key. For example, all HoldApplications for AIUs in one AIP are stored together in a separate xDB library.

Depending on the expected size of the managed items database, you may want to move it to a different data node:

If you plan to use fine-grained (AIU/record-level) retention, or expect a large number of holds on individual AIUs/records, the managed items database may grow as large as the structured data. At that point, it is no longer feasible to back up the managed items as part of system data backups, because the backup would take too long. By mapping the managed items to a different data node and configuring stores for managed item backups under Holdings/Databases, the backup process becomes:

• The managed items for each AIP/table are backed up separately, after each change (new or removed hold, new retention policy application, etc.).

• This happens immediately after the transaction where the changes took place.

Configure the managed item database in the application.yml file. For example:

managedItemData:
  xdb:
    dataNode:
      storeStackTraceInLock: false
      name: mainFederation
      bootstrap: xhive://localhost:8080
      superuser:
        password: test
    database:
      name: managedItemDatabase
      admin:
        password: secret

Backing Up the Managed Items Database

The managed items can be set to back up automatically when they are changed in InfoArchive. The managed item store is set per application, and each application must have the managed item store configured for InfoArchive to back up its managed items. By default, the managed item store is not configured. If the store is not set, InfoArchive does not back up the managed items.

The backup strategy for the managed items takes into account how you back up the system data and how you apply retention and holds. The system data database is where the configuration objects (retention policies, holds) and the retained and hold sets are located.

Whichever method is used to back up the managed items (InfoArchive or a scheduled backup), ensure that the backups of the system data and the managed item data are completed at the same time.

InfoArchive uses a store for writing the managed item backup files. Ensure that there is sufficient space on the store and that the content can be removed (during disposition of the underlying managed item, hold application, and policy application).

For more information, see the Using the Stores Tab section in the Configuration & Administration User Guide.

Setting the Managed Item Store

For SIP archiving, you have two options when setting the managed item store:

1. Use IAShell to update the holding for the application; or

2. Use the REST interface to update the holding for the application.

The attribute on the holding object is called ’managedItemStore’ and is a store object reference.

For table archiving, you have two options when setting the managed item store:

1. Use IAShell to update the database for the application; or

2. Use the REST interface to update the database for the application.

The attribute on the database object is called ’managedItemStore’ and is a store object reference.

XdbLibrary

For every managed item, hold application, and policy application that is backed up, a corresponding XdbLibrary object is created in the system data database to represent the backup. This object has a reference to the content object that was backed up in the managed item store. Because the managed items, hold applications, and policy applications for a particular AIP/table are each partitioned into a single xDB library, each such library is represented by one XdbLibrary object. For every partition, there can therefore be three XdbLibrary objects, representing the three types of objects in the managed item database (managed item, hold application, and policy application).

The XdbLibrary object has a type defined for it called MANAGED_ITEMS. All managed itemXdbLibrary objects are set to this type.

The name of an XdbLibrary object uniquely identifies it, and it determines which partition (AIP/table) is associated with the managed items and which type of object the library represents.

For a managed item:

ManagedItem_<partitionkey>

For a hold application:

HoldApplication_<partitionkey>

For a policy application:

PolicyApplication_<partitionkey>
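Because the naming convention is fixed, building or decoding these names is straightforward. The following sketch builds an XdbLibrary name from an object type and partition key, and parses a name back into its parts. It assumes that the partition key itself contains no underscore, which holds for UUID-style keys such as those in this section's examples. The helper names are hypothetical.

```python
from typing import NamedTuple

OBJECT_TYPES = ("ManagedItem", "HoldApplication", "PolicyApplication")


class LibraryName(NamedTuple):
    object_type: str
    partition_key: str


def library_name(object_type: str, partition_key: str) -> str:
    """Build an XdbLibrary name such as HoldApplication_<partitionkey>."""
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unknown managed item type: {object_type}")
    return f"{object_type}_{partition_key}"


def parse_library_name(name: str) -> LibraryName:
    """Split a name at the first underscore into (object type, partition key)."""
    object_type, sep, key = name.partition("_")
    if not sep or not key or object_type not in OBJECT_TYPES:
        raise ValueError(f"not a managed item library name: {name}")
    return LibraryName(object_type, key)
```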

Restoring the Managed Item Database

When restoring the managed item database, you must first decide what needs to be restored: only a few of the managed items, or all of them.

The method used to perform a restoration depends on the state of the managed item database. If you want to restore the database to a previous state because the managed item database is corrupted, use the xDbLibrary list to restore. If you want to restore just a few managed items that, for whatever reason, you want to roll back, use the managed item list itself to restore.

Using the xDbLibrary

Identify which managed items you want to restore.

To restore the entire database to an empty managed item database (that you have recreated), complete the following procedure:

1. For each application, search for all xDbLibraries that have the type = MANAGED_ITEMS.

2. For each xDbLibrary, select the restore link and execute a POST.

Complete the following steps to restore the entire database:

• To a managed item database that has the managed items still in the database, and

• You want to replace it with the backup

1. Under the Services link, search for all xDbLibraries that have the type = MANAGED_ITEMS.

2. For each xDbLibrary, select the detach link and execute a POST.


3. For each xDbLibrary, select the restore link and execute a POST.

For example, the following request lists the MANAGED_ITEMS xDbLibraries of an application:

http://localhost:8080/systemdata/applications/6b20bb40-671e-4fbe-a582-98526372283b/xdb-libraries?type=MANAGED_ITEMS

The response is similar to the following (only the first library in the list is shown):

{
  "_embedded": {
    "xdbLibraries": [
      {
        "createdBy": "[email protected]",
        "createdDate": "2016-09-13T11:59:50.538-04:00",
        "lastModifiedBy": "[email protected]",
        "lastModifiedDate": "2016-09-13T12:41:18.274-04:00",
        "version": 3,
        "name": "ManagedItem_48a2ea68-7f6b-4633-a44b-87d814bd3ddf",
        "subPath": "ManagedItem/48a2ea68-7f6b-4633-a44b-87d814bd3ddf",
        "detached": false,
        "readOnly": false,
        "detachable": false,
        "concurrent": false,
        "cacheSupport": true,
        "cacheInCount": 0,
        "size": 204800,
        "indexSize": 0,
        "type": "MANAGED_ITEMS",
        "aipCount": 0,
        "aiuCount": 0,
        "closed": false,
        "closeRequested": false,
        "xdbMode": "PRIVATE",
        "_links": {
          "self": {
            "href": "http://localhost:8080/systemdata/xdb-libraries/37061d55-4c2f-4fed-b74e-9b02fc6e4849"
          },
          "http://identifiers.emc.com/request-close": {
            "href": "http://localhost:8765/systemdata/xdb-libraries/37061d55-4c2f-4fed-b74e-9b02fc6e4849/request-close"
          }
        }
      }
    ]
  }
}

To detach the library, execute a POST on the following URL:

http://localhost:8080/systemdata/xdb-libraries/37061d55-4c2f-4fed-b74e-9b02fc6e4849/detach

To restore the library, execute a POST on the following URL:

http://localhost:8080/systemdata/xdb-libraries/37061d55-4c2f-4fed-b74e-9b02fc6e4849/restore
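The whole-database procedure can be scripted with the Python standard library, as in the following minimal sketch. It lists an application's MANAGED_ITEMS libraries, detaches each one when replacing existing items, and then restores each one, following each library's self link as shown in the response above. Several points are assumptions: the base URL comes from the examples and must be adjusted; authentication is left to request headers that you supply, because this section does not describe it; the listing is assumed to fit in one response, so paging is not handled. The function names are hypothetical.

```python
import json
import urllib.request

# Base URL taken from the examples in this section; adjust host and port for your deployment.
BASE_URL = "http://localhost:8080/systemdata"


def managed_item_library_links(listing: dict) -> list[str]:
    """Return the self links of the MANAGED_ITEMS libraries in an xdb-libraries listing."""
    libraries = listing.get("_embedded", {}).get("xdbLibraries", [])
    return [lib["_links"]["self"]["href"] for lib in libraries if lib.get("type") == "MANAGED_ITEMS"]


def post(url: str, headers: dict[str, str]) -> int:
    """Send an empty POST request and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status


def restore_all(application_id: str, headers: dict[str, str], detach_first: bool = True) -> None:
    """Restore every managed item library of one application from its backup.

    detach_first=True follows the 'replace existing managed items' procedure;
    use False when restoring into an empty, recreated database.
    """
    listing_url = f"{BASE_URL}/applications/{application_id}/xdb-libraries?type=MANAGED_ITEMS"
    with urllib.request.urlopen(urllib.request.Request(listing_url, headers=headers)) as response:
        links = managed_item_library_links(json.load(response))
    if detach_first:
        for link in links:
            post(f"{link}/detach", headers)
    for link in links:
        post(f"{link}/restore", headers)
```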

Selected Managed Items

You have identified the selected managed items that need to be restored. The partition key that identifies the xDbLibrary is based on the AIP/table ID. You have to restore all managed items for a particular AIP or table; you cannot restore individual managed items (for example, an AIU) under a particular AIP. The entire AIP’s managed items have to be restored.

Complete the following steps to perform the restoration:

1. Find the three xDbLibraries that correspond to the partition key. To find them, first get the AIP or table ID for the partition that you want to restore, and then issue the following commands to get the three xDbLibraries (in this example, the partition key is 6b20bb40-671e-4fbe-a582-98526372283b):

a. http://localhost:8080/systemdata/applications/6b20bb40-671e-4fbe-a582-98526372283b/xdb-libraries?name=ManagedItem_6b20bb40-671e-4fbe-a582-98526372283b

b. http://localhost:8080/systemdata/applications/6b20bb40-671e-4fbe-a582-98526372283b/xdb-libraries?name=PolicyApplication_6b20bb40-671e-4fbe-a582-98526372283b

c. http://localhost:8080/systemdata/applications/6b20bb40-671e-4fbe-a582-98526372283b/xdb-libraries?name=HoldApplication_6b20bb40-671e-4fbe-a582-98526372283b

2. Once you have the xDbLibrary objects, issue the detach and restore commands for each of them:

a. http://localhost:8080/systemdata/xdb-libraries/37061d55-4c2f-4fed-b74e-9b02fc6e4849/detach

b. http://localhost:8080/systemdata/xdb-libraries/37061d55-4c2f-4fed-b74e-9b02fc6e4849/restore
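The three lookup requests in step 1 differ only in the name prefix. The following sketch builds them for a given application ID and partition key. The base URL is taken from the examples above, and the function name is hypothetical. In the examples, the application ID and the partition key happen to be the same value; the sketch keeps them as separate parameters.

```python
from urllib.parse import urlencode

# Base URL taken from the examples above; adjust for your deployment.
BASE_URL = "http://localhost:8080/systemdata"
OBJECT_TYPES = ("ManagedItem", "PolicyApplication", "HoldApplication")


def partition_lookup_urls(application_id: str, partition_key: str) -> list[str]:
    """Return the three xdb-libraries queries that find one partition's backup libraries."""
    return [
        f"{BASE_URL}/applications/{application_id}/xdb-libraries?"
        + urlencode({"name": f"{object_type}_{partition_key}"})
        for object_type in OBJECT_TYPES
    ]
```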

Using the Managed Items

If the managed item list is available from an application, there are links on each managed item to detach and restore it. The following example illustrates how to restore a managed item:

1. Get the list of managed items from an application.


2. For each managed item that you want to restore, use the links on the managed item object:

a. http://localhost:8080/systemdata/managed-items/8d2edb7a-01d0-4354-bf8a-65aa0e309073/detach

b. http://localhost:8080/systemdata/managed-items/8d2edb7a-01d0-4354-bf8a-65aa0e309073/restore?xdbLibraryId=58cf9369-5d3b-4c2d-b2ad-272cde64ef4c

Each policy application of a managed item has links to detach and restore it. The following example illustrates how to restore a policy application:

1. Get the list of policy applications from the managed item:

a. http://localhost:8080/systemdata/managed-items/8d2edb7a-01d0-4354-bf8a-65aa0e309073/retention-applications

2. For each policy application that you want to restore, use the links on the policy application object:

a. http://localhost:8080/systemdata/retention-applications/d9396a58-a2f9-456d-8268-c6af5d389fba/detach

b. http://localhost:8080/systemdata/retention-applications/d9396a58-a2f9-456d-8268-c6af5d389fba/restore?xdbLibraryId=ddaea662-b79a-4055-bd0d-5e142440fbe3

Each hold application of a managed item has links to detach and restore it. The following example illustrates how to restore a hold application:

1. Get the list of hold applications from the managed item:

a. http://localhost:8080/systemdata/managed-items/8d2edb7a-01d0-4354-bf8a-65aa0e309073/hold-applications

2. For each hold application that you want to restore, use the links on the hold application object:

a. http://localhost:8080/systemdata/hold-applications/d9396a58-a2f9-456d-8268-c6af5d389fba/detach

b. http://localhost:8080/systemdata/hold-applications/d9396a58-a2f9-456d-8268-c6af5d389fba/restore?xdbLibraryId=ddaea662-b79a-4055-bd0d-5e142440fbe3
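The detach and restore URLs for managed items, policy applications, and hold applications follow one pattern. Only the collection segment differs, and policy applications use `retention-applications`, as in the examples above. The following sketch reproduces that pattern. It is a fallback illustration only. Where possible, follow the links returned on each object rather than composing URLs, because the links reflect the actual server. The base URL and the helper names are assumptions.

```python
from urllib.parse import urlencode

# Base URL taken from the examples above; adjust for your deployment.
BASE_URL = "http://localhost:8080/systemdata"

# Collection path segments as they appear in the example URLs.
COLLECTIONS = {
    "managed-item": "managed-items",
    "policy-application": "retention-applications",
    "hold-application": "hold-applications",
}


def detach_url(kind: str, object_id: str) -> str:
    """URL to POST to in order to detach one object."""
    return f"{BASE_URL}/{COLLECTIONS[kind]}/{object_id}/detach"


def restore_url(kind: str, object_id: str, xdb_library_id: str) -> str:
    """URL to POST to in order to restore one object from a backed-up xDB library."""
    query = urlencode({"xdbLibraryId": xdb_library_id})
    return f"{BASE_URL}/{COLLECTIONS[kind]}/{object_id}/restore?{query}"
```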

To restore the managed items, identify which items need to be restored. Currently, the only way to initiate the restoration is through the REST interfaces. The system data must also be restored at the same time to ensure that the configuration data and the retained and hold sets stay in sync with the managed item database.

If you want to manage the backups separately from InfoArchive, do not configure the managed item store, and back up the managed item and system data xDB databases separately.


Chapter 5 Creating a Search

The Developer is able to create a search template for:

• A SIP archive

• A table archive

An application can contain multiple search templates.

It is the responsibility of the search designer to ensure that only the correct groups have access to certain search templates and that sensitive information is marked correctly during search composition.

InfoArchive supports open-ended date ranges in a search.

Creating a Search for a SIP Archive Application

The Developer can create a search for a SIP archive application, either from scratch or by importing an existing search (ZIP file).

1. Select the application in which you are creating a search. The application must be sufficiently configured to build the search. For example, it is not possible to create a SIP search after manually creating an application without creating its configuration objects (at least an AIC and a query configuration object).

2. Click Add Search and select one of the following:

• Create New: Proceed to the next step.

• Import from ZIP file:

— Select the ZIP file and click Open. If required:

— Any permissions specified need to be added to the search.

— Set the state of the search to ’Ready’.

— Proceed to step 7.

Tip: Exported searches are associated with an application, and the application into which you import them must match.

3. Enter the following information:


Field Description

Search Name Enter a name for the search that is unique within the application.

The following special characters are not supported: ’#’, ’<’, ’>’,’\’, ’/’, ’!’, ’=’.

Description Enter a description for the search.

Type Indicate whether you are creating a:

• Primary search, which can be accessed directly by users.

• Nested search, which is a search to be linked from other searches to retrieve additional information on a result.

Archival Collection Select the AIC.

Configuration Select the query configuration that will be searched. The query configuration contains the information to build the search from (i.e., the list of queryable fields, the partition key, the indexed fields, the field format, the query quota, etc.) and the result page (the list of result fields).

4. Click Next.

5. The Criteria page contains information returned by the query configuration object, which includes:

• Any fields returned by the query configuration object.

• The Partition Key column indicates the field designated as the partition key. Although a partition key is not mandatory, a search form that uses one runs more efficiently.

• The Index column indicates fields that are indexed.

The Criteria page may be blank if information was not set up in the query configuration object.

On the Criteria page:

a. In the Show in Form column, indicate which fields should be included in the search form.

b. In the Required? column, indicate which fields the user will have to complete prior to running the search.

Tip: You do not have to make any of the criteria required. However, if the user specifies no criteria, all of the AIUs are returned. For more information, see Query Quota.

c. Click Next.

6. The Results page contains the column names returned by the query configuration object. The Results page may be blank if information was not set up in the query configuration object.

On the Results page:

a. In the Include in Results column, indicate the fields that will be shown as columns in the search results.

b. For the selected fields, indicate whether the column should include a default sort.

c. Click Finish.


7. You are now able to further refine the search. Click Save at any time to save your changes. In the Search Form tab, you have the following options:

• Manually adding fields to the search form. Click Add Form Field. A palette is displayed that contains the various elements that can be added to the search form. Rearrange the search form by dragging and dropping an element. To remove an element from the search form, click ’X’ in the top right corner of the element being removed. To further configure an element, click . For more information, see Configuring Search Form Fields.

• Selecting fields from a schema to add to the form. Click Select from Schema and repeat step 5.

• Once you start adding fields, click Preview to see how the search form will appear.

8. In the Result List tab, you have the following options:

• Manually adding columns to the search results. Click Add Column. To further configure a column, click . For more information, see Configuring a Result Column.

• Selecting columns from a schema to add to the result list. Click Select from Schema and repeat step 6. To remove a column, click ’X’.

9. The Result Detail tab allows you to enter information that will appear in a panel when a row in the search results is selected. In the Result Detail tab, you are able to add tabs. If the results contain a lot of fields, consider organizing them in tabs. To add a tab:

a. Select whether you want the tab to appear in a Side Panel or an Inline Panel. Repeating fields can only appear in the inline panel.

b. Click Add Tab.

c. Click to:

• Enter a Detail Field Label, or

• Enter a Detail Field Name.

To remove a tab, click ’X’.

10. In the Permissions tab, indicate which group names will be able to access the search form.
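Before creating or importing several searches, it can help to check candidate names against the Search Name rule in step 3. The Python sketch below is a hypothetical pre-check, not part of InfoArchive. It assumes that the seven listed characters are the only unsupported ones and that uniqueness is compared case-sensitively within one application; the guide does not state how uniqueness is compared.

```python
# Hypothetical pre-check for search names; not an InfoArchive API.
UNSUPPORTED_CHARS = set("#<>\\/!=")  # the characters listed for Search Name


def validate_search_name(name: str, existing_names: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the name looks acceptable."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    bad = sorted(UNSUPPORTED_CHARS.intersection(name))
    if bad:
        problems.append("unsupported characters: " + " ".join(bad))
    # Assumption: names are compared case-sensitively within the application.
    if name in existing_names:
        problems.append("name already used in this application")
    return problems
```

For example, `validate_search_name("Calls/Customer", set())` reports the slash, while `validate_search_name("Calls by Customer", {"Calls by Date"})` returns an empty list.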

Updating a Search Template Status to Ready

When a search template is created, its status remains Draft until the Developer updates it. End Users cannot access a search template in Draft mode; it becomes available to them only once its status is set to Ready.

1. Select the application in which the search is stored.

2. Click and select Set to Ready.

Whenever you edit a search template, its Status returns to Draft. You can, however, update the Status of a search template while editing it:

1. The current Status of the search template is displayed beside the name of the template. If any changes have been made, the Status is listed as Draft. Once you have finished editing the template, click . Changing any data, including the query or columns, resets the status to Draft, which allows the Developer to test the changes before setting the status to Ready.

2. On the Edit Search page, update the Status field to Ready.

3. Click OK.

4. Click Save.
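The status rules in this section can be summarized as a small state model. The Python sketch below is illustrative only: the class, its method names, and the dictionary used to hold the template definition are hypothetical and are not part of InfoArchive.

```python
from dataclasses import dataclass, field


@dataclass
class SearchTemplate:
    """Hypothetical model of the Draft/Ready rules for a search template."""

    name: str
    status: str = "Draft"  # a newly created template starts in Draft
    definition: dict = field(default_factory=dict)

    def edit(self, **changes) -> None:
        # Changing any data (query, columns, form fields, ...) resets the status to Draft.
        self.definition.update(changes)
        self.status = "Draft"

    def set_ready(self) -> None:
        self.status = "Ready"

    def visible_to_end_users(self) -> bool:
        # End Users can access a template only when its status is Ready.
        return self.status == "Ready"
```

Setting a template to Ready and then calling `edit(...)` puts it back in Draft, which is why the status must be set to Ready again after every round of changes.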

Creating a Search for a Table Archive Application

The Developer has the ability to create a search for a table archive application.

1. Select the application in which you are creating a search.

2. Click Add Search and select one of the following:

• Create New: Proceed to the next step.

• Import from ZIP file: Select the ZIP file and click Open. If required, add any specified permissions to the search and set the state of the search to ’Ready’. Then proceed to step 5.

Tip: An exported search is associated with its application, so it can only be imported into a matching application.

3. Enter the following information:

Field Description

Search Name Enter a name for the search that is unique within the application.

The following special characters are not supported: ’#’, ’<’, ’>’, ’\’, ’/’, ’!’, ’=’.

Description Enter a description for the search.

Type Indicate whether you are creating a primary or nested search.

Archival Collection In this section:

a. Select a Database the search form will access when executed.

b. Select a Schema the search form will access when executed.

c. If desired, select a Table the search form will access when executed.

Tip: Specify the table if the search will be used to apply a hold to the results.


4. Click Create.

5. You are now able to further refine the search. Click Save at any time to save your changes.

6. In the Search Form tab, click Add Form Field. A palette is displayed that contains the various elements that can be added to the search form. Rearrange the search form by dragging and dropping an element. To remove an element from the search form, click ’X’ in the top right corner of the element being removed. To further configure an element, click . For more information, see Configuring Search Form Fields. Once you start adding fields, click Preview to see how the search form will appear.

7. In the Result List tab, click Add Column. To edit a column, click . For more information, see Configuring a Result Column. To remove a column, click ’X’.

8. The Result Detail tab allows you to enter information that will appear in a panel when a row in the search results is selected. In the Result Detail tab, you are able to add tabs. If the results contain a lot of fields, consider organizing them in tabs. To add a tab:

a. Select whether you want the tab to appear in a Side Panel or an Inline Panel. Repeating fields can only appear in the inline panel.

b. Click Add Tab.

c. Click to:

• Enter a Detail Field Label, or

• Enter a Detail Field Name.

To remove a tab, click ’X’.

9. Use the Query Editor to compose the search components, including the search form, results and result details. For more information, see XForms, XQueries and Query Results.

10. In the Permissions tab, indicate which group names will be able to access the search form. For more information, see Exporting Search Results.

Editing a Search

The Developer is able to edit a search. Any change to a search in the ’Ready’ state causes the search form to revert to ’Draft’ mode.

1. Select the application in which the search being edited is stored.

2. Click and select Edit.

3. Edit the search, as desired.

4. Click Save.

Deleting a Search

The Developer is able to delete a search.


Searches in use cannot be deleted. Searches are considered in use when a user has:

• Executed a background search.

• Applied a hold to the results of a background search.

To delete a search that is in use:

• If a hold was applied to the search results, ensure that the hold set is removed, and

• On the Background Results tab, click ’x’ to delete the background search.

1. Select the application in which the search being deleted is stored.

2. Click and select Delete Search.

3. When prompted to verify that you want to delete the search, click Delete.
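The in-use rule above amounts to a precondition check before deletion. The Python sketch below models only the two conditions this guide lists; the class and function names are hypothetical and do not correspond to an InfoArchive API.

```python
from dataclasses import dataclass, field


@dataclass
class SearchUsage:
    """Hypothetical record of what currently keeps a search in use."""

    background_searches: list = field(default_factory=list)  # order names
    hold_sets: list = field(default_factory=list)  # holds applied to background results


def deletion_blockers(usage: SearchUsage) -> list:
    """List the clean-up actions required before the search can be deleted."""
    blockers = [f"remove hold set '{h}'" for h in usage.hold_sets]
    blockers += [f"delete background search '{o}'" for o in usage.background_searches]
    return blockers


def can_delete(usage: SearchUsage) -> bool:
    # A search can be deleted only when nothing keeps it in use.
    return not deletion_blockers(usage)
```

Holds are listed first because, in this sketch, a hold applied to background results is removed before the background search itself is deleted.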

Configuring Search Form Fields

The Developer is able to further refine a search form field, as well as the error messages that are issued when the user enters incorrect information.

1. In the Search Form tab, click to configure a field and the error messages that are issued when the user enters incorrect information.

In the Result List tab, click to configure a result column.

2. In the Properties tab, enter the following information, depending on the element being configured:

Field Description

UI Control A read-only field that is set to ’Input’ for the search form criteria.

Data Binding For table searches, enter the name of the variable in the xDB query.

For SIP searches, enter the name of a criterion.

The name must match the criterion name exactly for the criterion to be used in the SIP search.

For more information, see Example of Data Binding.

Field Label Enter the name for the field shown on the form.

Required Indicate whether an input value is required.

Tooltip Text Enter concise, helpful information about the field that appears in a small “hover box” when the user hovers the cursor over the control.

Regex Pattern Enter the regular expression used for validation on the field.

The Messages tab allows you to enter an error message in the event that a user entry does not match the regular expression.


Minimum Characters/Maximum Characters

Enter the minimum and maximum number of characters the user can enter on the search form.

Search Check the box to allow a wildcard search using an asterisk (*) in this field.

Tip: If a wildcard search is allowed and a regex pattern is specified, ensure that the regular expression allows for ’*’ in the field.
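The interaction between Regex Pattern, Minimum/Maximum Characters, and wildcard search can be checked before deploying a form. The Python sketch below is hypothetical: the function name and messages are illustrative, and it assumes the pattern must match the whole entry (`re.fullmatch`). The guide does not say whether InfoArchive anchors the pattern.

```python
import re
from typing import Optional


def validate_field(value: str, pattern: Optional[str] = None,
                   min_chars: int = 0, max_chars: Optional[int] = None,
                   allow_wildcard: bool = False) -> list[str]:
    """Return validation errors for one search form entry (hypothetical sketch)."""
    errors = []
    if len(value) < min_chars:
        errors.append(f"enter at least {min_chars} characters")
    if max_chars is not None and len(value) > max_chars:
        errors.append(f"enter at most {max_chars} characters")
    if "*" in value and not allow_wildcard:
        errors.append("wildcards are not allowed in this field")
    # Assumption: the whole entry must match the pattern.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        errors.append("value does not match the expected format")
    return errors
```

With a six-digit pattern such as `[0-9]{6}`, the entry `123*` is rejected even when wildcards are allowed. Widening the pattern to `[0-9*]{1,6}` is one way to let ’*’ through, as the Tip above requires.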

Example of Data Binding

The following example illustrates how the variable is defined in the query:

declare variable $lastName external := "";

The following example illustrates the AIC definition under resources for the PhoneCalls application:

<criterias>
  <name>CallStartDate</name>
  <label>Call Start Date</label>
  <type>DATETIME</type>
  <pKeyMinAttr>pkeys.dateTime01</pKeyMinAttr>
  <pKeyMaxAttr>pkeys.dateTime02</pKeyMaxAttr>
</criterias>
<criterias>
  <name>CallEndDate</name>
  <label>Call End Date</label>
  <type>DATETIME</type>
</criterias>
<criterias>
  <name>CustomerID</name>
  <label>Customer ID</label>
  <type>STRING</type>
  <pKeyValuesAttr>pkeys.values01</pKeyValuesAttr>
</criterias>
<criterias>
  <name>CustomerLastName</name>
  <label>Customer Last Name</label>
  <type>HASHED</type>
</criterias>
<criterias>
  <name>CustomerFirstName</name>
  <label>Customer First Name</label>
  <type>STRING</type>
</criterias>
<criterias>
  <name>RepresentativeID</name>
  <label>Representative ID</label>
  <type>HASHED</type>
</criterias>
<criterias>
  <name>CallFromPhoneNumber</name>
  <label>From Phone Number</label>
  <type>STRING</type>
</criterias>
<criterias>
  <name>CallToPhoneNumber</name>
  <label>To Phone Number</label>
  <type>STRING</type>
</criterias>
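For SIP searches, each Data Binding value must match a criterion name. One quick way to catch mismatches is to compare the bindings you plan to use against the AIC criteria. The Python sketch below uses only the standard library. The helper names are hypothetical, the XML is a trimmed excerpt of the PhoneCalls criteria wrapped in a root element so it can be parsed, and the comparison is assumed to be case-sensitive, as the Column Name description states for names in the criterias section.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of the PhoneCalls AIC criteria. The <criterias> elements
# are siblings, so they are wrapped in a root element for parsing.
AIC_CRITERIA = """
<root>
  <criterias><name>CallStartDate</name><label>Call Start Date</label><type>DATETIME</type></criterias>
  <criterias><name>CustomerID</name><label>Customer ID</label><type>STRING</type></criterias>
  <criterias><name>CustomerLastName</name><label>Customer Last Name</label><type>HASHED</type></criterias>
</root>
"""


def criteria_types(xml_text: str) -> dict[str, str]:
    """Map each criterion name to its declared type."""
    root = ET.fromstring(xml_text.strip())
    return {c.findtext("name"): c.findtext("type") for c in root.iter("criterias")}


def unbound_fields(data_bindings: list[str], criteria: dict[str, str]) -> list[str]:
    """Return Data Binding values that match no criterion name (case-sensitive)."""
    return [b for b in data_bindings if b not in criteria]
```

Here `unbound_fields(["CustomerID", "customerLastName"], criteria_types(AIC_CRITERIA))` flags `customerLastName` because its case differs from the AIC definition.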

3. In the Messages tab, enter the messages you want to issue. The messages will depend on the element being configured.

4. Click OK.

Configuring a Result Column

The Developer is able to further refine a result column from the Result List tab.

1. Click to configure a column.

2. Enter the following information:

Field Description

Column Label Enter the label of the column, which will appear as the header in the result details.

Column Type If Schema Column Name is selected, complete the rest of the fields.

If Linked Column (Nested Search) is selected:

a. Select a Column Name.

b. In the Nested Search Mapping section, click Add.

c. Select a Result Column Binding Name and a Search Field Binding Name.

d. Click OK.

If Linked Column (External URL) is selected:

a. Enter the URL.

b. In the External URL Parameters Mapping section, click Add.

c. Enter an External URL Parameter Name.

d. Select an External URL Parameter Name.

e. Click OK.

If Downloadable Content is selected:

a. Select a Column Name.

Include in Export If selected, allows users of the search to export the search results of this column.


Column Name For a table search, the column name is the column name defined in the row. For a SIP search, the column name is the name of the field. This is the name (case-sensitive) that is specified in the criterias section in XML and is also defined in the query XML file:

<operands> <name>CallStartDate</name> ...

Sort Indicate whether:

• Sort will be disabled for the column.

• Sort will be enabled for the column.

• The column is to be displayed as a default sort.

Data Type Select the data type for the column.

Hide Column Indicate if you want the result column to be hidden from the user.

Creating a Duplicate Search

The Developer is able to create a duplicate of an existing search and refine it to create an entirely new search.

1. Click and select Create Duplicate.

2. Enter the following information:

a. A unique Search Name for the new search form.

The following special characters are not supported: ’#’, ’<’, ’>’, ’\’, ’/’, ’!’, ’=’.

b. A brief Description of the new search form.

c. Click OK.

The new search form appears in the list of search forms on the Record Search tab.

3. To further refine the new search, click . For more information, see Editing a Search.

Table-Based Search – XForms, XQueries and Query Results

The following section provides XQuery examples and shows how they relate to XForms (i.e., how an XForm can be derived from an XQuery, or how form output can become query input). The section also examines query result configuration for an XQuery.


Example: First Name and Last Name with External Variables

The following XQuery represents a search on a person’s first and last names, returning each result in a <row> element. Specify the schema and the table exactly as they are named; case matters:

declare namespace table = "urn:x-emc:ia:schema:table";
declare variable $firstName as xs:string external;
declare variable $lastName as xs:string external;

for $elem in /BASEBALL/BASEBALL_MASTER/ROW
(: add a where clause here that compares the row's name columns with $firstName and $lastName :)
return
  <row id="{string($elem/@table:id)}">
    <column name="lastName">{ $lastName }</column>
  </row>

Incomplete Forms

Even if an end user executes an incomplete search form, search results will be returned.

For instance, if the end user is searching for a person’s name but only enters the last name, the correct XQuery is executed:

Incorrect XQuery:

for $row in //row where $row/firstName = "" and $row/lastName = "Smith" return $row

Correct XQuery:

for $row in //row where $row/lastName = "Smith" return $row

XQuery extension functions are used to construct and execute a query string that returns the search results. The extension function that executes an XQuery string (xhive:evaluate) comes with xDB. Other extension functions are incorporated into InfoArchive’s search composition functionality: ia:condition constructs a condition clause for a form field if the field has a value, and returns an empty string otherwise; ia:where concatenates the non-empty condition clauses with a connector such as "and".

The benefit of this construct is that it gives the developer complete freedom in XQuery construction. The InfoArchive server only has to pass the form output to the XQuery to provide the "binding".
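To make the composition concrete, here is a Python analogue of the two extension functions. It is an illustration only: the real ia:condition and ia:where are XQuery functions whose exact signatures this guide does not give. The Python names, the `$row/<field>` path convention, and the literal escaping (XQuery doubles a `"` inside a string literal) are assumptions. In InfoArchive, the composed string would be executed with xhive:evaluate.

```python
def condition(path: str, value: str) -> str:
    """Analogue of ia:condition: a clause if the field has a value, else ''."""
    if not value:
        return ""
    # XQuery escapes a double quote inside a "..." literal by doubling it.
    literal = '"' + value.replace('"', '""') + '"'
    return f"$row/{path} = {literal}"


def where(clauses: list[str], connector: str = "and") -> str:
    """Analogue of ia:where: join the non-empty clauses with the connector."""
    return f" {connector} ".join(c for c in clauses if c)


def build_query(form: dict[str, str]) -> str:
    """Compose the query from form output, skipping empty fields."""
    predicate = where([condition(name, value) for name, value in form.items()])
    query = "for $row in //row"
    if predicate:
        query += f" where {predicate}"
    return query + " return $row"
```

With only the last name filled in, `build_query({"firstName": "", "lastName": "Smith"})` produces `for $row in //row where $row/lastName = "Smith" return $row`, which is the correct XQuery from the Incomplete Forms comparison.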

Nested Searches

While the Developer can easily create a nested search for a SIP archive, creating a nested search for a table archive requires additional configuration.


The developer must:

• Add a column with a header to indicate the associated nested search, and

• Add an icon in each cell under the header to trigger a nested search with the appropriate parameter settings.

For instance, suppose that for each returned city, a user can search for tourist destinations in that city. The resulting table would appear like the following:

Street City Tourist Destinations

West 44th Street New York

Eighth Avenue New York

Clicking the lightbulb triggers an additional search for tourist destinations. To make this work, the developer needs to know:

• The number of columns and the column headers.

• Which columns represent nested searches.

• For cells that contain a lightbulb, which REST request should be sent to the server for the nested search, including any search parameter values (in this case ’New York’).

To assist the developer, InfoArchive can either:

• Put the information in the XQuery result, or

• Put the information in the search configuration.

If the information is in the XQuery, it would be rendered like the following:

(<columns>
  <column name="street" label="Street"/>
  <column name="number" label="Number"/>
  <column name="city" label="City"/>
  <column label="City Stuff" linkRel="city-stuff" parameters="city"/>
</columns>,
for $row in //row
where
  @begin-and
  @input: firstName
  $row/firstName = $firstName
  @input: lastName
  $row/lastName = $lastName
  @end-and
return
  <row>
    <column>{$row/street}</column>
    <column>{$row/number}</column>
    <column>{$row/city}</column>
    <column>{$row/city}</column>
  </row>)

The above query first returns a <columns> element that contains nested <column> elements, followed by a number of <row> elements. InfoArchive lets the developer know that, in the Tourist Destinations column, the "linkRel" attribute is a linkRel to follow on the search configuration response, which contains a URL to a nested search (e.g., .../nested-searches/{search-id}).

The input of the nested search is bound to the query parameters of the REST call.
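The binding of a row's values to the nested search request can be sketched as follows. This Python example is hypothetical: it assumes the `parameters` attribute holds space-separated column names and that the nested-search URL for each linkRel has already been read from the search configuration response. The guide elides the actual URL, so the usage below uses a placeholder address.

```python
from urllib.parse import urlencode

# <columns> header from the example: the last column is a link column.
COLUMNS = [
    {"name": "street", "label": "Street"},
    {"name": "number", "label": "Number"},
    {"name": "city", "label": "City"},
    {"label": "City Stuff", "linkRel": "city-stuff", "parameters": "city"},
]


def nested_search_links(row: dict[str, str], link_hrefs: dict[str, str]) -> dict[str, str]:
    """Build the nested-search request URL for each link column of one row.

    link_hrefs maps a linkRel to the nested-search URL taken from the
    search configuration response (a hypothetical, pre-resolved input).
    """
    links = {}
    for col in COLUMNS:
        rel = col.get("linkRel")
        if rel is None:
            continue
        # Assumption: "parameters" is a space-separated list of column names.
        names = col.get("parameters", "").split()
        query = urlencode({n: row[n] for n in names})
        links[col["label"]] = f"{link_hrefs[rel]}?{query}"
    return links
```

For the first row of the table, the City Stuff link would carry `city=New+York` as its query string, so the nested search receives ’New York’ as its input.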

You do not need a dedicated column to invoke the nested search. You can bind the nested search to an existing column, where the value in the column will be hyperlinked, which avoids sacrificing a column for the nested search. Also, any column can invoke a nested search (i.e., values shown in the Side and Inline panels can invoke a nested search as well).



Authorization

The Developer can limit the information that an end user sees by:

• Hiding columns from the result set.

• Masking a field.

• Hiding rows, based either on a single row (i.e., the developer decides that a particular row in a table is accessible only by users in certain roles) or on some criterion (i.e., only users in certain roles are allowed to see rows matching a certain predicate).
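InfoArchive applies these restrictions through the search configuration and its XQueries, and the guide does not describe the mechanism in code. The Python sketch below only illustrates the three kinds of restriction (hidden columns, masked fields, and hidden rows by id or by predicate) on in-memory rows. All names, the `****` mask, and the role model are assumptions.

```python
from typing import Callable, Iterable, Optional

Row = dict  # one result row: column name -> value
RowRule = tuple[Callable[[Row], bool], set]  # (predicate, roles allowed to see matches)


def apply_authorization(rows: Iterable[Row], user_roles: set, *,
                        hidden_columns: set = frozenset(),
                        masked_fields: set = frozenset(),
                        row_id_roles: Optional[dict] = None,
                        row_rules: Iterable[RowRule] = ()) -> list[Row]:
    """Return the rows and columns a user may see (hypothetical sketch)."""
    row_id_roles = row_id_roles or {}
    row_rules = list(row_rules)
    visible = []
    for row in rows:
        # Single-row rule: a row restricted by id needs one of its allowed roles.
        allowed = row_id_roles.get(row.get("id"))
        if allowed is not None and not allowed & user_roles:
            continue
        # Criterion rule: rows matching a predicate need one of the rule's roles.
        if any(pred(row) and not roles & user_roles for pred, roles in row_rules):
            continue
        # Drop hidden columns and mask sensitive fields in the remaining row.
        visible.append({
            key: ("****" if key in masked_fields else value)
            for key, value in row.items()
            if key not in hidden_columns
        })
    return visible
```

Predicates are evaluated against the full row before columns are hidden, so a rule can depend on a column (such as a VIP flag) that the user never sees.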

Multiple XQueries/XForms per Search

The developer must associate multiple XQueries and XForms with a search.

The following XQuery allows a specific role to see only non-VIPs, and only their cities, not their address details:

declare variable $firstName as xs:string external;
declare variable $lastName as xs:string external;
for $row in //row
where $row/firstName = $firstName
  and $row/lastName = $lastName
  and $row/vip = false()
return <row><column>{$row/city}</column></row>

Using ANT Tasks to View, Create, Delete or Update XQuery Modules

There is no way to view, create, delete or update XQuery modules in the InfoArchive web application. For this reason, a number of ANT tasks were introduced.

The following ANT tasks can be run from the tools/applications/<application> directory.

ANT Target Description Option

view-xquery-modules Views all XQuery modules on the tenant and application levels.

create-xquery-modules Creates all XQuery modules located under the <application>/xquery-modules directory.

create-xquery-module Creates an XQuery module from a file.

option = location – Required. The location of the file.

update-xquery-module Updates an XQuery module from a file.

option = location – Required. The location of the file.

delete-xquery-module Deletes an XQuery module.

option = name – Required. The name of the XQuery module.

option = context – Optional. Context can be tenant or application.

export-xquery-module Exports the XQuery module object, including all properties, to an XML file.

option = name – Required. The name of the XQuery module.

option = context – Optional. Context can be tenant or application.

option = location – Required. The location of the output file.

As XQuery modules can be stored on both the tenant and application levels, an XQuery module with the same name can exist on both levels. For this reason, the ANT targets ’export-xquery-module’ and ’delete-xquery-module’ use the following strategy when no context option is set:

• If an XQuery module is found on the application level, the operation is applied to this module. If no XQuery module is found on the application level, the operation is applied to the XQuery module on the tenant level, if found.
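The lookup order described above can be expressed in a few lines. The Python sketch below is hypothetical and does not call the ANT targets. It models the modules at each level as dictionaries. The behavior with an explicit context (no fallback to the other level) is an assumption, since the guide describes only the case where no context option is set.

```python
from typing import Optional


def resolve_module(name: str, app_modules: dict[str, str], tenant_modules: dict[str, str],
                   context: Optional[str] = None) -> Optional[tuple[str, str]]:
    """Return (level, module) that export/delete would act on, or None if not found."""
    # Assumption: an explicit context restricts the lookup to that level only.
    if context == "application":
        return ("application", app_modules[name]) if name in app_modules else None
    if context == "tenant":
        return ("tenant", tenant_modules[name]) if name in tenant_modules else None
    # No context option: prefer the application level, then fall back to the tenant level.
    if name in app_modules:
        return ("application", app_modules[name])
    if name in tenant_modules:
        return ("tenant", tenant_modules[name])
    return None
```

When a module named, for example, `utils` exists on both levels, running delete-xquery-module without a context removes the application-level copy and leaves the tenant-level one in place.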

Background Searches

InfoArchive’s background search functionality allows a user to execute an asynchronous search. A search must be executed asynchronously when:

• The search scope is too wide and the result quota for SIP searches will be exceeded.

The quota limits the number of results. Reaching the search quota does not necessarily force the search to be run as a background search; a time-out (30 seconds) is the only criterion that requires a background search.

Quotas are unavailable for table searches.

• After an initial search times out. If a synchronous search controller is taking too long, the user is notified that it will take time to locate the results of the search. The user can then either cancel the search or run it in the background. When running the search in the background, the user can change the default search name (also referred to as the order name) and must then submit the search order. It is best to use a unique name for the order to help locate the results. While the search is running, the user is able to complete other InfoArchive tasks. Once the results of the search are available, the Background Results tab appears in the main header, where the user is able to view the background search results or export them to a .CSV file.

A user can also choose to run a search in the background when the search form is first selected. Instead of entering the required search parameters and clicking Search, the user clicks Run in background.
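The InfoArchive server handles this switch internally. The Python sketch below only illustrates the general pattern (try synchronously, and on time-out let the work continue in the background) using `concurrent.futures`. The function name is hypothetical, and where InfoArchive asks the user whether to cancel or continue, the sketch always continues.

```python
import concurrent.futures as cf

SYNC_TIMEOUT_SECONDS = 30  # time-out value stated in this guide


def run_search(search, executor: cf.Executor, timeout: float = SYNC_TIMEOUT_SECONDS):
    """Try a search synchronously; if it times out, let it continue in the background.

    Returns ("results", value) when the search finishes in time, otherwise
    ("background", future) so the caller can track it like a search order.
    """
    future = executor.submit(search)
    try:
        return "results", future.result(timeout=timeout)
    except cf.TimeoutError:
        # The search is not cancelled; it keeps running in the executor.
        return "background", future
```

The caller keeps the returned future much as InfoArchive keeps an order, and collects the result later instead of blocking the user interface.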

Order

A configuration object that groups a set of properties to apply to order item objects. For SIP asynchronous searches, the order is retrieved based on the search used to create the order item.

Field Description

id Type: UUID

Label: ID

application Type: Application

Label: Application

Application of this order.

name Type: String

Label: Name

Name of this order.

priority Type: Integer

Label: Priority

Order Items created from this order configuration will have this priority.

retentionPeriod Type: Integer

Label: Retention Period

Order Items created from this order configuration will have this Retention Period if no Retention Policy is set.


retentionPolicy Type: String

Label: Retention policy

Order Items created from this order configuration will have this Retention Policy.

orderItemPermission Type: Permission

Label: Permission to apply on OrderItem

Order Items created from this Order configuration will have this Permission.

The Clean job removes order items from the system. For more information, see Clean Job.
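The interaction between retentionPeriod and retentionPolicy in the table above can be summarized briefly. The Python dataclass below is a hypothetical mirror of the Order fields, not an InfoArchive API. It assumes that any non-empty retentionPolicy takes precedence, and it leaves the unit of retentionPeriod unspecified because the table does not state it.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    """Hypothetical mirror of the Order configuration fields."""

    id: uuid.UUID
    application: str
    name: str
    priority: int
    retention_period: Optional[int] = None  # unit not stated in this guide
    retention_policy: Optional[str] = None
    order_item_permission: Optional[str] = None

    def retention_for_items(self) -> tuple:
        """Retention applied to new order items: the policy wins; the period is the fallback."""
        if self.retention_policy:
            return ("policy", self.retention_policy)
        return ("period", self.retention_period)
```

An order with only a retention period passes that period to its order items; once a policy is set, the period is ignored.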

Using the Background Results Tab

Once a background search completes its run, the results can be accessed on the Background Results tab.

Search results are displayed in a table that contains the following information:

Column Description

Submission Date Indicates the date and time the background search was executed.

Type Possible values include:

• Search: Displayed when the results are from a background search.

• Export: Displayed when a user exported search results into a CSV file.

• Apply Hold: Displayed when a user applied a hold policy to an item or search results.

• Remove Hold: Displayed when a user has removed a hold policy from an item or search results.

Application Indicates the application.

Duration Indicates the time the operation took to finish. It is not updated until the status is Completed.


Name Indicates the name of the task. For a Remove Hold task, the name of the hold set is followed by either removeholditems or removehold.

Status Status has one of the following values:

• Submitted: The request has been submitted. Because the framework picks up tasks every 10 seconds, this status is rarely seen.

• In Progress: The request is still running.

• Completed: The task completed with no errors.

• Exception: The task could not be completed. To determine what happened, refer to the ia.log file.
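The Status values above suggest a simple lifecycle. The Python sketch below encodes it as an enum with allowed transitions. The transition table is an assumption inferred from the status descriptions (the guide does not list transitions explicitly), and `duration_is_final` reflects the note that Duration is not updated until the status is Completed.

```python
from enum import Enum


class TaskStatus(Enum):
    SUBMITTED = "Submitted"
    IN_PROGRESS = "In Progress"
    COMPLETED = "Completed"
    EXCEPTION = "Exception"


# Assumed lifecycle: the framework picks up a submitted task, which then
# ends either normally (Completed) or with an error (Exception).
TRANSITIONS = {
    TaskStatus.SUBMITTED: {TaskStatus.IN_PROGRESS},
    TaskStatus.IN_PROGRESS: {TaskStatus.COMPLETED, TaskStatus.EXCEPTION},
    TaskStatus.COMPLETED: set(),
    TaskStatus.EXCEPTION: set(),
}


def advance(current: TaskStatus, new: TaskStatus) -> TaskStatus:
    """Move a task to a new status, rejecting transitions outside the assumed lifecycle."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {new.value}")
    return new


def duration_is_final(status: TaskStatus) -> bool:
    # The Duration column is updated only once the task has completed.
    return status is TaskStatus.COMPLETED
```

Because the enum values match the labels shown in the Status column, `TaskStatus("In Progress")` maps a displayed label back to its member.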

Deleting a Background Task

1. For the background search results being deleted, click .

2. When prompted to verify that you want to delete the background search results, click Delete.

Tip: Background tasks can be cleaned up after a configurable number of days.

Exporting Search Results

You can select all or only some of the search results to export. The number of items returned by the search is displayed on the Select All button. Once one or more search result items are selected, the Export button is displayed. Regardless of how many items are selected, the exported files are accessed from the Background Results tab.

To select all of the search result items for export:

1. Click the Select All button.

2. Click Export and select the export option.

3. Enter a Name for the exported file or use the default name provided and click OK.

To select some of the search result items for export:

1. Click the box or boxes beside the items being exported. The number of selected items is displayed beside the Export button.


2. Click Export and select the export option.

3. Enter a Name for the exported file or use the default name provided and click OK.

Configuring Export Functionality in the InfoArchive Web Application

The configurable export feature lets you configure how search results are exported for one or more records. The export functionality can be selectively exposed on the main result grid or, at the tab level, on individual tabs in the side or inline panels. The Export action can be enabled or disabled per tab. Export configurations can be used to let users download search results in different formats, with or without associated content (attachments/blobs).

A number of common formats are available out of the box; they are installed as described in the installation step.

During search composition in the InfoArchive web application, the Developer can add the export action to the main result list or to each tab in the result detail section by using the ADD ACTION button. For more information, see Adding the Ability to Export Search Results. Using the gear button, the Developer can further configure each option and enable or disable the export configuration at any given level. The Developer can include additional configuration objects with the search template: besides the common export configurations, each application can have its own special configuration objects from which the Developer can choose. Exporting search results remains an asynchronous activity; the option to choose between asynchronous and synchronous export is shown in the UI but is disabled.

At run time, users can export search results according to the options selected during composition. When exporting at the tab level in the detail section, only one row of data (the selected row) is available for the export operation. From the main search result screen, the user must select an export option from the Export menu, which contains the options selected by the search designer.

The download behavior also depends on the browser settings. For instance, some browsers present a Save As dialog, while others start the download immediately when the download button on the background items listing page is selected. The gzip option downloads a file with a .gz extension. If content is included, or if the archive contains multiple files for other reasons, the .gz file contains a tar file and the full file extension is .tar.gz. It is also possible to download .tar or .zip files instead, assuming the corresponding export pipelines are enabled.
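The difference between the .gz and .tar.gz downloads can be illustrated with Python's standard library. This sketch only shows the file formats involved. It is not how InfoArchive builds its export files, and the function name and file-naming scheme are assumptions.

```python
import gzip
import io
import tarfile
from typing import Optional


def build_export(base_name: str, csv_text: str,
                 attachments: Optional[dict] = None) -> tuple:
    """Return (file name, bytes) for a gzip-enveloped CSV export.

    Without attachments, the CSV is gzipped directly (.gz). With attachments,
    the CSV and the content files are first bundled into a tar archive,
    which is then gzipped (.tar.gz).
    """
    csv_bytes = csv_text.encode("utf-8")
    if not attachments:
        return f"{base_name}.gz", gzip.compress(csv_bytes)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in {f"{base_name}.csv": csv_bytes, **attachments}.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return f"{base_name}.tar.gz", buffer.getvalue()
```

A plain gzip stream can hold only one file, which is why an export with content needs the tar layer inside the .gz envelope.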

Composition (Edit) Mode

In the main grid, when the user clicks the RESULT LIST tab, the available actions toolbar appears and a default configuration (i.e., csv, gzip format with no content) is selected. The Developer may choose to change this setting, and can disable the export feature for the main result listing page or for individual tabs. By default, export is off for tabs in the detail section.


Click next to EXPORT to launch the Export configuration selection dialog. The Developer can disable the export functionality for the main grid and/or individual tabs using the switch at the top of the dialog.

[Figure: Export configuration dialog when EXPORT is disabled]


Installing the Export Functionality

To install tenant-level export configurations, the Administrator needs to run ANT from the \infoarchive\tools\tenants\infoarchive directory.

Backward Compatibility

All searches created using version 4.0, and all sample searches, will run. The EXPORT option is available at runtime from the main result grid (when some or all rows are selected).

Adding the Ability to Export Search Results

The Developer can allow search results to be exported in a specific format. This action can be performed when a search is being created or edited.

1. Click the button and select Export. The Add Action is available in the Result Listing tab for the main result grid, as well as in the Side Panel and Inline Panel of the Result Detail tab.

2. Click the button to further configure the export process.

3. Select the pipeline to be permitted for the export process. A pipeline represents a series of processing steps for the selected search result. The following options are available:

Pipeline Description

gzip_csv gzip envelope for csv export

gzip_csv_with_content gzip envelope for csv export with content

zip_csv_with_content zip envelope for csv export with content

tar_csv_with_content tar envelope for csv export with content

4. Click OK.

5. Click Save.

Configuring Export Objects

Configuring Application-Specific Export Objects using ANT Scripts

Use ANT scripts located in the Tools directory to configure objects such as ExportPipeline (EP), ExportConfiguration (EC) and ExportTransformation (ET).

An application can have a folder called exports (for example, tools/applications/PhoneCalls/exports). For other applications, store a build-exports.xml file in such a folder; it is used to automatically configure any application-specific objects.

Configuring EP, EC and ET Objects using IAShell

Remember to use a pre-configuration file when starting IAShell. For instance, if the following commands are stored in the pre-configuration file, they are executed automatically:

connect http://localhost:8080/services --gateway http://localhost:8080 --user [email protected] --psw password
use --tenant INFOARCHIVE --federation mainFederation
use --application PhoneCalls

You can also execute the commands in IAShell directly.

The following example illustrates how to configure an EP object and an EC object in XML format:

<?xml version="1.0"?>
<configuration>
  <object typeAlias="app-export-pipeline" checkExistSpEL="?[name=='my-search-results-csv-gzip']" var-set="csvGzipPipeline">
    <create>
      <name>my-search-results-csv-gzip</name>
    </create>
    <update>
      <description>gzip envelope for csv export</description>
      <inputFormat>ROW_COLUMN</inputFormat>
      <outputFormat>csv</outputFormat>
      <envelopeFormat>gz</envelopeFormat>
      <includesContent>false</includesContent>
      <content>Place your pipeline content here</content>
    </update>
  </object>
  <object typeAlias="app-export-configuration" checkExistSpEL="?[name=='my-gzip-csv']" var-set="csvGzipConfig">
    <create>
      <name>my-gzip-csv</name>
    </create>
    <update>
      <pipeline>#{csvGzipPipeline}</pipeline>
      <description>gzip envelope for csv export</description>
      <exportType>ASYNCHRONOUS</exportType>
    </update>
  </object>
</configuration>

If you use the var-set attribute, you can refer to the created objects in SpEL expressions, such as #{csvGzipPipeline} in the EC definition above.

Then use the following command to configure your objects from the file:

configure --from "<path_to_your_file>"

The following example illustrates how to also configure an ET object:

<?xml version="1.0"?>
<configuration>
  <object typeAlias="app-export-pipeline" checkExistSpEL="?[name=='my-search-results-xsl-gzip']" var-set="xslGzipPipeline">
    <create>
      <name>my-search-results-xsl-gzip</name>
    </create>
    <update>
      <description>gzip envelope for xsl export</description>
      <inputFormat>ROW_COLUMN</inputFormat>
      <outputFormat>html</outputFormat>
      <envelopeFormat>gz</envelopeFormat>
      <includesContent>true</includesContent>
      <content>Place your pipeline content here</content>
    </update>
  </object>
  <object typeAlias="app-export-transformation" checkExistSpEL="?[name=='my-xslt-transformation']" var-set="htmlXsl">
    <create>
      <name>my-xslt-transformation</name>
    </create>
    <update>
      <description>xslt transformation</description>
      <type>XSLT</type>
      <mainPath>my-search-results.xsl</mainPath>
    </update>
  </object>
  <object typeAlias="app-export-configuration" checkExistSpEL="?[name=='my-gzip-html']" var-set="htmlGzip">
    <create>
      <name>my-gzip-html</name>
    </create>
    <update>
      <pipeline>#{xslGzipPipeline}</pipeline>
      <description>gzip envelope for html export</description>
      <exportType>ASYNCHRONOUS</exportType>
      <transformations>
        <portName>stylesheet</portName>
        <transformation>#{htmlXsl}</transformation>
      </transformations>
      <transformations/>
    </update>
  </object>
</configuration>

The ET (Export Transformation) object allows a reference to a stylesheet to be bound to an EP (Export Pipeline) via an EC (Export Configuration). However, the stylesheet itself is a separate resource, which may depend on other resources, such as logos, CSS and/or other secondary files. For this reason, the stylesheet and all of its secondary resources, if any, must be uploaded to the ET object as a single zip file. The ET object identifies the actual stylesheet in this zip through its mainPath. The file identified there is passed to the pipeline for the stylesheet transformation. The stylesheet has access to all other resources in the zip, using exactly the same directory layout. However, any resources that are also needed by the transformed result must be stored in the sub-directory named 'resources'. When search results are exported using a stylesheet transformation, only this 'resources' directory is included in the exported result (using exactly the same directory layout). The stylesheet itself and any other resources outside this directory are excluded from the export result.
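The following Python sketch illustrates the zip layout rule described above. It packages a stylesheet with its secondary files and lists which entries would travel with the transformed result. The function names and file names are hypothetical, and the sketch assumes that 'resources' is a top-level directory in the zip; it models the documented rule, not InfoArchive's actual implementation.

```python
import io
import zipfile

RESOURCES_PREFIX = "resources/"  # assumption: a top-level directory in the zip


def build_transformation_zip(files: dict[str, bytes]) -> bytes:
    """Pack a stylesheet and its secondary files into one zip, keeping their paths."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, data in files.items():
            zf.writestr(path, data)
    return buffer.getvalue()


def exported_resources(zip_bytes: bytes, main_path: str) -> list[str]:
    """List the zip entries that would be copied next to the transformed result.

    Only files under resources/ are exported, with the same directory layout;
    the stylesheet (main_path) and all other files are used only during the
    transformation.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        names = zf.namelist()
    if main_path not in names:
        raise ValueError(f"mainPath {main_path!r} is not in the zip")
    return sorted(
        name for name in names
        if name.startswith(RESOURCES_PREFIX) and not name.endswith("/")
    )


if __name__ == "__main__":
    bundle = build_transformation_zip({
        "my-search-results.xsl": b"<xsl:stylesheet/>",
        "lib/common.xsl": b"<xsl:stylesheet/>",
        "resources/logo.png": b"png-bytes",
        "resources/css/report.css": b"body {}",
    })
    print(exported_resources(bundle, "my-search-results.xsl"))
```

With this layout, lib/common.xsl remains available to the stylesheet during the transformation, but only resources/css/report.css and resources/logo.png would accompany the exported HTML.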

The following illustrates the commands needed to upload such a zip file:

select --t app-export-transformation where="?[name=='my-xslt-transformation']"
view --relations
file-upload --from "path_to_a_zip_file" --rel zip

The following example illustrates how to delete any EC/EP/ET objects that you created:

select --t app-export-transformation
delete --id 1

To update such objects, use the update command as well as the update section in a configuration file.

At runtime, when Export is enabled, the user runs a search, selects one, many or all rows in the search result, and then applies the desired export configuration. The user can have one or many configurations to choose from; the Developer decides on and configures these during search composition.

The following Export configurations menu options are available from the main result panel:

When enabled, the following Export configurations menu options are available from the tabs of the side panel:

When enabled, the following Export configurations menu options are available from the inline panel:

Configuration Export Through ANT

The main source of export "configurability" in InfoArchive is the configuration of transformations (that is, uploading ExportTransformations with XSLT and resources, and using these in ExportConfigurations, which are then bound to SearchConfigurations).

The following three export-related system object types can be either tenant- or application-scoped:

• ExportPipeline

• ExportTransformation

• ExportConfiguration

Add an object at the tenant scope if it can be used in multiple applications. Otherwise, the application scope is more suitable.

In our samples, the tools/build-exports.xml script builds four generic ExportPipelines (and ExportConfigurations) and uploads them to InfoArchive at the tenant level. The XProc pipelines they contain can be found in the tools/tenants/infoarchive/exports directory. The ExportConfigurations do not refer to any ExportTransformations (none of the tenant-scoped ExportPipelines transforms SearchResults with an XSLT stylesheet).

For an example of application-scoped export-related objects, see the PhoneCalls application. In the tools/applications/PhoneCalls/exports directory, the build-exports.xml ANT script additionally builds two application-scoped ExportPipelines, two ExportTransformations, and two ExportConfigurations (each binding one ExportPipeline and one ExportTransformation together) for the PhoneCalls application, and uploads them to InfoArchive.

Later, when accessing the PhoneCalls application through the InfoArchive web application, the Developer can bind SearchCompositions to these (or the tenant-scoped) ExportConfigurations.

After doing so, the execution model is as follows. After the user runs a search and selects rows:

• Under the export menu, the user sees the ExportConfigurations that the Developer bound to the search.

• When the user selects one, the server runs the ExportConfiguration's ExportPipeline, together with its ExportTransformation if one is present.

• The server then puts the result in a location from which it can be downloaded.
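To make the relationships between the export objects concrete, the following Python sketch models how an ExportConfiguration binds an ExportPipeline and an optional ExportTransformation, and how a search offers only the configurations bound to it. The classes, the pipeline signature and the download location are hypothetical illustrations, not InfoArchive's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical pipeline signature: (selected rows, stylesheet path or None) -> output bytes
PipelineFn = Callable[[list[dict], Optional[str]], bytes]


@dataclass(frozen=True)
class ExportPipeline:
    name: str
    run: PipelineFn


@dataclass(frozen=True)
class ExportTransformation:
    name: str
    main_path: str  # stylesheet inside the uploaded zip


@dataclass(frozen=True)
class ExportConfiguration:
    name: str
    pipeline: ExportPipeline
    transformation: Optional[ExportTransformation] = None


@dataclass
class SearchComposition:
    name: str
    export_configurations: list[ExportConfiguration] = field(default_factory=list)

    def export_menu(self) -> list[str]:
        """Only the configurations the Developer bound to this search are offered."""
        return [ec.name for ec in self.export_configurations]

    def export(self, ec_name: str, rows: list[dict], downloads: dict[str, bytes]) -> str:
        """Run the chosen configuration and store the result for download."""
        ec = next((c for c in self.export_configurations if c.name == ec_name), None)
        if ec is None:
            raise KeyError(f"{ec_name!r} is not bound to search {self.name!r}")
        stylesheet = ec.transformation.main_path if ec.transformation else None
        location = f"{self.name}/{ec.name}"  # hypothetical download location
        downloads[location] = ec.pipeline.run(rows, stylesheet)
        return location
```

Keeping the transformation optional on the configuration mirrors the tenant-scoped samples, whose ExportConfigurations reference a pipeline but no transformation.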

For your application, include similar ANT scripts to upload export objects from similar sources (folders, pipeline XML files, transformation zip files, etc.), and then use the InfoArchive web application to bind those ExportConfigurations to your SearchCompositions.

Chapter 6 Compliance

InfoArchive provides the ability to control how long data can be kept in the archive.

Retention policies define when data is eligible for disposition. Retention policies can also be used for your applications.

Data can be retained with the following methods:

• Retention managers can use InfoArchive’s interface to apply retention directly to applications, AIPs and tables.

• Retention can be applied automatically to content during ingestion (AIP).

• A job can be configured to apply retention to records in either tables or packages (AIPs).

Holds can be applied to prevent disposition. Data is never destroyed (disposed) until approval is given. There is one caveat: if no retention is applied and the application is in test mode, the data can be destroyed. This is intended for cleaning up test systems. For applications in test mode, all retention policies and holds must be removed before the application data is deleted.

Terminology

Term Description

Retention Policy Defines how long to keep the data.

Retained Set Created when a retention policy is applied to one or more items. A logical container that stores information about:

• Whether or not items in the set are aging together

• The type of items in set (i.e., application, AIP or table).

Hold Indicates that the item cannot be deleted or disposed.

Hold Set Logical container created when a hold is applied to one or moreitems. The type of items in the set is stored.

Purge Candidate List Created automatically by the system with the GeneratePurgeCandiateList job. Items in this list have had all of their retention rules met.

Qualification Process that determines when an item qualifies for disposition. It is possible that a qualification date cannot be calculated. For example, for event-based retention, if the event has not been fulfilled, a qualification date will not be set.

Disposition A controlled mechanism for what to do when the retention period has elapsed. The only disposition strategy available is Destroy All. Disposition does not start until approval is given.

Retention Policies

Retention prevents the disposition of an item. Retention management supports a controlled disposition process for the Retention Manager. When retention is applied to an item, a retained set is created.

Each time a retention policy is applied, a different starting date can be used, so each retained set can specify a different date.

Retention policies can be changed, but there are restrictions if the policy is in use. If the retention policy is in use, its name, its type of retention and its conditions cannot be changed (for example, you cannot change a fixed-date retention policy to a duration retention policy).

Multiple retention policies can be applied to the same item. For more information, see Applying Multiple Retention Policies.

If no retention policy has been applied and the application is in the Active state, InfoArchive does not allow the deletion of tables, AIPs or the application. If the application is in the In Test state, the data and the application can be deleted through the interface and the REST API.

If a retention policy or hold is applied, the policy and the hold must be removed from the AIP, table or application before the entire application (including its data) can be deleted.

The following table outlines which retention policy types can be applied via the different methods:

| Retention policy type | REST API: AIP, table record, AIU or application | Manually through the user interface | Via a job: record or AIU | Via ingestion: SIP or table |
|---|---|---|---|---|
| Duration | Yes | Yes | Yes | Yes |
| Fixed Date | Yes | Yes | Yes | Yes |
| Event Based | Yes | No | Yes | No |
| Mixed | Yes | No | Yes | No |

Retention Policy Types

A data retention policy is a recognized and proven protocol within an organization to retain information for operational use while adhering to the laws and regulations concerning them. It is a set of guidelines that describes what data will be archived, how long it will be kept, and other factors concerning the retention of the data. Its objectives are to keep important information for future use or reference, to organize information so it can be searched and retrieved at a later date, and to dispose of information that is no longer needed.

Retention Policy Type Description

Fixed Date A holding will be retained until a particular date. For instance, you want to retain a holding until January 1, 2020. Items to which the retention policy is applied after that date are immediately eligible for disposition.

Duration A holding will be retained for a duration of time after a particular base date. For instance, you want to retain a holding for two years after the holding’s initial ingestion date.

Caution: Customers using Isilon should never reduce the duration of a retention policy once it has been set. Doing so causes the Requalification job to fail. The issue occurs because Isilon ensures that data is kept until the originally stated date.

To resolve the issue, change the retention policy back to its original duration (increasing the duration or moving the retention to a later date).

If you are applying a duration retention policy, all of the records should include the Data field. Otherwise, only a subset of the records may be protected. You will then have to refer to the logs to see which records are protected.

Caution: Customers using ECS cannot change retention policies once they have been applied, or apply additional policies to packages.

Event A record will be retained until a specific condition or conditions are met. For instance, you want to retain a holding until the date the employee leaves the company.

Can only be applied through the jobs.

Mixed Mode This type is a combination of the Duration and Event policy types. For instance, you want to retain a holding for six years, or dispose of it immediately after an auditor indicates that the content is no longer required.

With Mixed Mode retention, if the event happens, the event path is used rather than whichever path is shorter.

Can only be applied through the jobs.

If you are applying a mixed retention policy, all of the records should include the Data field. Otherwise, only a subset of the records may be protected. You will then have to refer to the logs to see which records are protected.
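The qualification rules for the four policy types can be summarized in a small Python sketch. It assumes a single base date per item and expresses durations as a timedelta; the class and function names are hypothetical, and cutoff, holds and approval are not modeled.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    kind: str                          # "fixed", "duration", "event" or "mixed"
    fixed_date: Optional[date] = None
    duration: Optional[timedelta] = None


def qualification_date(policy: RetentionPolicy,
                       base_date: Optional[date] = None,
                       event_date: Optional[date] = None) -> Optional[date]:
    """Return when an item qualifies for disposition, or None when the date
    cannot be calculated yet (for example, the event has not happened)."""
    if policy.kind == "fixed":
        return policy.fixed_date
    if policy.kind == "duration":
        return base_date + policy.duration
    if policy.kind == "event":
        return event_date  # stays None until the event is triggered
    if policy.kind == "mixed":
        # Once the event happens, the event path is used, even if the
        # chronological (duration) path would have been shorter.
        if event_date is not None:
            return event_date
        return base_date + policy.duration
    raise ValueError(f"unknown policy kind: {policy.kind}")
```

For example, a duration policy of one day with a base date of January 1, 2017 yields January 2, 2017, while an event policy returns None until an event date is supplied.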

The cutoff is InfoArchive’s rounding mechanism, which is used to adjust the base date to start on a particular month and day of the week. Cutoff can be applied to Duration, Event or Mixed retention policy types. Cutoff is a generic records management term for a point that falls at the end of the time unit specified in the retention period.

Mixed retention policies can use a different cutoff on the chronological path (when the event does not happen) than on the event path (when the event does happen).
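As an illustration of cutoff, the following Python sketch assumes that the cutoff falls at the end of the calendar month or year that contains the base date, and that aging starts from that cutoff. This follows the generic records management interpretation; InfoArchive's actual rounding options (such as a particular month or day of the week) are not modeled, and all names are hypothetical.

```python
import calendar
from datetime import date


def apply_cutoff(base_date: date, unit: str) -> date:
    """Round a base date forward to the end of its month or year (the cutoff)."""
    if unit == "month":
        last_day = calendar.monthrange(base_date.year, base_date.month)[1]
        return date(base_date.year, base_date.month, last_day)
    if unit == "year":
        return date(base_date.year, 12, 31)
    raise ValueError(f"unsupported cutoff unit: {unit}")


def add_years(d: date, years: int) -> date:
    """Add whole years, mapping February 29 to February 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def qualification_with_cutoff(base_date: date, years: int) -> date:
    """Duration policy expressed in years: aging starts at the end of the base year."""
    return add_years(apply_cutoff(base_date, "year"), years)
```

Under these assumptions, a two-year duration with a base date of March 15, 2017 qualifies on December 31, 2019 rather than March 15, 2019, because aging starts only at the year-end cutoff.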

Determining the Best Method to Apply Retention

If everything in the application needs to be disposed of together, consider applying retention to the application. This is the simplest approach, and it is appropriate for application decommissioning. However, if any items in the application (such as an AIP or table) are put under hold, disposition will not proceed.

For SIP-based archiving, retention can be applied to each ingested SIP. The default retention policy can be defined on the application, on the holding (in the future) or on the ingested package (through the retention class). Only one of these retention policies is applied, with the following precedence: package, then holding, then application. If none is defined, no retention policy is applied.
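The precedence rule for the default retention policy amounts to taking the first policy that is defined. In the following Python sketch, policies are represented by hypothetical name strings, and the holding level is included even though it is described as a future capability.

```python
from typing import Optional


def default_retention_policy(package_policy: Optional[str],
                             holding_policy: Optional[str],
                             application_policy: Optional[str]) -> Optional[str]:
    """Pick the single default policy for an ingested SIP.

    Precedence: package (retention class), then holding, then application.
    Returns None when nothing is defined, meaning no policy is applied.
    """
    for policy in (package_policy, holding_policy, application_policy):
        if policy is not None:
            return policy
    return None
```

Only one policy is ever returned, which matches the statement that a single default retention policy is applied per ingested package.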

A retention policy can also be applied through the Apply Retention Policy To Records job. For more information, see Working with Jobs: Manual vs Scheduled.

Applying Multiple Retention Policies

It is best to avoid applying multiple retention policies and to use a single source instead. If multiple policies have been applied, the longest retention policy is used. For example, if a duration retention policy is applied for 5 years from now, and a fixed-date retention policy is applied to retain until 2050, the item will not be eligible for disposition until 2050.

If a shorter retention policy is applied, the new date is not pushed to the hardware (Isilon). For Isilon storage, dates can only be pushed further into the future.

The ability to apply multiple policies is not supported by ECS.
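The "longest policy wins" rule and the Isilon restriction can be sketched in Python as follows. The sketch assumes that a policy whose qualification date cannot yet be calculated (for example, an event that has not happened) blocks qualification; the function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional


def effective_qualification(dates: Iterable[Optional[date]]) -> Optional[date]:
    """With several policies applied, the item qualifies only when the longest
    one is satisfied. A None entry means the item cannot qualify yet."""
    qualification_dates = list(dates)
    if not qualification_dates:
        raise ValueError("no retention policy applied")
    if any(d is None for d in qualification_dates):
        return None
    return max(qualification_dates)


def date_to_push(current_storage_date: Optional[date], new_date: date) -> Optional[date]:
    """Isilon-style retention can only be extended: return the date to push to
    the storage, or None when the new date would shorten retention."""
    if current_storage_date is None or new_date > current_storage_date:
        return new_date
    return None
```

For the example above, effective_qualification returns the 2050 date, and a later attempt to push an earlier date to Isilon returns None, so nothing is sent to the storage.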

InfoArchive also allows you to apply retention to records. When you view a record in the search results, the longest retention policy and its source are shown. Sources can be the application, the table or package, or the record.

Record-Based Retention

Retention can be applied directly to records only through jobs. Records are not put into purge lists; only applications, tables and packages are. The disposition of packages and tables is delayed until all governing policies on their records have been satisfied. A package is included in the purge list if its own retention is met, but the system keeps skipping it until no record has a hold and all retention applied to records has elapsed. For event-based retention policies, this means that the Trigger Event job must run so that a qualification date can be calculated and aging can start.
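The gating of package disposition by record-level retention and holds can be sketched in Python as follows. The data classes and the check are hypothetical simplifications of the rule described above, not InfoArchive's implementation; a record with retention but no qualification date stands for an event-based policy whose event has not yet been triggered.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Record:
    on_hold: bool = False
    has_retention: bool = False
    # None: an event-based policy still waiting for the Trigger Event job
    qualification_date: Optional[date] = None


@dataclass
class Package:
    qualification_date: date
    records: list[Record] = field(default_factory=list)


def can_dispose_package(pkg: Package, today: date) -> bool:
    """A package on a purge list is skipped until its own retention has elapsed,
    no record is on hold, and every record-level retention has elapsed."""
    if pkg.qualification_date > today:
        return False
    for rec in pkg.records:
        if rec.on_hold:
            return False
        if rec.has_retention:
            if rec.qualification_date is None or rec.qualification_date > today:
                return False
    return True
```

A package that fails this check stays on the purge list but is skipped on each disposition run until the check passes.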

Hold Management

InfoArchive includes an out-of-the-box method to apply legal holds. Holds can be applied to AIPs, tables or an application. Once a hold is applied, a record cannot be deleted until the hold is removed. As illustrated in the following diagram, a hold set is created when a hold is applied:

After applying a hold, the apply hold information will not be available if the search is set to draft.

There may be a number of reasons why the information is not available:

• The search is not in the Ready state.

• The search set that was used has restrictions, and the user viewing the hold set is not in one of the permitted groups.

• For a table search, the search set did not set any of the xDB element values.

• For a table search, columns marked as encrypted, masked, or nested are not shown. If no columns are valid, the message is displayed.

Workaround: The Developer can update the search, correct the problem, and mark the search as Ready. There is no need to remove the hold to correct the problem.

The Purge Process

When items are eligible for disposition, the Purge Candidate List Generation job determines which items are included in a purge candidate list. If holds are applied to items after the list is created, these items will not be disposed of. If approval is not given, the next time the Purge Candidate List Generation job runs, previous lists are marked cancelled and items are eligible to be placed into new lists. The job typically runs monthly, although it can also be run manually.

Purge candidate lists are always associated with an application. Lists are created per application and type of object.

Once the purge candidates list is generated, the retention manager reviews the contents. The retention manager can choose to ignore, reject or approve the list:

• If the list is ignored, the items in the purge candidates list are added to the new list the next time the Purge Candidate List Generation job runs.

• If a purge candidate list is rejected, the items in the list are not eligible for inclusion in new lists until the state of the rejected list is changed.

• If the list is approved, the items in the purge candidates list are disposed. If a hold is subsequently placed on items in the list, or they are put under a longer retention policy that has not elapsed, those items are not disposed, even though approval was given.

Click a list to view the details of a specific purge list. The details panel contains the following information, depending on the Status of the purge candidates list:

• If the list was disposed, the panel indicates the number of items that were disposed or the number of items skipped.

When packages are disposed, some items are sometimes skipped (for example, if an additional retention policy was applied to them). The customer needs to run the Confirmation job; the next time the Disposition job runs, the packages are removed.

• If the list was disposed, the panel indicates the disposal date.

• If the list was cancelled, the panel indicates the cancellation date.

The purge candidates list also indicates the items with qualification dates in the past that could have been disposed sooner.

What Happens if a Purge Candidate List is Rejected?

Prior to disposing of items in a purge candidate list, approval has to be given to everything in the list. If one item in the list cannot be disposed, none of the items in the list will be disposed. It is all or nothing.

When a purge candidate list is rejected, none of its items are eligible for disposition or for inclusion in any new purge lists. The expectation is that the Retention Manager identifies which items caused the list to be rejected and puts individual holds on those items. Once this is done, the list should be marked as cancelled. The next time the Purge Candidate List Generation job runs, the items without a hold are eligible for disposition.
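The life cycle of purge candidate lists described in this section can be approximated by the following Python sketch. It is a simplified, hypothetical model: items are plain strings, a generation run cancels lists that were ignored and leaves out items that are held or belong to a rejected list, and approval skips items that were put under hold or longer retention after the list was generated.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"        # generated, waiting for the retention manager
    REJECTED = "rejected"
    CANCELLED = "cancelled"
    DISPOSED = "disposed"


@dataclass
class PurgeCandidateList:
    items: list[str]
    status: Status = Status.PENDING
    disposed: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)


def generate(eligible: list[str], held: set[str],
             history: list[PurgeCandidateList]) -> PurgeCandidateList:
    """One run of the (hypothetical) list generation job."""
    blocked: set[str] = set()
    for old in history:
        if old.status is Status.PENDING:
            old.status = Status.CANCELLED      # ignored lists are cancelled
        elif old.status is Status.REJECTED:
            blocked.update(old.items)          # excluded until the list changes state
    new_list = PurgeCandidateList(
        [item for item in eligible if item not in held and item not in blocked])
    history.append(new_list)
    return new_list


def reject(pcl: PurgeCandidateList) -> None:
    pcl.status = Status.REJECTED


def cancel(pcl: PurgeCandidateList) -> None:
    pcl.status = Status.CANCELLED


def approve(pcl: PurgeCandidateList, held: set[str], retained: set[str]) -> None:
    """Approval covers the whole list; items put under hold or longer
    retention after generation are skipped instead of disposed."""
    for item in pcl.items:
        if item in held or item in retained:
            pcl.skipped.append(item)
        else:
            pcl.disposed.append(item)
    pcl.status = Status.DISPOSED
```

Calling reject, placing a hold on the problem item, and then calling cancel reproduces the workflow described above: the next generate run leaves out the held item and picks up the rest.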

Disposition of a SIP-Based Application

Additional steps are required to dispose of a package. When the DisposePurgeCandidateList job runs, the packages are not destroyed. However, the records in the package are no longer returned in searches, and the package’s state is updated to “Waiting for confirmation”. The Confirmation job then needs to run, either manually or as a scheduled run. After the Confirmation job runs, the next time the DisposePurgeCandidateList job runs, the packages are removed. Note that:

• Putting a hold on the package delays its removal.

• Before the DisposePurgeCandidateList job runs, the dates can be checked on the package.

To reclaim the storage for unstructured content, the Clean job must run.
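The two-phase disposition of a package can be viewed as a small state machine. In the following Python sketch, only the "Waiting for confirmation" state comes from the text above; the other state names are hypothetical labels, and reclaiming storage with the Clean job is not modeled.

```python
from enum import Enum


class PackageState(Enum):
    ON_PURGE_LIST = "on purge list"                        # hypothetical label
    WAITING_FOR_CONFIRMATION = "Waiting for confirmation"
    CONFIRMED = "confirmed"                                # hypothetical label
    REMOVED = "removed"                                    # hypothetical label


def run_dispose(state: PackageState, on_hold: bool) -> PackageState:
    """DisposePurgeCandidateList job: the first run only hides the package from
    searches; a later run, after confirmation, removes it. A hold delays both."""
    if on_hold:
        return state
    if state is PackageState.ON_PURGE_LIST:
        return PackageState.WAITING_FOR_CONFIRMATION
    if state is PackageState.CONFIRMED:
        return PackageState.REMOVED
    return state


def run_confirmation(state: PackageState) -> PackageState:
    """Confirmation job, run manually or on a schedule."""
    if state is PackageState.WAITING_FOR_CONFIRMATION:
        return PackageState.CONFIRMED
    return state


def searchable(state: PackageState) -> bool:
    """Records stop appearing in searches after the first disposition run."""
    return state is PackageState.ON_PURGE_LIST
```

Running the disposition job twice without the Confirmation job in between leaves the package waiting, which is why the Confirmation job must run before the package can be removed.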

Disposition of a Table-Based Application

When a table is disposed, the metadata for the table is also destroyed. Tables that have records under hold are not eligible for inclusion in a purge candidate list.

Note that a table is not automatically put under retention. A retention policy can be applied through the InfoArchive web application or via the Apply Retention Policy To Table job.

Retention Rules on the Disposition of AIPs, AIUs, Applications and Rows

One or more retention policies and holds can be applied to AIPs, AIUs, applications and rows within an application.

A container (AIP or application) can have its disposition extended if one of its children cannot bedisposed.
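The container rule can be expressed as a recursive check over a tree of items. The following Python sketch is a hypothetical model: every node (application, table, AIP, AIU or row) carries its own hold flag and retention qualification dates, and a container is disposable only when it and all of its descendants are. It does not model whether a container is on a purge list or whether approval was given.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Node:
    """An application, table, AIP, AIU or row (names are illustrative)."""
    name: str
    retention_dates: list[Optional[date]] = field(default_factory=list)
    on_hold: bool = False
    children: list["Node"] = field(default_factory=list)


def disposable(node: Node, today: date) -> bool:
    """A container can be disposed only if it and every descendant are free of
    holds and all of their retention policies have elapsed."""
    if node.on_hold:
        return False
    for qd in node.retention_dates:
        if qd is None or qd > today:   # None: qualification date not calculated yet
            return False
    return all(disposable(child, today) for child in node.children)
```

This is why, in the scenarios that follow, a hold on a single row or AIU keeps the whole application or AIP out of disposition.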

Table Archiving – Application

For table archiving, the top level of a set of data is an application. A retention policy or a hold can be applied at the application level or at the row level (to individual objects within the application).

Once the retention period of an application has expired, everything in the application is to be disposed. However, applying a hold policy to items in an application prevents the disposition of the application.

The following table outlines various scenarios regarding the disposition of an application:

Scenario What Happens

Application with one retention policy applied to it:

• Duration: 1 day

• Base Date: January 1, 2017

All rows have the same retention policy applied to them and, therefore, have the same base date.

Retention Policy 1 is processed on January 1, 2017. The application and all of its rows are disposed, along with all supporting tables.

Application with two retention policies applied to it.

• Retention Policy 1:

— Duration: 1 day

— Base Date: January 1, 2017

• Retention Policy 2:

— Duration: 1 day

— Fixed Date: January 2, 2017

Only the application is put under retention. The rows are protected, and have the same base date/fixed date, but it is the application that is governed.

Retention Policy 1 is processed on January 1, 2017, but the application is not disposed.

Retention Policy 2 is processed on January 2, 2017. The application and all of its rows are disposed, along with all supporting tables.

Application with a retention policy applied to it.

The application also contains one row with a retention policy applied to it.

• Retention Policy 1 (applied to application):

— Duration: 1 day

— Base Date: January 1, 2017

• Retention Policy 2 (applied to row):

— Fixed Date: January 2, 2017

On January 1, 2017, nothing is disposed (the application is either disposed completely or not at all). Only on January 2, after the purge list includes the application, is the application disposed.

Application with a retention policy and a hold applied to it.

• Retention Policy 1:

— Duration: 1 day

— Base Date: January 1, 2017

• Hold Policy 1

Retention Policy 1 is not processed on January 1, 2017 because of Hold Policy 1.

When Hold Policy 1 is removed from the application, the application is processed and disposed.

Note: An approval process is required prior to disposition.

Application with a retention policy and a hold applied to one row.

• Retention Policy 1 (applied to application):

— Duration: 1 day

— Base Date: January 1, 2017

• Hold Policy 1 (applied to a row):

— All rows have Retention Policy 1 with a base date of January 1, 2017

— One row has Hold Policy 1 applied to it.

Retention Policy 1 is processed on January 1, 2017. Retention Policy 1 is then applied to the new application and Hold Policy 1 is applied to the row. The hold policy is removed from the row in the original application, and the original application is disposed.

A retention policy needs to be applied to the new application so that it is protected and subsequently processed. When the hold policy is removed, the new application is processed and disposed.

SIP Archiving

The main unit of data in a SIP archive is an AIP that contains AIUs. Retention and hold policies can be applied to AIPs and AIUs. Below are some scenarios for SIP-based archiving:

Scenario What Happens

An AIP with a retention policy applied to it:

• Duration: 1 day

• Base Date: January 1, 2017

All AIUs have the same retention policy applied to them and, therefore, have the same base date.

The retention policy is processed on January 1, 2017 and the AIP is disposed along with the AIUs.

An AIP with two retention policies applied to it.

• Retention Policy 1:

— Duration: 1 day

— Base Date: January 1, 2017

• Retention Policy 2:

— Fixed Date: January 2, 2017

Nothing happens on January 1, 2017. On January 2, 2017, the AIP is eligible to be included in a purge list.

An AIP with a retention policy and a hold applied to it.

• Retention Policy 1 (applied to AIP):

— Duration: 1 day

— Base Date: January 1, 2017

• Hold Policy 1 (applied to AIP)

The hold policy prevents the AIP from being included in a purge list. Once the hold policy is removed, the AIP is eligible to be included in a purge list.

An AIP that has a retention policy applied to it and that contains an AIU with a hold applied to it.

• Retention Policy 1 (applied to AIP):

— Duration: 1 day

— Base Date: January 1, 2017

• Hold Policy 1 (applied to AIU)

The AIP is treated as if it has a hold policy applied to it. The AIP is included in a purge list only if neither the AIP nor any of its AIUs has a hold policy applied.

It is possible to apply a retention policy to an AIU. Applying retention to an AIU makes the parent package ineligible for disposition until the retention policies of all its AIUs are satisfied; until then, the package is not eligible for inclusion in a purge list. In this release, records do not show up in a purge list.

Using the Retention Sets Tab

When a retention policy is applied to records, the records are tracked as a retention set. The Retention Sets tab shows the retention sets that are associated with the selected application.

Each retention set is displayed in a table that contains the following information:

Column Description

Retention Set Name The name of the retention set.

Item Type Indicates the item that comprises the retention set (e.g., application, package, table).

Created Date The date the retention set was created.

Associated Policy Indicates the retention policy applied to the retention set.

Aging Strategy Indicates the type of retention policy. For more information, see Retention Policy Types.

Items Indicates the number of items that are contained in the retention set.

Qualification Date The date the retention set qualifies for disposition. This value can be blank if retention is applied through apply retention jobs.


Viewing Items in a Retained Set

1. Select the application in which the retention set is stored.

2. In the Retention Sets tab, click the icon for the retention set you want to view. The following information is displayed:

• The name of each item contained in the retention set,

• The state of each item,

• Whether a hold has been applied against each item, and

• The number of records contained in the item. The information that is displayed depends on the type of item. For example:

— If retention was applied directly to the application, information about the application is displayed.

— If retention was applied to a package, the state of the package is displayed.

— If retention was applied to a table, the number of records in the table is displayed.

Using the Hold Sets Tab

When a hold is applied to records, the records are tracked as a hold set. The Hold Sets tab shows the hold sets that are associated with the selected application.

Each hold set is displayed in a table that contains the following information:

Column Description

Hold Set Name The name of the hold set.

Item Type Indicates the item that comprises the hold set (e.g., application, package, records, table).

The value is Records if a hold policy is applied to search results.

Created Date The date the hold set was created.

Associated Hold Indicates the name of the hold applied to the hold set.

Hold Type Indicates the type of hold applied to the hold set. A hold type can be:

• Legal

• Permanent

Items Indicates the number of items that are contained in the hold set.

Review Date The date the hold set is to be reviewed to determine whether it is eligible for disposition.


Viewing Items in a Hold Set

1. Select the application in which the hold set is stored.

2. In the Hold Sets tab, click the icon for the hold set you want to view. The information displayed depends on the Item Type of the hold set:

Item Type Description

Application The application’s properties are displayed.

Package A list of packages is displayed.

Table A list of tables is displayed.

Records A list of records is displayed. The records displayed are based on the results of a primary search configuration. The results of a nested search configuration are not displayed.

Removing an Item from a Hold Set

This procedure allows the user to remove a hold policy from records. For example, if you executed a search and applied the hold policy to 15 records, use the following procedure to remove the hold from all 15 records.

If all of the items are removed from the hold set, then the hold set will be removed.

Note: You can also remove only selected items from the hold set (for example, if three of the 15 records no longer need to be held).

1. Select the application.

2. In the Hold Sets tab, click the icon for the hold set.

3. The records under the hold set are displayed. Select the record or records you want removed from the hold set.

4. Click Remove Hold.

5. A pop-up is displayed that indicates the name of the hold set and the number of records being removed from the hold set. Enter the reason why the hold is being removed from the records and click Remove.

Because this is an asynchronous operation, check the Background Results tab for the status.

Troubleshooting

If a hold is placed on SIP or table data, users cannot search for information in the hold set if the search is not in a Ready state. If a hold is placed on data, the Developer must ensure that any applicable searches are in the Ready state. There is no need to remove the hold to correct the problem.


Using the Purge Lists Tab

The Purge Lists tab allows the Retention Manager to:

• Review purge lists, which contain the items eligible for disposition.

• Approve or reject the disposition of the items contained in a purge list.

• Cancel a previously ordered approval or rejection of a purge list.

• Reject a purge (disposition) of items contained in a purge list.

• Export the details contained in a purge list.

For more information, see the Purge Process.

The following table describes the fields for a purge list:

Column Description

Purge List Name The name of the purge list.

Type Indicates the item that is contained in the purge list (e.g., application, package).

Retention Policy Indicates the name of the retention policy that was applied to the retained set.

Items Indicates the number of items in the purge list.

Created Date Indicates the date the purge list was created.

Status Status of the purge candidate list:

• Under Review: The Retention Manager is currently reviewing the list to determine if the contents are eligible for disposition.

• Approved: The items contained in the list are eligible for disposition.

• Rejected: The list was rejected by the Retention Manager. The following actions can be performed on a purge list with the ’Rejected’ status:

— Cancel Rejection: Returns the state to ’Under Review’.

— Cancel Purge: Changes the state to ’Cancelled’. If the purge is not cancelled, the items in the purge list will not be eligible for new purge lists.

• Disposed: The items contained in the list were disposed.

• Cancelled: The list was rejected by the Retention Manager. The Administrator then ran the Purge Candidate List Generation job, which cancelled the previous list.


Performing Actions to a Purge List

The actions a Retention Manager can perform on a purge list depend on the Status of the list:

Status Available Actions

Under Review The Retention Manager is able to approve or reject the purge candidates list.

Approved The Retention Manager is able to cancel the approval of the purge candidates list.

Rejected The Retention Manager is able to perform the following actions:

• Cancel the Reject: Moves the purge list back to Under Review so that the list can be reviewed again.

• Cancel Purge: When a purge is cancelled, the items become eligible for a new purge list. If there are items in the cancelled purge list that cannot be disposed, apply a hold policy to those items so they are not included in the next purge list.

Cancelled The Retention Manager is not able to perform any actions on the purge candidates list.

Disposed The Retention Manager is not able to perform any actions on the purge candidates list.

To perform any of the above actions:

1. Select the purge candidate list in the Purge Lists tab.

2. Click the icon and select the action you want to perform on the list.
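The status rules in the two tables above form a small state machine. The following Python sketch encodes the Retention Manager actions as transitions. The action names (`approve`, `cancel_purge`, and so on) are hypothetical labels, disposal itself (Approved to Disposed) is left out because it is carried out by the purge process rather than by a Retention Manager action, and the status that a cancelled approval returns to is an assumption because the guide does not state it.

```python
from enum import Enum


class PurgeListStatus(Enum):
    UNDER_REVIEW = "Under Review"
    APPROVED = "Approved"
    REJECTED = "Rejected"
    CANCELLED = "Cancelled"
    DISPOSED = "Disposed"


S = PurgeListStatus

# Hypothetical action names. The target of "cancel_approval" is an assumption.
TRANSITIONS = {
    S.UNDER_REVIEW: {"approve": S.APPROVED, "reject": S.REJECTED},
    S.APPROVED: {"cancel_approval": S.UNDER_REVIEW},
    S.REJECTED: {"cancel_rejection": S.UNDER_REVIEW, "cancel_purge": S.CANCELLED},
    S.CANCELLED: {},  # no Retention Manager actions
    S.DISPOSED: {},   # no Retention Manager actions
}


def available_actions(status: PurgeListStatus) -> list:
    """List the Retention Manager actions allowed in the given status."""
    return sorted(TRANSITIONS[status])


def perform(status: PurgeListStatus, action: str) -> PurgeListStatus:
    """Return the new status, or raise ValueError if the action is not allowed."""
    allowed = TRANSITIONS[status]
    if action not in allowed:
        raise ValueError(f"{action!r} is not allowed for a list in status {status.value!r}")
    return allowed[action]


status = perform(S.UNDER_REVIEW, "reject")
print(status.value)               # Rejected
print(available_actions(status))  # ['cancel_purge', 'cancel_rejection']
```

Keeping the transitions in one table makes terminal states (Cancelled, Disposed) explicit: they simply map to no actions.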

Using the Application Info Tab

The Application Info tab allows the Retention Manager to apply retention policies and holds to, or remove them from, an application.

Other users, such as the Administrator or Designer, can access the Application Info tab but cannot perform any actions other than reviewing the details of the selected application.

Applying a Retention Policy to an Application

1. Select the application the retention policy is being applied to.

2. On the Application Info tab, click Apply Retention Policy.

3. Select the retention policy being applied to the application and click Next.


4. Review the retention policy details to verify that it is the correct policy to apply to the application. If applicable, enter a new Base Date. Click Next.

5. Enter the following information:

a. Retention Set Name: Enter a unique name for the retention set.

b. Enter a Description for the policy.

c. Click Next.

6. Review the information you have entered. When satisfied that the information is correct, click Finish.

The retention set applied to the application is listed in the Retention Sets tab.

Removing Retention from an Application

1. Select the application.

2. In the Application Info tab, click the remove icon for the retention set being removed.

3. When prompted to verify that you want to remove the selected retention set, click Remove.

Applying a Hold to an Application

1. Select the application you are applying a hold to.

2. On the Application Info tab, click Apply Hold.

3. Select the hold you want applied to the application and click Next.

4. Enter the following information:

a. Hold Set Name: Enter a unique name for the hold set.

b. Enter a Description for the hold.

c. Click Next.

d. Review the information you have entered. When satisfied that the information is correct, click Finish.

This operation is asynchronous. The new entry appears in the hold sets only if the operation is successful.

The hold you applied to the application or holding is listed in the Hold Sets tab.

Removing a Hold Set from an Application

1. Select the application.

2. In the Application Info tab, click the remove icon for the hold set being removed.

3. When prompted to verify that you want to remove the selected hold set, click Remove.


This operation is not asynchronous. It can, however, also be performed from the Retention Sets tab, where the operation would be synchronous. For more information, see Using the Retention Sets Tab.

Using the Packages Tab

The Packages tab allows users to perform the following actions:

• View a list of the available AIPs in an application. Use the four filters to filter by:

— Holding

— AIP phase

— Reception date

— Errors

• Apply a retention policy or hold to an AIP (performed by the Retention Manager).

• Reject or invalidate an AIP.

Each package is displayed in a table that contains the following information:

Column Description

Name The name of the AIP. Click the link to view the details of the package.

If the package is an aggregate of multiple AIPs, click the link to view the AIPs that comprise the aggregate.

A menu that allows a user to perform the following actions on a selected AIP:

• Apply retention

• Apply hold

• Reject package

• Invalidate package

• Request the closing of the pooled library

The actions displayed in the menu depend on your user role as well as the State of the package. For more information, see Applying Actions to a Package.


Phase Displays the current phase of the AIP. Phases include:

• Reception

• Waiting Ingestion

• Ingestion

• Waiting Commit

• Completed

• Reject

• Invalid

• Purge

• Aggregate

Holding Indicates the name of the holding the SIP was ingested into.

Reception Start Date Indicates the reception date of the SIP.

Retention Indicates whether a retention policy has beenapplied to the AIP.

Hold Indicates whether a hold has been applied to the AIP.

Records Indicates the number of records in the AIP.

Click a package name to view additional information. The information tab contains the custom properties of the selected AIP and the package content (sip.xml, log files, ci container file, etc.).

Retention and ECS Storage

If you are using ECS storage, never remove a retention policy from a package or AIP, and never add additional retention policies to one. This restriction applies only to SIP archiving, because the table archiving process does not push the date to the hardware.

Applying Actions to a Package

The actions that can be applied to an AIP depend on the State of the AIP:


State Available Actions

Reception The following actions can be performed on the AIP:

• Invalidate

• Apply Retention

• Apply Hold

Waiting Ingestion The following actions can be performed on the AIP:

• Reject

• Invalidate

• Apply Retention

• Apply Hold

Ingestion The following actions can be performed on the AIP:

• Invalidate

• Apply Retention

• Apply Hold

Waiting Commit The following actions can be performed on the AIP:

• Reject

• Invalidate

• Apply Retention

• Apply Hold

Completed The following actions can be performed on the AIP:

• Reject

• Invalidate

• Apply Retention

• Apply Hold

Purge The following actions can be performed on the AIP:

• Apply Retention

• Apply Hold


Reject The following actions can be performed on the AIP:

• Apply Retention

• Apply Hold

Invalid The following actions can be performed on the AIP:

• Apply Retention

• Apply Hold
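The phase table above can be expressed as a lookup. This Python sketch transcribes it as a dictionary; the names `ACTIONS_BY_PHASE` and `can_apply` are hypothetical. It does not model the user-role check that also affects which menu items appear, and phases without a row in the table (such as Aggregate) return no actions.

```python
# Available package actions per AIP phase, transcribed from the table above.
# Hypothetical illustration only; not an InfoArchive API.
ACTIONS_BY_PHASE = {
    "Reception": {"Invalidate", "Apply Retention", "Apply Hold"},
    "Waiting Ingestion": {"Reject", "Invalidate", "Apply Retention", "Apply Hold"},
    "Ingestion": {"Invalidate", "Apply Retention", "Apply Hold"},
    "Waiting Commit": {"Reject", "Invalidate", "Apply Retention", "Apply Hold"},
    "Completed": {"Reject", "Invalidate", "Apply Retention", "Apply Hold"},
    "Purge": {"Apply Retention", "Apply Hold"},
    "Reject": {"Apply Retention", "Apply Hold"},
    "Invalid": {"Apply Retention", "Apply Hold"},
}


def can_apply(phase: str, action: str) -> bool:
    """Return True if the table lists `action` for an AIP in `phase`."""
    return action in ACTIONS_BY_PHASE.get(phase, set())


print(can_apply("Completed", "Reject"))  # True
print(can_apply("Purge", "Invalidate"))  # False
```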

Applying a Retention Policy to an AIP

1. Select the AIP the retention policy is being applied to.

2. Click the menu icon and select Apply retention.

3. Select the retention policy you want applied to the AIP and click Next.

4. Review the retention policy details to verify that it is the correct policy to apply to the AIP. The fields that are displayed depend on the Aging Strategy of the retention policy. For more information, see Creating a Retention Policy.

5. Enter the following information and click Next:

• Retention Set Name: Enter a unique name for the retention set.

• Enter a Description for the policy.

6. Review the information you have entered. When satisfied that the information is correct, click Finish.

The AIP now indicates that there is a retention policy applied to it.

Applying a Hold to an AIP

1. Select the AIP the hold is being applied to.

2. Click the menu icon and select Apply hold.

3. Select the hold you want applied to the AIP and click Next.

4. Enter the following information and click Next:

• Hold Set Name: Enter a unique name for the hold set.

• Enter a Description for the hold.

5. Review the information you have entered. When satisfied that the information is correct, click Finish.

This is an asynchronous operation. The AIP will not indicate that there is a hold policy applied to it until the order completes.


Rejecting or Invalidating an AIP

When you reject or invalidate an AIP, it cannot be returned as a search result. A rejection or invalidation can be applied to an AIP at any time, but the implications differ depending on whether the action is performed before or after the AIP is committed:

• Reject an AIP when you want to invalidate all AIPs that belong to the same collection. When you reject an AIP, you cannot resubmit an AIP with the same DSS as long as one or more rejected AIPs remain in the repository.

• Invalidate an AIP if the wrong SIP was submitted and you want to resubmit the correct SIP withthe same identifier.

When an AIP is marked to be rejected or invalidated, the process is performed asynchronously by a dedicated job. If the AIP has not been committed, the AIP is immediately destroyed. If the AIP has been committed, the AIP is purged by the retention service.

You are able to reject an AIP if the AIP was part of a DSS with more than one other AIP. The AIP must also be in one of the following phases:

• Waiting Ingestion

• Waiting Commit

• Completed

You are able to invalidate an AIP if the returnCode is ’OK’. The AIP must also be in one of the following phases:

• Waiting Ingestion

• Waiting Commit

• Completed

To reject or invalidate an AIP:

1. Select the application and access the Packages tab.

2. Select the AIP that is being rejected or invalidated.

3. Click the menu icon and select either:

• Reject Package, or

• Invalidate Package

4. Select the Reason why the AIP is being rejected or invalidated.

5. If desired, enter any pertinent information in the Comment field.

6. Click Reject or Invalidate, depending on your selection in step 3.
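The eligibility rules for rejecting and invalidating an AIP can be summarized in code. This Python sketch is a hypothetical illustration, not an InfoArchive API: it follows the wording of the guide literally, including the requirement that the DSS contain more than one other AIP, and it also records what happens to the AIP depending on whether it was committed.

```python
# Phases in which the guide allows both rejection and invalidation.
ELIGIBLE_PHASES = {"Waiting Ingestion", "Waiting Commit", "Completed"}


def can_reject(phase: str, other_aips_in_dss: int) -> bool:
    """Literal reading of the guide: the DSS must contain more than one other AIP."""
    return other_aips_in_dss > 1 and phase in ELIGIBLE_PHASES


def can_invalidate(phase: str, return_code: str) -> bool:
    """The AIP's returnCode must be 'OK'."""
    return return_code == "OK" and phase in ELIGIBLE_PHASES


def disposal_path(committed: bool) -> str:
    """What the dedicated job does with a rejected or invalidated AIP."""
    return "purged by the retention service" if committed else "destroyed immediately"


print(can_reject("Completed", other_aips_in_dss=2))   # True
print(can_invalidate("Ingestion", return_code="OK"))  # False
```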

Using the Retention Policies Tab

The Retention Policies tab allows the Retention Manager to:

• View and edit the details of a retention policy.

• Create and delete a retention policy.


Each retention policy is displayed in a table that contains the following information:

Column Description

Policy Name Indicates the name of the retention policy.

Policy Category Indicates the category associated with the policy.

Aging Strategy Indicates the retention type of the retention policy. For more information, see Retention Policy Types.

Approved Date Indicates the date the policy was approved.

Policy Approver Indicates the name of the person or organization that approved the policy (optional).

In Use Indicates whether the retention policy has been applied.

An information tab contains the custom properties of a selected retention policy.

Creating a Retention Policy

1. On the Compliance > Retention Policies tab, click +.

2. Enter the following information:

Field Description

Policy Name Enter a unique name for the policy.

Description Enter a description for the policy.

Policy Category Enter or select a policy category (for example, you may want one category for policies applicable to e-mail messages and another for policies applicable to voicemail messages).

Aging Strategy Select the retention type for the policy being created. For more information, see Retention Policy Types. If you select:

• Fixed Date, enter the eligible disposal date the policy will apply to a holding.

• Duration, indicate how long the policy will retain a holding. Specify the duration in years, months, weeks, or days. Also indicate a cutoff date (for example, if you want to retain a holding for an entire year but want disposition to occur at the end of the company’s fiscal year). If you select an annual cutoff for the policy, specify the cutoff day and month.

Caution: Customers using Isilon should never reduce the duration of a retention policy once it has been set. Doing so causes the Requalification job to fail, because Isilon ensures that data is kept until the originally stated date.

To resolve the issue, change the retention policy back to its original duration (increase the duration or move the retention to a later date).

• Event, select or enter the condition that has to be met before a holding can be disposed. You can also indicate a cutoff date (for example, if you want to retain a holding for an entire year but want disposition to occur at the end of the company’s fiscal year). If you select an annual cutoff for the policy, specify the cutoff day and month. For more information, see Event-Based Retention.

• Mixed, select or enter the condition that has to be met and then indicate how long the policy will retain a holding. You can also indicate a cutoff date (for example, if you want to retain a holding for an entire year but want disposition to occur at the end of the company’s fiscal year). If you select an annual cutoff for the policy, specify the cutoff day and month.

Dates must be within the range 1000-01-01 to 2999-12-31.

Approved Date Enter the date that the Policy Approver will manually approve thedisposal of holdings associated with the policy.

Policy Approver Enter the name of the person or organization that approved the policy (optional).

Notes Enter any relevant policy information you want to communicate.

Disable Disposition Select to ensure that items protected by the retention policy do not appear on a purge list and are not disposed.

3. Click Create.
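The Duration aging strategy combines a length of time with an optional annual cutoff. The guide does not spell out the exact arithmetic, so the following Python sketch shows one plausible interpretation and labels it as an assumption: the duration is added to the base date, and the result is then rolled forward to the next cutoff day and month. All function names are hypothetical. The sketch also enforces the 1000-01-01 to 2999-12-31 date range.

```python
import calendar
from datetime import date, timedelta
from typing import Optional, Tuple

MIN_DATE = date(1000, 1, 1)    # lower bound stated in the guide
MAX_DATE = date(2999, 12, 31)  # upper bound stated in the guide


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the length of the target month."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def qualification_date(base: date, years: int = 0, months: int = 0,
                       weeks: int = 0, days: int = 0,
                       cutoff: Optional[Tuple[int, int]] = None) -> date:
    """Hypothetical Duration calculation with an optional annual (month, day) cutoff.

    ASSUMPTION: the duration is added first, then the result is rolled forward
    to the next cutoff on or after that date. A February 29 cutoff is not handled.
    """
    if not MIN_DATE <= base <= MAX_DATE:
        raise ValueError("base date must be between 1000-01-01 and 2999-12-31")
    aged = add_months(base, 12 * years + months) + timedelta(weeks=weeks, days=days)
    if cutoff is not None:
        month, day = cutoff
        candidate = date(aged.year, month, day)
        if candidate < aged:
            candidate = date(aged.year + 1, month, day)
        aged = candidate
    if aged > MAX_DATE:
        raise ValueError("qualification date would fall after 2999-12-31")
    return aged


# Retain for one year, then dispose at a December 31 cutoff.
print(qualification_date(date(2016, 3, 15), years=1, cutoff=(12, 31)))  # 2017-12-31
```

For a fiscal year that ends on June 30, you would pass `cutoff=(6, 30)`.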

Editing a Retention Policy

When editing a retention policy that is in use, you can change:

• The dates included in a Duration retention policy type,

• A cutoff date, and

• The name of the policy.

You cannot change a retention policy’s type if the policy is in use (for example, you cannot change a fixed date policy to an event-based policy).

1. In the Compliance > Retention Policies tab, select the policy being edited and click the edit icon.

2. Edit the fields, as desired. For further field information, see Creating a Retention Policy.

3. Click Save.


Deleting a Retention Policy

You cannot remove a retention policy that is currently in use. A retention policy is in use if any of the following is true:

• The retention policy is applied to an item.

• An application refers to the retention policy through the Default Retention Policy.

• A retention class (defined with holdings) refers to the retention policy. For more information,see defaultRetentionClass.

1. In the Compliance > Retention Policies tab, click the delete icon for the retention policy being deleted.

2. When prompted to verify that you want to delete the selected retention policy, click Delete.
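The three conditions that make a retention policy "in use" can be checked together before a deletion is attempted. In this Python sketch, `ComplianceState`, `is_in_use`, `delete_policy`, and the policy names are hypothetical; a real deployment would derive this information from InfoArchive itself.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ComplianceState:
    """Hypothetical snapshot of everything that can reference a retention policy."""
    applied_to_items: Set[str] = field(default_factory=set)
    application_defaults: Set[str] = field(default_factory=set)  # Default Retention Policy
    retention_classes: Set[str] = field(default_factory=set)     # defined with holdings


def is_in_use(policy: str, state: ComplianceState) -> bool:
    """A policy is in use if any of the three references exists."""
    return (policy in state.applied_to_items
            or policy in state.application_defaults
            or policy in state.retention_classes)


def delete_policy(policy: str, policies: Set[str], state: ComplianceState) -> None:
    """Remove `policy` from `policies`, refusing if it is still in use."""
    if is_in_use(policy, state):
        raise ValueError(f"retention policy {policy!r} is in use and cannot be deleted")
    policies.discard(policy)


policies = {"Email-7y", "Voicemail-1y"}
state = ComplianceState(application_defaults={"Email-7y"})
delete_policy("Voicemail-1y", policies, state)
print(sorted(policies))  # ['Email-7y']
```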

Event-Based Retention

Event-based retention can only be applied through the Apply Retention to Records job.

The concept of event-based retention revolves around the principle that retention aging does not commence until an event happens. An event usually has a context associated with it. For example, a retention policy may protect data associated with an employee but not start retention aging until that employee leaves the organization.

To achieve this, InfoArchive stores events separately from the application of retention. This decoupling allows multiple retention policies to share the same event without requiring the customer to set an event date for each application. It also means that if a retention policy is applied and the event has already been fulfilled, retention aging commences immediately.

The following steps outline the procedure:

1. The Retention Manager defines a retention policy. An event is also defined and given a name. When a retention policy is created, the Retention Manager does not need to know what the context will be.

2. A process identifies some content that must be protected by the retention policy. The process groups the content and makes separate calls to apply the retention policy, each using a different context (for example, the Employee ID).

If additional information becomes available for the same employee, the information can be added to the existing retained set. Alternatively, if the Retention Manager prefers the information to age independently, the same event (and context) can be reused so that the event only needs to be set once for that employee.

3. When applying the retention policy, the system verifies whether the event (with the requested context) exists. If the event already exists, it does not have to be created. This is important because an event can be fulfilled even before the data is ingested into the archive.
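The decoupling of events from retention can be illustrated with a small in-memory model. In this Python sketch, `EventStore`, `RetainedSet`, the event name, and the employee IDs are hypothetical. Events are keyed by name and context, an existing event is reused rather than recreated, and fulfilling an event once starts aging for every retained set that shares it, including sets created after the event was fulfilled. Aging is modeled as a simple day count added to the event date.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

EventKey = Tuple[str, str]  # (event name, context), e.g. ("Termination", "EMP-042")


@dataclass
class RetainedSet:
    name: str
    event: EventKey
    duration: timedelta


@dataclass
class EventStore:
    """Hypothetical store that keeps events apart from the retention they trigger."""
    fulfilled_on: Dict[EventKey, Optional[date]] = field(default_factory=dict)
    retained_sets: List[RetainedSet] = field(default_factory=list)

    def apply_retention(self, name: str, event: str, context: str,
                        duration: timedelta) -> RetainedSet:
        key = (event, context)
        # Reuse an existing event (it may already be fulfilled); otherwise create it.
        self.fulfilled_on.setdefault(key, None)
        rs = RetainedSet(name, key, duration)
        self.retained_sets.append(rs)
        return rs

    def fulfill(self, event: str, context: str, when: date) -> None:
        # Setting the event once starts aging for every set that shares it.
        self.fulfilled_on[(event, context)] = when

    def qualification_date(self, rs: RetainedSet) -> Optional[date]:
        when = self.fulfilled_on.get(rs.event)
        return None if when is None else when + rs.duration


store = EventStore()
# The event is fulfilled before any data for this employee is archived.
store.fulfill("Termination", "EMP-042", date(2017, 3, 1))
rs = store.apply_retention("EMP-042 e-mail", "Termination", "EMP-042", timedelta(days=365))
print(store.qualification_date(rs))  # 2018-03-01
```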


Using the Holds Tab

The Holds tab allows the Retention Manager to:

• View and edit the details of a hold.

• Create and delete a hold.

The holds managed here can be applied to applications, packages, and search results by the Retention Manager. When a hold is applied, it overrides retention policies for a specified time.

Holds prevent a record from being purged, even if it is included in a retention event. A hold can be placed directly on a record or group of records for any reason (for example, a legal process). A record can be associated with more than one hold at a time. If one hold is removed, the remaining holds prevent the record from being purged.

Each hold is displayed in a table that contains the following information:

Column Description

Hold Name Indicates the name of the hold.

Description Provides a description of the hold.

Type Indicates whether the hold is Legal or Permanent.

Hold Approver Indicates the name of the person who will approve the hold.

Review Date Indicates the date that the hold will need to be reviewed.

Requested By Indicates the name of the person who requested the hold being created.

In Use Indicates whether the hold has been applied.

An information tab contains the custom properties of a selected hold policy.

Creating a Hold

1. On the Compliance > Holds tab, click +.

2. Enter the following information:

Field Description

Hold Name Enter a unique name for the hold.

Description Enter a description for the hold.

Type Indicate whether the hold is Legal or Permanent.

Hold Matters Enter the legal reason behind the creation of the hold.

This field is only displayed if the type of hold being created is ‘Legal’.

Approved Date Enter the date the hold was approved.

Hold Approver Enter the name of the person who will approve the hold.

Requested By Enter the name of the person who requested the hold being created.


Review Date Enter the date that the hold will need to be reviewed.

Notes Enter any relevant hold information you want to communicate.

3. Click Create.

Editing a Hold

1. In the Compliance > Holds tab, select the hold being edited and click the edit icon.

2. Edit the fields, as desired. For further field information, see Creating a Hold.

3. Click Save.

Deleting a Hold

1. In the Compliance > Holds tab, click the delete icon for the hold being deleted.

2. When prompted to verify that you want to delete the selected hold, click Delete.


Chapter 7

Authentication and Authorization

Note: The Gateway and InfoArchive web application settings in the application.properties file have been moved to the application.yml file. They are equivalent to the original settings, except for some cleanup of unused settings, such as ROLES.

Authentication

Authentication means establishing that the user (entity) accessing the system is, in fact, the entity it claims to be by virtue of providing the required credentials. Once the user is authenticated, the configured rules of authorization apply based on membership in groups and mapped roles.

Based on the configuration, groups are mapped to a set of roles. The roles are defined by InfoArchive. Once the user logs into InfoArchive, the user, the groups the user is a member of, and the mapped roles are included in the security context.

Many companies use an LDAP server to manage users and user group memberships. InfoArchive customers are able to configure an LDAP server for authentication.
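The mapping from groups to roles described above can be sketched as follows. The group names and the `GROUP_TO_ROLES` table are hypothetical; only the role names come from this guide. The security context holds the user, the user's groups, and the union of the roles mapped to those groups.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical mapping from directory groups to InfoArchive-defined roles.
GROUP_TO_ROLES: Dict[str, Set[str]] = {
    "ia-end-users": {"End User"},
    "ia-records": {"Retention Manager"},
    "ia-admins": {"Administrator", "Retention Manager"},
}


@dataclass(frozen=True)
class SecurityContext:
    user: str
    groups: FrozenSet[str]
    roles: FrozenSet[str]


def build_security_context(user: str, groups: Iterable[str]) -> SecurityContext:
    """Collect the user's groups and the union of the roles mapped to them."""
    group_set = frozenset(groups)
    roles = frozenset(r for g in group_set for r in GROUP_TO_ROLES.get(g, ()))
    return SecurityContext(user, group_set, roles)


ctx = build_security_context("jdoe", ["ia-end-users", "ia-records"])
print(sorted(ctx.roles))  # ['End User', 'Retention Manager']
```

Groups with no mapping simply contribute no roles, which keeps the user's permissions limited to what the configuration grants explicitly.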

Active Directory Integration

Many companies use Active Directory to manage users and user group memberships. InfoArchive customers can configure the Active Directory server for authentication. This is activated by the Spring profile AUTHENTICATION_ACTIVE_DIRECTORY.

A running Active Directory server is assumed to be available.

The following is an example of an internal Active Directory server:

• HostName/IP: ad2008.iigads.com/10.31.70.140

• Remote Desktop Access: username/Password is Administrator/Password@123

• Binding Name: cn=Administrator,cn=users,dc=iigads,dc=com

• Binding Password: Password@123

• UserObjectclass: user

• UserSearchBase: ou=Users,ou=infoarchive,dc=iigads,dc=com

• Filter: cn=*

• GroupObjectClass: group

• GroupSearchBase: ou=Groups,ou=infoarchive,dc=iigads,dc=com

• Filter: cn=*

• Certificate Location: C:\Users\Administrator\Desktop\certnew.cer

User Roles and User Groups

InfoArchive can integrate with a customer’s SSO.


A user role represents a collection of permissions that InfoArchive defines. For instance, the end user role allows a user to:

• Complete a search form,

• Execute a search, and

• View search results.

The Administrator defines the user groups. Groups allow easy management of customer-specific permissions on a per object instance and action basis. For example, search form F1 may be accessed by GROUP 1. All end users that are members of GROUP 1 can access search form F1. Essentially, groups are lists of users (no permissions), while roles are lists of permissions (no users).

A user is a principal or the identity of a person, and has an associated single password.

Authorization

Authorization indicates which actions the user can perform on which items within InfoArchive. It also determines which menus and tabs the user can see within the InfoArchive web application. InfoArchive uses groups and roles to determine what the user can do and see.

When a user logs into InfoArchive, the groups the user is a member of are stored as part of the user’s security context. These groups determine the functionality available to the user, and the roles mapped to them determine what actions the user can perform.

In InfoArchive, the purpose of authorization is to provide or deny access to:

• The Dashboard when entering InfoArchive. If the user does not have the right to access the Dashboard, a list of applications that the user has access to is provided.

• Individual applications. Users are granted the right to view specific applications and all the contents of those applications.

• Search Forms. Within applications, users are granted the right to use search forms for accessing data within the application.

• Result Forms. Within applications, users can be restricted to view certain fields within search results.

• Functionality within the user interface (for example, restricting access to the Compliance tab in the main screen).

• Functionality within the REST API (for example, restricting use of the Retention Policy API to fetch a list of policies).

Using the Groups Tab

The Groups tab allows the Developer and Administrator to control which groups can access specific InfoArchive functionality. For instance, users in the administrator group can be permitted to perform compliance tasks (e.g., creating a retention policy).

Click to learn what actions can be performed by each user role.


A check mark indicates that a group can perform the actions of a specific user role. Click a box to add or remove a check mark.

The Options menu allows the Developer or Administrator to toggle between:

• Sync with Configured Authentication System: If selected, the system requests the configured authentication system (CAS) to update the groups list. For more information, see Authentication and Authorization.

• Show Deleted Groups: Allows the user to view the deleted groups.

• Show All Groups: Allows the user to view all available groups.

Using the Permissions Tab

The Permissions tab allows the Developer or Administrator to grant user groups access to specific applications. You can toggle between viewing access by application or by group, and you can display either all groups or only the groups that can access an application.

The permission model within InfoArchive has two parts. First, a user is a member of one or more groups, and the groups are mapped to roles, which determine what actions the user can perform on items in the archive. Second, additional permissions can be associated with applications to restrict access to specific groups. By default, if no groups are associated with an application, all users can perform their actions (based on their role permissions) on that application.

The Permissions tab shows which groups have access to the application. The ’Show Only Groups with Access’ filter shows only the groups to which the Administrator has restricted access. If this list is empty, the Administrator has not restricted access and all users can access the application.

A check mark indicates that access is permitted. Click a box to add or remove a check mark.
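The default-open behavior of application restrictions can be sketched as follows. This is a hypothetical Python model, not InfoArchive code: `APP_ACCESS` stands in for the check marks on the Permissions tab, and the application and group names are invented for illustration.

```python
# Hypothetical stand-in for the Permissions tab: application name -> groups
# explicitly granted access. An empty (or missing) entry means the
# Administrator has not restricted the application.
APP_ACCESS: dict[str, set[str]] = {
    "Payroll": {"HR"},
    "Invoices": set(),
}


def can_open_application(user_groups: set[str], app: str) -> bool:
    """Return True if the application-level restriction lets the user in.

    Role permissions still decide what the user can do inside the
    application; this check only decides whether it is accessible at all.
    """
    allowed = APP_ACCESS.get(app, set())
    if not allowed:
        # No groups associated with the application: all users have access.
        return True
    # Otherwise the user needs at least one group with a check mark.
    return bool(user_groups & allowed)
```

An empty group list therefore means "open to everyone", not "closed to everyone", which matches the empty ’Show Only Groups with Access’ filter described above.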


Chapter 8 Administration

InfoArchive Administration entails the following tasks:

• Creating an application

• Registering a federation

• Creating a database

• Adding a Storage System

Using the Applications Page

The Applications page lists the available applications. From here, an End User can select an application to execute a record search, and a Developer can create an application or edit an existing one.

Developers can quickly scan or search for a specific application. The search functionality searches the names and descriptions of each application and returns the results.

Click to view the details of a particular application.

Creating an Application

Before creating a search form or any back-end artifacts, the Developer needs to create an application. An unfinished application can be saved and completed later.

1. In the Applications tab, click Create Application.

2. Enter the following information:

Field Description

Application Name Enter a name that identifies the application (up to 18 alphanumeric characters).

Description Enter a description of the application.

Category Enter a category or select a previously used category.


Default Retention Policy Select the default retention policy for the application. The selected retention policy will be applied to any SIP that is ingested and stored in the application.

The Default Retention Policy is only used if the default retention class is not specified on the holding and the SIP does not specify a retention class.

Application Type Select one of the following:

• Application Archiving

• Active Archiving

Archive Type The available options depend on the Application Type you selected.

If the Application Type is ’Application Decommission’, select one of the following:

• ’Based on Table Schema’ for a table archive.

• ’Based on packages’ for a SIP archive.

If the Application Type is ’Active Archiving’, select ’Based on packages’.

3. Click Create.

The application is now listed on the Applications page.

When an application is created, it has a Status of ’In Test’, which allows a customer to test the application, typically with fake data.
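The fallback rule for the Default Retention Policy can be expressed as a small resolution function. This Python sketch is an illustration under stated assumptions: the function and parameter names are hypothetical, and the precedence of the SIP’s retention class over the holding’s default retention class is assumed, since the guide only states when the application default applies.

```python
from typing import Optional


def resolve_retention(
    sip_retention_class: Optional[str],
    holding_default_class: Optional[str],
    app_default_policy: Optional[str],
) -> Optional[str]:
    """Pick the retention setting for an ingested SIP (hypothetical sketch).

    Assumption: a retention class named in the SIP wins over the holding's
    default retention class. The application's Default Retention Policy is
    used only when neither of those is specified.
    """
    if sip_retention_class:
        return sip_retention_class
    if holding_default_class:
        return holding_default_class
    return app_default_policy
```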

Editing an Application

The Developer and Administrator have the ability to edit an application:

1. From the Applications Listing page, click .

2. Edit the application information. For more information about the available fields, see Creating an Application.

The following fields are disabled if the application has ingested data or includes a defined search configuration:

• Default Retention Policy

• Application Type

• Archive Type

3. If required, edit the Status of the application. When an application is created, it has a Status of ’In Test’, which allows a customer to test the application, typically with fake data. The Developer can delete all of the data from an application only if the application’s Status is ’In Test’. If the application’s Status is ’Active’, data can only be disposed of through the disposition process. For more information, see the Purge Process.

a. When prompted to confirm the status change, click Save.

4. Click Save.

Deleting an Application

The Developer has the ability to delete an application as long as the application:

• Has not ingested data;

• Does not include a defined search configuration; and

• Does not have a retention or hold policy applied to it. The policy must be removed from the application before the Developer can successfully delete the application.

The Administrator can delete an application and its ingested data as long as the application’s Status is ’In Test’.

1. From the Applications Listing page, click and select Delete Application.

2. When prompted to confirm that you want to delete the application, click Delete.

The application is no longer listed on the Application page.
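The deletion rules above can be summarized as two predicate checks. The `Application` fields and function names in this Python sketch are hypothetical; they mirror only the conditions listed in this section and are not an InfoArchive API.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Hypothetical snapshot of the state that governs deletion."""
    status: str                          # "In Test" or "Active"
    has_ingested_data: bool
    has_search_configuration: bool
    has_retention_or_hold_policy: bool


def developer_can_delete(app: Application) -> bool:
    """Developer: all three conditions listed above must hold."""
    return not (app.has_ingested_data
                or app.has_search_configuration
                or app.has_retention_or_hold_policy)


def administrator_can_delete_with_data(app: Application) -> bool:
    """Administrator: may delete the application and its data while 'In Test'."""
    return app.status == "In Test"
```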

Deleting Data from an Application

This operation requires that the Retention Manager remove all holds and retention policies applied to any items in the application (or applied directly to the application).

The Developer can delete all of the data from an application only if the application’s Status is ’In Test’. If the application’s Status is ’Active’, data can only be disposed of through the disposition process. For more information, see the Purge Process. Any retention or hold policies must be removed from the data before it is deleted from the application.

1. Click and select Delete Data.

2. When prompted to confirm that you want to delete all of the ingested data for the application, click Delete.

Working with Jobs: List of Available Jobs

This section provides a summary of the jobs that are included with InfoArchive.


Job Name Schedule Description

Apply Retention Policy To Records

Manual Provides the ability to apply all types of policies to records, whereby each record ages individually.

Must be scoped to an application. The job also requires an XML file to be created on the InfoArchive server to indicate the search criteria. Records that already have the policy will not have the policy applied again.

Archive Audits Daily Exports in-memory audit information into SIPs so that it can be searched in the Audit application.

Requires the Audit application to be installed. After this job runs, the audits are purged, so REST calls to fetch audits will not return the archived audits.

BuildIndex Do not run Internal job that builds indices.

This job is run by the system automatically and is not to be scheduled or run manually.

Clean Every 5 minutes Frees up resources, such as orders, search results, and AIPs.

It is important to have this job scheduled.

Clean up Purge Candidate Lists

Weekly Cleans up purge candidate lists that have been disposed or cancelled.

Close Every 5 minutes Closes eligible XDB libraries and aggregates.

It is important to have this job scheduled.

Commit Manual Commits packages. This job must be scheduled every five minutes if a different SIP of the same DSS is ingested.

Confirmation Manual Confirms that some events on packages occurred (Receive, Storage, Purge, Invalidation).

DisposePurgeCandidateList

Weekly Executes the disposition of approved purge lists.

It is important to have this job scheduled. Once an application is active, this job is the only way to remove content from the archive.


GeneratePurgeCandidateList

Weekly Creates purge lists for items that are eligible for disposition, meaning that the retention period has expired and the items are not under hold.

It is important to ensure that DisposePurgeCandidateList runs on a similar schedule. If GeneratePurgeCandidateList runs before the DisposePurgeCandidateList job, approved and under-review purge lists will be marked cancelled.

Invalidation Manual Invalidates packages marked as invalid.

Supports both system and application scoping.

RefreshMetrics Daily Updates the metrics for the Dashboard.

Must be scoped to the system. Does not support application scoping.

Remove Policy Manual Removes the named retention policy from items. Can be limited to a type of item.

Only one retention policy can be specified (case-sensitive).

Requalification Manual If retention policies are changed and the changes are retroactive, this job updates the retention information for all managed objects.

Some storage systems may not support updating qualification dates.

TableApplyRetention Manual Provides the ability to apply retention to table records.

Requires parameters to be set and must be scoped to an application.

Table Data Volume Update

Manual Updates the table character count for the Dashboard. Only run if upgrading from InfoArchive 4.0.

Trigger Event Policy Manual Used to fulfil event dates for records that use event or mixed retention policies.

Requires an XML file to be placed on the InfoArchive server to indicate the event, event context, and date that the event happened (or will happen).


Using the Apply Retention Policy to Records Job

The Apply Retention Policy to Records job applies a retention policy to AIUs or table rows. The job uses a pre-configured search to determine which AIUs or table rows need to have retention applied, and criteria can be specified to pinpoint the results. The configured retention policy is then applied to each AIU or table row in the results.

You must select the application to run the job against. Searches are specific to applications so you canonly select one application at a time. This job has to be application scoped.

The following parameters can be configured:

Parameter Description

searchName The name of the search to execute. The search must be defined in the application and must be ready.

searchSet The search composition within the search to use.

searchCriteriaFile The path to a criteria file that contains the criteria used to narrow the results. The following is an example of a SIP search:

<data>
  <criterion>
    <name>CustomerID</name>
    <operator>EQUAL</operator>
    <value>000103</value>
    <value>391</value>
  </criterion>
</data>

Example of a table search:

<data>
  <customerID>16</customerID>
</data>

The location is relative to where the server is deployed. If multiple InfoArchive servers are installed, it is recommended that you use a network location (versus a local path).

retentionPolicyName The name of the retention policy to apply. All four types of retention policies are supported. Depending on the type of policy, other properties might have to be set.

contextType Used for Event policies. There are two possible values:

• Attribute: The job gets the context from an attribute in the data. The attribute is set in the context field.

• Fixed: The job uses the value in the context field as the context for the events.

For example, if you want all records for an employee to age for 5 years after the employee has left the company, make the employee ID the context, since it is the same for all of that employee’s documents. InfoArchive groups all documents with the same context together. Trigger the event using this context; all records associated with this context (e.g., the employee number) will then be eligible for disposition in 5 years.

context The value depends on contextType: either the attribute whose value is used as the context, or a value entered into this field that is used as the context for all records returned by the search.


retentionDateAttribute Used for Duration policies and for the Duration portion of Mixed retention policies. Duration policies that are applied to records need a date to calculate the age of each record; the date is taken from the attribute specified in this field. If the attribute does not contain a date, the record is skipped, meaning a policy is not applied.

trigger Event policies can be triggered using the records’ data. There are two possible values for this property:

• True: The job attempts to trigger the event policy that was applied to the records.

• False: Set the value to false if you are not using an event or mixed retention policy.

triggerCheckAttribute The job can use an attribute to determine whether the trigger should be performed. For example, the data may have a field that indicates whether the employee has left the company (e.g., hasLeft = true). In this case, triggerCheckAttribute is set to ’hasLeft’ and triggerCheckValue is set to ’true’. If the attribute contains any value other than the trigger check value, the event is not triggered.

triggerCheckValue The value to check against to determine whether the event should be triggered.

triggerDateAttribute The job requires a date to trigger the event. This property names the attribute from which the job gets the trigger date. For example, if the event is to keep all employee records for 5 years after the employee leaves the company, there could be an attribute called ’terminationDate’ that contains the date the employee left the company. Setting triggerDateAttribute to terminationDate instructs the job to fetch the trigger date from the terminationDate attribute.
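To show how these parameters interact for a single search result, the following Python sketch mimics the decisions described above: skip records that lack a retention date for Duration or Mixed policies, apply the policy otherwise, and trigger the event only when the check attribute matches. It is a hypothetical model, not the job’s implementation; the record layout, the function name, and the exact string comparison used for triggerCheckValue are assumptions.

```python
from datetime import date
from typing import Optional


def process_record(
    record: dict,
    policy_type: str,  # e.g. "Duration", "Event", "Mixed"
    retention_date_attribute: Optional[str] = None,
    trigger: bool = False,
    trigger_check_attribute: Optional[str] = None,
    trigger_check_value: Optional[str] = None,
    trigger_date_attribute: Optional[str] = None,
) -> dict:
    """Return what the job would do for one search result (hypothetical)."""
    result = {"apply": False, "trigger_date": None}

    # Duration policies (and the Duration part of Mixed) need a base date;
    # a record without one is skipped and no policy is applied.
    if policy_type in ("Duration", "Mixed"):
        base = record.get(retention_date_attribute) if retention_date_attribute else None
        if not isinstance(base, date):
            return result
    result["apply"] = True

    if trigger and policy_type in ("Event", "Mixed"):
        # Assumption: exact string comparison against triggerCheckValue.
        if trigger_check_attribute is not None:
            if str(record.get(trigger_check_attribute)) != trigger_check_value:
                return result
        when = record.get(trigger_date_attribute) if trigger_date_attribute else None
        if isinstance(when, date):
            result["trigger_date"] = when
    return result
```

For the employee example, a record with hasLeft = "true" and a terminationDate gets the policy applied and its event triggered at that date, while a record with hasLeft = "false" gets the policy but no trigger.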

Using the Archive Audits Job

When an audit object is created in the system, you cannot yet execute a search against it. Audit objects are collected in temporary storage until they are archived, so the Archive Audits job must be executed before audit objects can be searched.

Audits are organized into SIP packages by day (i.e., only audits from the same day are put in a SIP). This allows you to apply a retention policy to the archived audits and to dispose of them properly.

The following parameters can be configured:


Parameter Description

MaxSipAuditEntries The maximum number of audit entries in a SIP package. The default is 50,000 entries.

Start Date Specify a date from which to start selecting audits. If this field is left empty, it defaults to today’s date. The format is YYYY-MM-DD.

End Date Specify a date at which to stop selecting audits. If this field is left empty, it defaults to today’s date. The format is YYYY-MM-DD.

The start and end dates allow you to archive audits that are not current.

Normally, you leave the dates empty and let the job archive all audits for the day. The job also goes back 30 days searching for audits to archive, and stops once it finds a day with no audits to archive.

This job should not be application scoped.
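The day-by-day packaging and the backward search described above can be sketched in Python. This is one reading of the documented behavior under stated assumptions: the walk starts at today’s date, covers today plus the previous 30 days, and stops at the first day with no audits; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta


def plan_audit_sips(
    audits_by_day: dict[date, list[str]],
    today: date,
    max_entries: int = 50_000,
    lookback_days: int = 30,
) -> list[tuple[date, list[str]]]:
    """Split audits into per-day SIPs of at most max_entries (hypothetical)."""
    if max_entries <= 0:
        raise ValueError("max_entries must be positive")
    sips: list[tuple[date, list[str]]] = []
    # Walk backwards from today over today plus the previous lookback_days.
    for offset in range(lookback_days + 1):
        day = today - timedelta(days=offset)
        entries = audits_by_day.get(day, [])
        if not entries:
            break  # stop at the first day with no audits to archive
        # Only audits from the same day share a SIP.
        for start in range(0, len(entries), max_entries):
            sips.append((day, entries[start:start + max_entries]))
    return sips
```

Keeping each SIP to a single day is what makes it possible to attach one retention policy per day of audits.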

Using the Clean Up Purge Candidate List Job

The Generate Purge Candidate List job generates purge lists for records that are eligible for disposition. Once disposition has been run to dispose of the records, the purge list status is set to disposed. The Generate Purge Candidate List job sets any purge lists that have not been processed by the disposition job to cancelled and generates new lists. The Clean Up Purge Candidate List job removes cancelled and disposed purge lists. How often to run this job is up to the customer, since it depends on how often disposition is run. This job should not be application scoped.
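Why the scheduling order matters can be seen in a small state model of the three purge-list jobs. In this hypothetical Python sketch, the status names follow the text, but the function names and the assumption that a newly generated list starts as ’under review’ are illustrative only.

```python
from enum import Enum


class Status(Enum):
    UNDER_REVIEW = "under review"
    APPROVED = "approved"
    DISPOSED = "disposed"
    CANCELLED = "cancelled"


def dispose(lists: dict[str, Status]) -> None:
    """DisposePurgeCandidateList: dispose of every approved list."""
    for name, status in lists.items():
        if status is Status.APPROVED:
            lists[name] = Status.DISPOSED


def generate(lists: dict[str, Status], new_name: str) -> None:
    """GeneratePurgeCandidateList: cancel unprocessed lists, add a new one."""
    for name, status in lists.items():
        if status in (Status.UNDER_REVIEW, Status.APPROVED):
            lists[name] = Status.CANCELLED
    lists[new_name] = Status.UNDER_REVIEW  # assumed initial status


def clean_up(lists: dict[str, Status]) -> None:
    """Clean Up Purge Candidate Lists: drop cancelled or disposed lists."""
    done = [n for n, s in lists.items() if s in (Status.CANCELLED, Status.DISPOSED)]
    for name in done:
        del lists[name]
```

Running `dispose` before `generate` leaves an approved list disposed; running `generate` first cancels that same list before it is ever disposed.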

Using the Requalification Job

This job must be run if a retention policy is changed and the changes need to be applied retroactively.

This job cannot be scoped to an application and does not have any parameters.

Using the Refresh Metrics Job

Calculating the metrics information in the InfoArchive Dashboard can take a significant amount of time. Therefore, the Dashboard retrieves most of its information from pre-populated values, which the Refresh Metrics job populates by scanning the system. The customer can decide how often the metrics information should be updated, as this depends on their individual use cases.

This job should not be application scoped and does not have any parameters.


Using the Remove Policy Job

This job removes a retention policy from items in bulk and can be scoped to specific applications.

The following parameters can be configured:

Parameter Description

retentionPolicyName The name of the retention policy that will be removed from items. The retention policy must be defined, and the name is case-sensitive.

type Optional field that specifies the type of item from which the job removes the policy. Possible values include:

• application

• aip

• aiu

• table

• row

Row refers to a table row.
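The following Python sketch shows how the two parameters narrow the removal: the name match is exact (case-sensitive), and the optional type limits which items are touched. The item dictionaries and the function name are hypothetical; the allowed type values come from the table above.

```python
from typing import Iterable, Optional

# Item types accepted by the optional "type" parameter (from the table above).
VALID_TYPES = {"application", "aip", "aiu", "table", "row"}


def items_to_update(
    items: Iterable[dict], policy_name: str, item_type: Optional[str] = None
) -> list[int]:
    """Return ids of items whose policies include policy_name (hypothetical)."""
    if item_type is not None and item_type not in VALID_TYPES:
        raise ValueError(f"unsupported type: {item_type}")
    # Set membership on str is an exact, case-sensitive comparison.
    return [
        item["id"]
        for item in items
        if (item_type is None or item["type"] == item_type)
        and policy_name in item["policies"]
    ]
```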

Using the Trigger Event Policy Job

The Trigger Event Policy job triggers events based on a trigger file that the customer provides. Usually, the trigger file comes from another system that generates the event (e.g., an HR system indicating when an employee left the company). The values in the trigger file contain the context that groups the records together. For example, if you are keeping employee records until 5 years after the employee leaves the company, you would group the records around a common field (i.e., the context), which in this case is the employee number. A context is specified when the event policy is applied; when the event needs to be triggered, a context and a trigger date must be specified.

This job has to be application scoped.

The following parameter can be configured:

Parameter Description

triggerFile The path to a trigger file that contains a list of triggers (context, trigger date, and condition). The following illustrates the format of the file:

<?xml version="1.0"?>
<triggers>
  <event>
    <context>00457</context>
    <triggerdate>2010-01-31</triggerdate>
    <condition>condition</condition>
  </event>
  <event>
    <context>00345</context>
    <triggerdate>2014-02-28</triggerdate>
    <condition>condition</condition>
  </event>
</triggers>

The location is relative to where the server is deployed. If multiple InfoArchive servers are installed, it is recommended that you use a network location (versus a local path).

Populating Event Dates for the Trigger Event Policy Job

An XML file is used to populate event dates for the Trigger Event Policy job:

Name Description

context Enter the context for the event (e.g., an employee number).

triggerdate Enter the date when the event happened or is planned to happen, in YYYY-MM-DD format. The date must be within the following range: 1000-01-01 to 2999-12-31.

condition Enter the name of the condition, which must match the condition specified on the retention policy. The value is case-sensitive.

<?xml version="1.0"?>
<triggers>
  <event>
    <context>89</context>
    <triggerdate>2016-02-28</triggerdate>
    <condition>tradeversion</condition>
  </event>
  <event>
    <context>77</context>
    <triggerdate>2016-02-28</triggerdate>
    <condition>tradeversion</condition>
  </event>
</triggers>
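Before placing a trigger file on the server, it can be useful to check it against the rules in the table above. The following Python sketch uses the standard-library `xml.etree.ElementTree` module to parse the file and validates the date format and range. The function name and the choice to raise `ValueError` are assumptions, and the sketch does not check the condition name against any retention policy.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Range and format rules taken from the table above.
MIN_DATE = date(1000, 1, 1)
MAX_DATE = date(2999, 12, 31)
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_trigger_file(xml_text: str) -> list[tuple[str, date, str]]:
    """Parse a <triggers> document and validate every <event> (hypothetical)."""
    root = ET.fromstring(xml_text)
    if root.tag != "triggers":
        raise ValueError("root element must be <triggers>")
    events = []
    for event in root.findall("event"):
        context = (event.findtext("context") or "").strip()
        raw_date = (event.findtext("triggerdate") or "").strip()
        condition = (event.findtext("condition") or "").strip()
        if not context or not condition:
            raise ValueError("each event needs a context and a condition")
        if not DATE_PATTERN.fullmatch(raw_date):
            raise ValueError(f"trigger date {raw_date!r} is not YYYY-MM-DD")
        when = date.fromisoformat(raw_date)  # also rejects e.g. 2016-02-30
        if not MIN_DATE <= when <= MAX_DATE:
            raise ValueError(f"trigger date {raw_date} is out of range")
        events.append((context, when, condition))
    return events
```

Because the condition must match the retention policy’s condition exactly (case-sensitive), compare the returned condition strings against your policy definitions separately.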

Close

Parameter Name Description

phaseToProcess Type: ENUM

Value: POOLED_ONLY, AGGREGATE_ONLY, ALL_PHASES

Default Value: ALL_PHASES

Mandatory: No

closeDelay Type: int

Value: non-negative integer (>= 0)

Default Value: 5 minutes (300 seconds)


Mandatory: No

Delays the effectiveCloseDate (value set in seconds).

Clean

Name Description

allPhases Type: Boolean

Default Value: True

Value to use for all phases

aipPhase Type: Boolean

Default Value: True

If true or allPhases = true, delete all PRUNE AIPs, plus rejected and invalidated AIPs without a commit date.

contentPhase Type: Boolean

Default Value: True

If true or allPhases = true, delete all orphaned contents.

searchResultPhase Type: Boolean

Default Value: True

orderItemPhase Type: Boolean

Default Value: True

If true or allPhases = true, delete all expired order items.
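The phase-selection rule (a phase runs if its own flag is true or allPhases is true) can be written as a simple filter. This Python sketch uses hypothetical function and parameter names, and it assumes searchResultPhase follows the same rule as the other phases, since its description is not given above.

```python
def phases_to_run(
    all_phases: bool = True,
    aip_phase: bool = True,
    content_phase: bool = True,
    search_result_phase: bool = True,
    order_item_phase: bool = True,
) -> list[str]:
    """Return the Clean phases that would run (hypothetical sketch).

    A phase runs if its own flag is true or if allPhases is true.
    """
    flags = {
        "aipPhase": aip_phase,
        "contentPhase": content_phase,
        "searchResultPhase": search_result_phase,
        "orderItemPhase": order_item_phase,
    }
    return [name for name, enabled in flags.items() if enabled or all_phases]
```

With the defaults, all four phases run; to run only the order-item phase, set allPhases and the other three flags to false.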

Table Data Volume Update

Updates the system so that the Refresh Metrics job can calculate the correct pricing information for table character counts. This job takes no parameters.


Using the Jobs Tab

The Jobs tab allows the Administrator and Developer to:

• Create a job or edit an existing job.

• View information for all of the jobs.

• Run a job.

Jobs are displayed in a table that contains the following information:

Column Description

Job Name Indicates the name of the job.

Click the name of a job to view its run history. For more information, see Viewing a Job’s Run History.

A menu that allows the Administrator to:

• Edit a job. For more information, see Editing a Job.

• Run a job or start a schedule. For more information, see Running a Job.

• Suspend a job. For more information, see Suspending a Job.

Description A description of the job.

Applied To Indicates the scope of the job and whether it is applied to:

• System

• All Applications

• Specific Systems

Last Run Indicates the date the job was last executed.

Last Run Status Indicates the status of the last execution of the job. Possible values include:

• Scheduled: The job is scheduled to run.

• Running: The job is currently running.

• Success: The job was executed successfully.

• Failure: The job failed to execute.

• Skipped: The job was skipped.


Next Run Indicates the next scheduled run of the job.

Job Condition Indicates the current job status. Possible values include:

• Active: The Administrator is able to run the job.

• Suspended: The Administrator suspended the job’s current run.

An information tab contains the custom properties of a selected job.

Viewing a Job’s Run History

The Administrator can view a job’s run history by clicking the name of the job. The Job Run History window contains the following information:

Column Description

Scheduled Date Indicates the date of the job schedule.

Scheduled By Indicates the name of the person who initiated the job run.

Start Time Indicates the start time of the job run.

End Time Indicates the end time of the job run.

Status Indicates the status of the last execution of the job. Possible values include:

• Scheduled: The job is scheduled to run.

• Running: The job is currently running.

• Success: The job was executed successfully.

• Failure: The job failed to execute.

• Skipped: The job was skipped.

Application Name If the job was applied to an application, the name of the application is indicated.

Creating a Job

One reason for creating a job is to allow a customer to run a specific job manually.

1. On the Jobs tab, click +.

2. Select the job type you want to act as the template for the job being created. The new job will inherit all configuration and property values from this existing job.


3. Click Next.

4. Enter the following information:

Field Description

Job Name Enter a unique name for the new job.

Handler Indicates the Java handler that executes after the job completes its run.

Description Enter a description of the job being created.

Apply To Specify the scope of the job and whether it is applied to:

• System: If selected, the job will not be dependent on any application, and the JobHandler will not receive the application during execution.

If prompted, redefine values of the job properties.

• Specific Systems: If selected, select the applications the job can be applied to. Click Select all to select all of the applications.

Note: ’Select All’ selects all of the items in the current list. If a user creates a new application and wants the job to execute for this new application, open this page and select the new application.

5. Click Next.

6. Enter the following information:

Field Description

Repeatable Indicate whether the job is a repeatable job. For more information, see Repeatable Jobs.

Schedule By Indicate how the job is to be executed:

• Manually

• Interval: If selected, enter:

- Interval: Specify the number of minutes that must pass before the job can be repeated.

- Max Attempts: Indicate the maximum number of attempts the JobInstance will be rescheduled after failing to execute successfully.


- Expiration Interval: Indicate the number of minutes the job will run before logging a failure.

- Retry Interval: After an instance execution fails, indicate the number of minutes the server should wait before rescheduling the job.

• Expression: If selected:

- Enter the expression that defines the job schedule by specifying when the job will run by the second, minute, hour, day of the week, day of the month, month, or any combination of these options. The expression supports "Cron Expression" syntax.

Once the expression has been entered, the system indicates the job’s next run date and time.

- Max Attempts: Indicate the maximum number of attempts the JobInstance will be rescheduled after failing to execute successfully.

- Expiration Interval: Indicate the number of minutes the job will run before logging a failure.

- Retry Interval: After an instance execution fails, indicate the number of minutes the server should wait before rescheduling the job.

7. Click Next.

8. Review the information you have entered. When satisfied that the information is correct, click Finish.

The job now appears in the table on the Jobs tab.
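The Max Attempts, Retry Interval, and Expiration Interval settings can be illustrated with a small timing sketch. This hypothetical Python model assumes that every retry fails immediately and that each retry is scheduled one Retry Interval after the previous failure; the function names are invented, and it is not InfoArchive’s scheduler.

```python
from datetime import datetime, timedelta


def retry_schedule(failed_at: datetime, retry_interval_min: int,
                   max_attempts: int) -> list[datetime]:
    """Times at which a failed JobInstance would be rescheduled (sketch)."""
    if retry_interval_min < 0 or max_attempts < 0:
        raise ValueError("interval and attempts must be non-negative")
    # Attempt n is scheduled n Retry Intervals after the original failure,
    # assuming each attempt also fails immediately.
    return [failed_at + timedelta(minutes=retry_interval_min * n)
            for n in range(1, max_attempts + 1)]


def has_expired(started_at: datetime, now: datetime,
                expiration_interval_min: int) -> bool:
    """True once a run has exceeded its Expiration Interval (logs a failure)."""
    return now - started_at > timedelta(minutes=expiration_interval_min)
```

With a Retry Interval of 10 minutes and Max Attempts of 3, a job that fails at 02:00 would be retried at roughly 02:10, 02:20, and 02:30 under these assumptions.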

Editing a Job

1. On the Jobs tab, click for the job being edited and select Edit Job.

2. Edit the following information:

Field Description

Description Enter a description of the job being edited.


Schedule By Select one of the following:

• Manually

• Interval: If selected, enter

- Interval: Specify the number of minutes that must pass before the job can be repeated.

- Max Attempts: Indicate the maximum number of attempts the JobInstance will be rescheduled after failing to execute successfully.

- Expiration Interval: Indicate the number of minutes the job will run before logging a failure.

- Retry Interval: After an instance execution fails, indicate the number of minutes the server should wait before rescheduling the job.

• Expression: Enter the expression that defines the job schedule by specifying when the job will run by the second, minute, hour, day of the week, day of the month, month, or any combination of these options. The expression supports "Cron Expression" syntax.

Once the expression has been entered, the system indicates the job’s next run date and time.

Max Attempts Indicate the maximum number of attempts the JobInstance will be rescheduled after failing to execute successfully.

Expiration Interval Indicate the number of minutes the job will run before logging a failure.

Retry Interval After an instance execution fails, indicate the number of minutes the server should wait before rescheduling the job.

Apply To Specify the scope of the job and whether it is applied to:

• System: If selected, redefine values of the job properties.

• Specific Systems: If selected, select the applications the job can be applied to. Click Select all to select all of the applications.

3. Click Save.


Running a Job

1. On the Jobs tab, click for the job being executed and select one of the following:

• If the Schedule By setting is 'Manually', click Run.

• If the Schedule By setting is 'Interval' or 'Expression', click Start Schedule.

You are notified whether the job was successfully executed or scheduled, or whether it failed to run. If the job failed, an error message indicates why it was not executed successfully.

Suspending a Job

1. On the Jobs tab, click for the job that is running and select Suspend.

The Job Condition changes to 'Suspended'.

Click Resume to continue running the job.

Using the Storage Tab

The Storage tab allows the Administrator to:

• Register a federation to act as a container for databases.

• Create a database to hold archived records.

• Create a file system root to hold unstructured content belonging to records.

Federations

Existing federations are displayed in a table that contains the following information:

Column Description

Federation Name Indicates the name of the federation.

Bootstrap Indicates the Xhive connection string.

The number of federations listed in the table depends on the mode in which you are running InfoArchive. If you are running InfoArchive in the default mode, only mainFederation is listed. If you have upgraded to InfoArchive 4.1, however, two federations are listed:

• mainFederation

• retentionFederation

You can put everything into one federation, or you can store retention information in a separate database. If you use two federations, you need two services, one for each federation.


Registering a Federation

1. On the Storage tab, click +.

2. Enter the following information:

Field Description

Federation Name Enter a unique name for the federation.

Superuser Password Enter the password that a user must provide to configure the federation.

Connection URL Enter the URL of the federation, which specifies the federation bootstrap.

3. Click Register.

The federation now appears in the table on the Storage tab.
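The Connection URL (bootstrap) uses the xhive://host:port form shown elsewhere in this guide, for example xhive://10.64.155.216:2910. A small Python sketch such as the following, which relies only on the standard urllib.parse module, can sanity-check a bootstrap string before it is entered. The helper name and the checks it performs are illustrative assumptions, not InfoArchive validation rules.

```python
from urllib.parse import urlsplit


def parse_bootstrap(bootstrap: str) -> tuple:
    """Split an xDB bootstrap string such as xhive://host:port into (host, port).

    Illustrative only: checks the scheme and that a host and a port are present.
    """
    parts = urlsplit(bootstrap)
    if parts.scheme != "xhive":
        raise ValueError(f"expected the xhive:// scheme, got {parts.scheme!r}")
    # .port raises ValueError on its own if the port is not a number
    if not parts.hostname or parts.port is None:
        raise ValueError("bootstrap must include both a host and a port")
    return parts.hostname, parts.port


print(parse_bootstrap("xhive://10.64.155.216:2910"))  # ('10.64.155.216', 2910)
```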

Databases

Existing databases are displayed in a table that contains the following information:

Column Description

Database Name Indicates the name of the database.

Federation Name Indicates the name of the parent federation.

Bootstrap Indicates the Xhive connection string.

Creating a Database

1. On the Storage tab, click +.

2. Enter the following information:

Field Description

Database Name Enter a unique name for the database.

Admin Password Enter the password that the Administrator must provide to configure the database.

xDB Federation Enter the name of the parent federation.

3. Click Create.

The database now appears in the table on the Storage tab.


Adding a Storage System

To add a storage system:

1. On the Storage tab, click +.

2. Select a Storage Type:

Item Selected Actions

File Storage

Isilon a. Enter the Configuration Label.

b. Enter the File System Root Path.

Local File System a. Enter the Configuration Label.

b. Enter the Folder Path.

Object Storage

ECS a. Enter the Configuration Label.

b. Enter a Description for the ECS.

c. Enter the URL of the object storage being added.

Enter the following information for the ECS Credentials object:

d. Enter the Credential Name.

e. Enter a Credential Description.

f. Enter an Access Key ID.

g. Enter the Secret Key. The ECS Management REST API allows authenticated domain users to request a secret key that enables them to access the object store.

Click '+' to add another credentials object. Click 'x' to delete an access pair.

Legacy Object Storage

Centera a. Enter the Configuration Label.

b. Enter a Description.

c. Enter the Connection String.


Enter the following Pool Entry Authorization (PEA) information:

d. Enter the Variable.

e. Enter the Content.

Click '+' to add additional variables. Click 'x' to delete an access pair.

3. Click Create.
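The required fields differ by storage type. The following Python sketch encodes the fields listed in the table above as data and reports which ones are missing from a hypothetical form submission. It assumes that every listed field is mandatory (the UI may be more lenient) and models ECS credentials objects and Centera PEA entries as lists of dictionaries; the names are illustrative, not an InfoArchive API.

```python
from typing import Dict, List

# Field labels taken from the table; treating all of them as required is an assumption.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "Isilon": ["Configuration Label", "File System Root Path"],          # File Storage
    "Local File System": ["Configuration Label", "Folder Path"],         # File Storage
    "ECS": ["Configuration Label", "Description", "URL", "Credentials"], # Object Storage
    "Centera": ["Configuration Label", "Description", "Connection String", "PEA"],  # Legacy
}
ECS_CREDENTIAL_FIELDS = ["Credential Name", "Credential Description", "Access Key ID", "Secret Key"]
PEA_FIELDS = ["Variable", "Content"]


def missing_fields(storage_type: str, form: dict) -> List[str]:
    """Return the labels of required fields that are absent or empty."""
    missing = [f for f in REQUIRED_FIELDS[storage_type] if not form.get(f)]
    if storage_type == "ECS":
        # Each credentials object ('+' adds another) needs all four values.
        for i, cred in enumerate(form.get("Credentials", [])):
            missing += [f"Credentials[{i}].{f}" for f in ECS_CREDENTIAL_FIELDS if not cred.get(f)]
    if storage_type == "Centera":
        # Each PEA entry is a Variable/Content pair.
        for i, pea in enumerate(form.get("PEA", [])):
            missing += [f"PEA[{i}].{f}" for f in PEA_FIELDS if not pea.get(f)]
    return missing


print(missing_fields("Isilon", {"Configuration Label": "isilon01"}))  # ['File System Root Path']
```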

Using the Spaces Tab

The Spaces tab allows the Administrator to create a space, which holds records and content for an application. Before you create a space, complete the following:

• Register a federation

• Create a database

• Add a storage system

Spaces are displayed in a table that contains the following information:

Column Description

Space Name Indicates the name of the space.

Application Indicates the name of the application the space is associated with.

Database Library Indicates the database library associated with the space.

Content Folder Indicates the name of the content folder.

Creating a Space

The space is what ties the storage system to a particular application.

1. On the Spaces tab, click +.

2. Enter the following information:

Field Description

Application Select the application that will be associated with the space.


Space Name Enter a name for the space.

Database Select the database that will be associated with the space.

Database Library Enter a name for the database library.

File System Root Select the file system root for the space.

Content Folder Name Enter the name of the content folder.

3. Click Create.

The space now appears in the table on the Spaces tab.
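A space can be created only after a federation is registered, a database exists, and a storage system has been added. The following Python sketch models that ordering with plain dataclasses. The Registry class, its fields, and the check are assumptions made for illustration; they are not InfoArchive's object model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Registry:
    """Hypothetical in-memory view of what the Storage tab already contains."""
    federations: Dict[str, str] = field(default_factory=dict)  # name -> bootstrap
    databases: Dict[str, str] = field(default_factory=dict)    # name -> parent federation
    storage_systems: Set[str] = field(default_factory=set)     # configuration labels


def can_create_space(reg: Registry, database: str, storage_system: str) -> List[str]:
    """Return the unmet prerequisites for a space (an empty list means OK)."""
    problems = []
    if database not in reg.databases:
        problems.append(f"database {database!r} does not exist")
    elif reg.databases[database] not in reg.federations:
        problems.append(f"federation {reg.databases[database]!r} is not registered")
    if storage_system not in reg.storage_systems:
        problems.append(f"storage system {storage_system!r} has not been added")
    return problems


reg = Registry()
reg.federations["mainFederation"] = "xhive://localhost:2910"
reg.databases["db1"] = "mainFederation"
reg.storage_systems.add("lfs")
print(can_create_space(reg, "db1", "lfs"))  # []
```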

Using the Stores Tab

The Stores tab allows the Administrator to add and configure a storage area for different binary content.

1. Click '+' to add a store.

2. Enter the following information:

Field Description

Configuration Label Enter a name for the store being created.

Application Select the application.

Space Select the space.

Space Root Select the space root.

File System Folder Select the folder.

Status Indicate the store status.

3. Click Create.

Editing Stores Configuration

The Holdings tab contains a list of the available holdings.

An information tab contains the custom properties of a selected holding. The following tabs appear in the detail panel:

Tab Description

Summary Contains general information about the holding.


Store Contains information about stores that are connected to the holding.

Retention Contains information about retention classes.

xDB Contains information about xDB databases and libraries.

xDB Pooled Libraries Contains information about the pooled libraries on the holding.

Permissions Contains a list of restrictions for the holding.

You are also able to edit the stores configuration for a holding.

1. Select the application and access the Holdings tab.

2. Click the button and select Edit stores configuration for the holding being updated.

3. Make the required changes and click Edit.

Configuring ECS Storage

This section illustrates how to configure the system objects required to ingest data into Elastic Cloud Storage (ECS).

To configure an ECS object for ingestion, you must:

1. Add a storage system with an ECS storage type.

a. Specify the URL, Access Key and Secret Key to connect to the ECS instance.

2. Create an application.

3. Create a space under the newly created application.

a. Specify Object Storage in the Storage System field.

b. Select the URL created in step 1a.

4. Add a store.

a. Create a bucket to store the data in.

After following these steps, you will be able to ingest data into ECS.

Installing Centera SDK on Linux

This section illustrates how to install Centera SDK on a Linux operating system.

1. Download and install the Centera SDK from the EMC Developer download center.

a. Download the Centera_SDK_Linux-gcc4.tgz file from the Linux category.


b. Unpack the archive and run the Centera_SDK/install/install file to proceed with the installation. Specify the folder in which to install the SDK. The default location is /usr/local/Centera_SDK, which is used in this example.

2. Create or generate a file with a .pea extension for your Centera storage. This file should contain valid credential settings to connect to the storage. For more information, see the Atmos CAS and Centera Online Clusters (Connectivity Information) page.

3. Add the following environment variables, which enable Java to locate the libraries:

export PATH=$PATH:/usr/local/Centera_SDK/lib/64
export LD_LIBRARY_PATH=/usr/local/Centera_SDK/lib/64

4. Ensure that the Centera server is available by using the GetClusterInfo class. Navigate to the destination folder of the extraction that you previously created and run the following commands:

cd ./Centera_SDK/sdk_samples/GetClusterInfo
javac -cp /usr/local/Centera_SDK/lib/FPLibrary.jar GetClusterInfo.java
java -cp /usr/local/Centera_SDK/lib/FPLibrary.jar:. GetClusterInfo

You should see a command-line prompt for the connection string. Provide the string in the following format:

CENTERA_HOST?PEA_FILE_PATH

For example:

centera.testhost?/usr/local/Centera_SDK/example.pea

The installation is successful if you are able to see output that contains information about your Centera storage.
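The connection string combines the Centera host and the PEA file path, separated by a question mark. The following Python sketch builds and splits strings in the CENTERA_HOST?PEA_FILE_PATH form shown above. The helper names are hypothetical, and the sketch only mirrors that one format rather than the full connection-string grammar accepted by the Centera SDK.

```python
def build_connection_string(host: str, pea_path: str) -> str:
    """Join a Centera host and a PEA file path as HOST?PEA_FILE_PATH."""
    if "?" in host:
        raise ValueError("host must not contain '?'")
    if not pea_path.lower().endswith(".pea"):
        raise ValueError("PEA file path should end with .pea")
    return f"{host}?{pea_path}"


def split_connection_string(conn: str) -> tuple:
    """Split HOST?PEA_FILE_PATH at the first '?' into (host, pea_path)."""
    host, sep, pea_path = conn.partition("?")
    if not sep or not host or not pea_path:
        raise ValueError(f"not in HOST?PEA_FILE_PATH form: {conn!r}")
    return host, pea_path


print(build_connection_string("centera.testhost", "/usr/local/Centera_SDK/example.pea"))
# centera.testhost?/usr/local/Centera_SDK/example.pea
```

The same helpers work for the Windows form of the path, because only the first question mark is treated as the separator.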

Installing Centera SDK on Windows

This section illustrates how to install Centera SDK on a Windows operating system.

1. Download and install the Centera SDK from the EMC Developer download center. Download either the 32-bit or 64-bit version of the SDK from the Windows category.

Note: The SDK supports only Windows Server 2012 R2, Windows 8 SE (32-bit and 64-bit), and Windows 8 Enterprise. For more information, read ./docs/EMC_Centera_SDK_Windows_3.4_Release_Notes.pdf inside the archive.

2. Unpack the archive to the appropriate path. The following example uses "c:\Tools".

3. Create or generate a file with a .pea extension for your Centera storage. This file should contain valid credential settings to connect to the storage. For more information, see the Atmos CAS and Centera Online Clusters (Connectivity Information) page.

4. Add the following directory to the PATH environment variable, which enables Java to locate the libraries:

C:\Tools\Centera_SDK\lib64


5. Ensure that the Centera server is available by using the GetClusterInfo class. Navigate to the destination folder of the extraction that you previously created and run the following commands:

cd .\Centera_SDK\sdk_samples\GetClusterInfo
javac -cp C:\Tools\Centera_SDK\lib\FPLibrary.jar GetClusterInfo.java
java -cp C:\Tools\Centera_SDK\lib\FPLibrary.jar;. GetClusterInfo

You should see a command-line prompt for the connection string. Provide the string in the following format:

CENTERA_HOST?PEA_FILE_PATH

For example:

centera.testhost?C:\Tools\Centera_SDK\example.pea

The installation is successful if you are able to see output that contains information about your Centera storage.

Configuring Centera Storage

This section illustrates how to configure the system objects required to ingest data into Centera storage.

To configure a Centera object for ingestion, you must:

1. Add a storage system with a Centera storage type.

a. Specify the Connection String, which is the Centera IP.

b. Enter the following Pool Entry Authorization (PEA) information:

• Enter the Variable.

• Enter the Content.

2. Create a space under the newly created application.

a. Select Legacy Object Storage in the Additional System Storage field.

b. Select the connection string you created in step 1a.

3. Add a store.

a. Select Centera as the value for the Space Root field.

b. Create a bucket to store the data in.

4. Create an application or edit an existing application to use the newly created Centera store. For a SIP application, you must assign the stores (Centera, ECS, Isilon, File) at the holding level.

Be sure to complete the post-installation steps that add the Centera SDK libraries.

Using the Audit Tab

Being able to prove that certain actions have been performed is important, particularly when it comes to compliance. The Audit tab allows the Administrator or Developer to configure which events are audited. InfoArchive's auditing process allows the system to control the amount of content being generated.

An Administrator or Developer requires permission to manage audits.

The Application filter allows several levels of access to audit events for the selected application:

• System level: Contains a list of audit-events that correspond to services.

• Tenant level: Contains a list of tenant-level audit-events.

• Application level: Contains a list of audit-events that correspond to a particular application.

The Event Category filter allows an Administrator or Developer to determine which specific events are currently being audited. The filters depend on the level of access chosen. A check mark indicates that a particular event is currently being audited. Click a box to add or remove a check mark. The following table outlines the applications and the event categories that can be audited:

Application Event Categories

System Provisioning Events, Other

Tenant Provisioning Events, Compliance Events, Ingestion, Other

Customer-Created Applications Provisioning Events, Compliance Events, Ingestion, Other

To audit an event:

1. Click the applicable event box.

2. Click Save.
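The table above can be read as a lookup from audit level to the event categories that can be audited at that level. The following Python sketch encodes that lookup and keeps the enabled categories in a set that mirrors the check marks. The data structure and the toggle function are illustrative assumptions, not the format in which InfoArchive stores audit settings, and clicking Save is not modeled.

```python
from typing import Dict, Set, Tuple

# Levels and categories copied from the table above.
AUDITABLE: Dict[str, Set[str]] = {
    "System": {"Provisioning Events", "Other"},
    "Tenant": {"Provisioning Events", "Compliance Events", "Ingestion", "Other"},
    "Customer-Created Applications": {"Provisioning Events", "Compliance Events", "Ingestion", "Other"},
}


def toggle(enabled: Set[Tuple[str, str]], level: str, category: str) -> Set[Tuple[str, str]]:
    """Return a new set with (level, category) added or removed, like clicking a box."""
    if category not in AUDITABLE[level]:
        raise ValueError(f"{category!r} cannot be audited at the {level} level")
    key = (level, category)
    return enabled - {key} if key in enabled else enabled | {key}


checked = toggle(set(), "Tenant", "Ingestion")
print(sorted(checked))  # [('Tenant', 'Ingestion')]
```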

Audit Application

All audit events in the system are archived and stored as SIPs in a holding. For this purpose, install the "Audit" holding, which is delivered with the "Tools" project and can be found at "Tools/applications/Audit". To install the holding, call the "ant" task in the "Audit" root folder:

<path_to_tools>/Tools/applications/Audit>


The following three search templates are provided:

• System Search: For system-level events, such as login and logout.

• Tenant Search: For operations that are not associated with an application, for example, creating a retention policy or a hold that can be used by multiple applications.

• Application Search: For events associated with an application, which includes such events as:

— Ingesting a package (AIP),

— Receiving a package (AIP),

— Applying retention,

— Running the search.

Every search contains a search form that has a pre-defined Event-Type and Event-Names for the corresponding audit level.

Load Balancing Testing of InfoArchive Servers

This section illustrates how to perform load balancing testing of InfoArchive servers.

The test setup includes one load balancer, two InfoArchive servers and one xDB server. The InfoArchive servers use a common shared file system to share a filestore and the keystore.jceks file.

In the following example, three virtual machines are used to perform the test. The first machine hosts the xDB server and the load balancer, while the second and third machines host the InfoArchive server instances.

Load Balancing with Apache

1. Set up Apache2 with the following modules:

:~# apt-get install --yes apache2
:~# a2enmod proxy
:~# a2enmod proxy_balancer
:~# a2enmod proxy_http
:~# a2enmod lbmethod_byrequests

2. Edit /etc/apache2/mods-enabled/proxy_balancer.conf:

<Proxy balancer://mycluster>
    BalancerMember http://10.64.155.217:8765
    BalancerMember http://10.64.155.218:8765
    ProxySet lbmethod=byrequests
</Proxy>
ProxyPass / balancer://mycluster/

3. Edit /etc/apache2/ports.conf:


Listen 8765

:~# service apache2 restart

4. Set up the NFS server:

:~# apt-get install nfs-kernel-server nfs-common

5. Edit /etc/exports to export shared folders:

/home/ia/config/server *(rw,sync,fsid=0,crossmnt,no_subtree_check,no_root_squash)
/home/ia/filestore *(rw,sync,crossmnt,no_subtree_check,no_root_squash)

:~# /etc/init.d/nfs-kernel-server restart

6. Mount the remote folders containing keystore.jceks and the filestore on the InfoArchive server machines:

mount 10.64.155.216:/home/ia/config/server /home/ia/config/server
mount 10.64.155.216:/home/ia/filestore /root/.ia

7. Edit application.yml for both InfoArchive servers:

system.xdb.dataNode.bootstrap: xhive://10.64.155.216:2910
auditData.xdb.dataNode.bootstrap: xhive://10.64.155.216:2910
crypto.keyStore.keyStoreFileLocation: "/home/ia/config/server/keystore.jceks"

The file store is automatically placed at /root/.ia, that is, in the home directory of the root user, which runs InfoArchive in this setup.
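Step 2 sets lbmethod=byrequests, under which mod_proxy_balancer distributes requests by weighted request counting. The following Python sketch reproduces the selection loop described in the Apache mod_lbmethod_byrequests documentation, simplified to ignore disabled, draining, or failed workers, to show why two equally weighted InfoArchive servers receive requests alternately. The class and function names are illustrative; only the worker URLs come from the configuration in step 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Worker:
    url: str
    lbfactor: int = 1   # BalancerMember loadfactor (default 1)
    lbstatus: int = 0   # running score kept by the balancer


def pick(workers: List[Worker]) -> Worker:
    """Select a worker by weighted request counting (simplified byrequests).

    Every worker's lbstatus grows by its lbfactor; the highest score wins
    and is then reduced by the sum of all lbfactors.
    """
    if not workers:
        raise ValueError("no workers configured")
    candidate = None
    total = 0
    for w in workers:
        w.lbstatus += w.lbfactor
        total += w.lbfactor
        if candidate is None or w.lbstatus > candidate.lbstatus:
            candidate = w
    candidate.lbstatus -= total
    return candidate


servers = [Worker("http://10.64.155.217:8765"), Worker("http://10.64.155.218:8765")]
for _ in range(4):
    print(pick(servers).url)  # alternates .217, .218, .217, .218
```

With unequal loadfactor values, for example 3 and 1, the same loop sends three out of every four requests to the heavier worker.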

Parallel SIP ingestion

In the following example, two machines (a laptop and a desktop) are used for ingestion. The laptop has one SIP loader and the desktop has two SIP loaders.

The PhoneCalls application is used as a base for the testing. The ANT command must first be executed to set up all application configuration data.

PhoneCalls1, PhoneCalls2 and PhoneCalls3 folders are created with the same content as the PhoneCalls application.

The following are the build.properties changes:

# points to the load balancer
services=http://10.64.155.216:8765/services
federationBootstrap=xhive://10.64.155.216:2910

loop.cmd:

@echo off
ECHO started at %time%
FOR /L %%A IN (1,1,%1) DO (
  call ant receive-ingest
  ECHO step %%A of %1 finished
)

To start a SIP ingestion 100 times, execute 'loop 100' inside the application folder. It ingests 10 SIPs from the PhoneCalls application 100 times.


Do the same in the PhoneCalls1, PhoneCalls2 and PhoneCalls3 folders to start parallel ingestion. Executing 'loop 100' in all three folders results in 3000 SIPs being ingested.
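loop.cmd repeats one ant target, and the parallelism comes from running it in several folders at once. The following Python sketch expresses the same procedure as a single cross-platform driver: it runs the loop in each folder on its own thread and computes the expected SIP count. It assumes ant is on the PATH and, as stated above for PhoneCalls, that each receive-ingest run ingests 10 SIPs; all function names are hypothetical.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Runner = Callable[[str, str], None]


def run_ant(target: str, cwd: str) -> None:
    """Run one ant target in an application folder; raise if it fails."""
    ant = shutil.which("ant")  # also resolves ant.bat/ant.cmd on Windows
    if ant is None:
        raise FileNotFoundError("ant is not on the PATH")
    subprocess.run([ant, target], cwd=cwd, check=True)


def loop(times: int, target: str, cwd: str, runner: Runner = run_ant) -> int:
    """Equivalent of 'loop N' from loop.cmd: run the target N times in one folder."""
    for step in range(1, times + 1):
        runner(target, cwd)
        print(f"{cwd}: step {step} of {times} finished")
    return times


def parallel(times: int, target: str, folders: List[str], runner: Runner = run_ant) -> int:
    """Run loop() concurrently in several folders; return the total number of runs."""
    with ThreadPoolExecutor(max_workers=len(folders)) as pool:
        futures = [pool.submit(loop, times, target, f, runner) for f in folders]
        return sum(f.result() for f in futures)


def expected_sips(runs: int, sips_per_run: int = 10) -> int:
    """SIPs ingested, assuming each receive-ingest run ingests sips_per_run SIPs."""
    return runs * sips_per_run


# Example (requires ant and the PhoneCalls folders):
# total_runs = parallel(100, "receive-ingest", ["PhoneCalls1", "PhoneCalls2", "PhoneCalls3"])
# print(expected_sips(total_runs))  # 3 folders x 100 runs x 10 SIPs = 3000
```

Passing a different runner makes it possible to dry-run the driver without touching the server.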

Parallel Table Ingestion

The Tickets application is used as a base for the testing. The ANT command must first be executed to set up all application configuration data.

Tickets1 and Tickets2 folders are created with the same content as the Tickets application.

The following are the build.properties changes:

# points to the load balancer
host=10.64.155.216
federationBootstrap=xhive://10.64.155.216:2910

loop.cmd:

@echo off
ECHO started at %time%
FOR /L %%A IN (1,1,%1) DO (
  call ant ingest-tables
  ECHO step %%A of %1 finished
)

To start a table ingestion 50 times, execute ’loop 50’ inside the application folder.

The ingest-tables target is supposed to be re-entrant, but occasionally the following ETag-based exception is encountered:

configure-result-store:
[echo] Configuring file-system-folder object with name 'Tickets-result-folder'
[echo] Configuring result store object with name 'Tickets-result-store'

[ia-configure] 13:59:13.299 ERROR - Command failed org.springframework.web.client.HttpClientErrorException: 412 errors:
[ia-configure] Version mismatch. Version mismatch for Object with id dc7c2624-97f2-444c-83c0-0cfaf83f616a. (based on ETag).
[ia-configure]
BUILD FAILED
E:\ia\infoarchive\tools\build-table.xml:381: The following error occurred while executing this line:
E:\ia\infoarchive\tools\build-common.xml:67: org.springframework.web.client.HttpClientErrorException: 412 errors:
Version mismatch. Version mismatch for Object with id dc7c2624-97f2-444c-83c0-0cfaf83f616a. (based on ETag).

    at com.emc.ia.cli.common.HttpResponseErrorHandler.handleError(HttpResponseErrorHandler.java:63)
    at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:641)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:597)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:572)
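An HTTP 412 (Precondition Failed) response with an ETag-based version mismatch is the usual signal of optimistic concurrency control: another client updated the object between this client's read and its write, which can happen when two ingest-tables runs configure the same objects at the same time. A common client-side remedy is to re-read the object to obtain the current ETag and retry a bounded number of times. The following Python sketch shows that pattern against hypothetical get and put callables; it does not describe how the InfoArchive ant tasks are implemented.

```python
from typing import Callable, Dict, Tuple


class PreconditionFailed(Exception):
    """Raised by put() when the server answers HTTP 412 (ETag mismatch)."""


def update_with_retry(get: Callable[[], Tuple[Dict, str]],
                      put: Callable[[Dict, str], None],
                      change: Callable[[Dict], Dict],
                      attempts: int = 3) -> Dict:
    """Apply change() to the latest version of an object, retrying on 412.

    get() returns (body, etag). put(body, etag) is expected to send the
    request with an If-Match: etag header and to raise PreconditionFailed
    when the stored version no longer matches.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        body, etag = get()            # re-read: the ETag may have changed
        updated = change(dict(body))  # modify a copy of the fresh body
        try:
            put(updated, etag)
            return updated
        except PreconditionFailed:
            if attempt == attempts:
                raise                 # give up after the last attempt
    raise AssertionError("unreachable")
```

Re-reading before every attempt matters: retrying with the stale ETag would fail with the same 412 every time.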


Load Balancing Testing of the InfoArchive Web Application

This section contains information about load balancing testing of the InfoArchive web application with:

• HTTP

— Sticky Sessions (use ./httpd.HTTP.STICKY.conf)

— Non-sticky sessions (use ./httpd.HTTP.conf)

• HTTPS (SSL or TLS)

— Sticky Sessions (use ./httpd.HTTPS.STICKY.conf)

— Non-sticky sessions (use ./httpd.HTTPS.conf)

HTTP with Sticky Sessions

The default value of ProxyPreserveHost is 'Off'. If this value is not changed to 'On', as shown in the example below, login fails.

Key Section of httpd.HTTP.STICKY.conf

<IfModule mod_proxy_balancer.c>
    ProxyPreserveHost On
    ProxyPass / balancer://gateway/ stickysession=ROUTEID
    ProxyPassReverse / balancer://gateway/
    Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
    <Proxy balancer://gateway>
        BalancerMember http://localhost:8080/ route=gateway8080 loadfactor=1
        BalancerMember http://localhost:8082/ route=gateway8082 loadfactor=1
        ProxySet stickysession=ROUTEID
    </Proxy>
    <Location /balancer-manager>
        SetHandler balancer-manager
    </Location>
</IfModule>
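With this configuration, the balancer sets a ROUTEID cookie whose value is a period followed by the route of the worker that served the request (for example, .gateway8080) and uses the text after the period to send later requests from the same browser to the same worker. The following Python sketch mimics that lookup with the standard http.cookies module to make the mechanism concrete. It is a simplification of mod_proxy_balancer, which also handles disabled workers and other cases, and the function name is hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Optional, Set


def route_for(cookie_header: str, routes: Set[str]) -> Optional[str]:
    """Return the balancer route named by a ROUTEID cookie, or None.

    The cookie value has the form '.<route>', e.g. '.gateway8080'; the route
    is the text after the first period. None means there is no usable
    cookie, in which case the balancer would choose a worker itself and
    set a new ROUTEID cookie.
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("ROUTEID")
    if morsel is None:
        return None
    _, dot, route = morsel.value.partition(".")
    return route if dot and route in routes else None


print(route_for("ROUTEID=.gateway8082; JSESSIONID=abc", {"gateway8080", "gateway8082"}))  # gateway8082
```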

How to Run Gateway/InfoArchive Web Application

To run multiple instances of the Gateway/InfoArchive web application, pass the following additional arguments when running each instance:

Instance at port 8080:

-Dcom.sun.management.jmxremote.port=37880

Instance at port 8082:

-Dinfoarchive.gateway.port=8082 -Dcom.sun.management.jmxremote.port=37882


InfoArchive Server and Gateway/InfoArchive Web Application Communication Setup

This section describes how to set up the InfoArchive server and Gateway/InfoArchive web application for HTTPS communication:

Browser <-------- HTTPS --------> Gateway IAWA <-------- HTTPS --------> IAS

First, generate certificates or obtain them from a root certificate authority (CA). Once you have the certificates, complete the following procedure:

1. Import the InfoArchive server certificate into the key store.

2. Adjust the INSTALL_INFOARCHIVE_DIR\config\server\application-ssl.yml file to point to the key store and specify the key store passwords.

3. Run the InfoArchive server:

> bin\infoarchive-server-ssl.bat

4. Import the InfoArchive server certificate into the Gateway/InfoArchive server key store.

5. Export the InfoArchive server and Gateway/InfoArchive web application public certificates into the Gateway/InfoArchive web application truststore.

6. Adjust the INSTALL_INFOARCHIVE_DIR\config\webapp\application.yml file:

spring:
  application:
    name: infoarchive.gateway
  profiles:
    active: infoarchive.profile.HTTPS,infoarchive.gateway.profile.AUTHENTICATION_IN_MEMORY
  cloud:
    config:
      enabled: false

infoarchive:
  gateway:
    host: localhost
    port: 8080
    contextPath: /
    token:
      secret: secret

server:
  host: ${infoarchive.gateway.host}
  port: ${infoarchive.gateway.port}
  contextPath: ${infoarchive.gateway.contextPath}

zuul:
  routes:
    restapi:
      path: /restapi/**
      sensitiveHeaders: ""
      url: http://localhost:8080/
  addProxyHeaders: true
  host:
    socket-timeout-millis: 60000
---
spring:
  profiles: "infoarchive.profile.HTTPS"

server:
  ssl:
    key-store: "location of gatewayKeyStore.jks"
    key-store-password: abcdefg   # sample value
    key-password: abcdefg         # sample value
    keyStoreType: JKS

zuul:
  routes:
    restapi:
      path: /restapi/**
      sensitiveHeaders: ""
      url: "https://localhost:8080/"

7. Add the following two command-line parameters to INSTALL_INFOARCHIVE_DIR\bin\infoarchive-webapp.bat:

-Djavax.net.ssl.trustStore=<gateway truststore location goes here>
-Djavax.net.ssl.trustStorePassword=<truststore password goes here>

For example:

::
@rem Execute infoarchive-webapp
cd "%APP_HOME%"
"%JAVA_EXE%" -Djavax.net.ssl.trustStore=<gateway truststore location goes here> -Djavax.net.ssl.trustStorePassword=<truststore password goes here> %DEFAULT_JVM_OPTS% %JAVA_OPTS% %INFOARCHIVE_WEBAPP_OPTS% -jar "%CLASSPATH%" %CMD_LINE_ARGS%
:

You can add the additional system property -Djavax.net.debug=ssl:handshake to debug SSL.

8. Run Gateway/InfoArchive web application:

> bin\infoarchive-webapp.bat
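Certificates exported with keytool -export -rfc are PEM files, so an exported certificate can be cross-checked against the SHA256 fingerprint that keytool -list -v prints for the imported entry. The following Python sketch computes that fingerprint with the standard ssl and hashlib modules and formats it as colon-separated uppercase hex, the way keytool displays it. The function name is hypothetical, and the check is a convenience rather than a documented step of this procedure.

```python
import hashlib
import ssl


def sha256_fingerprint(pem_text: str) -> str:
    """Return a PEM certificate's SHA-256 fingerprint in keytool's AA:BB:... style."""
    der = ssl.PEM_cert_to_DER_cert(pem_text.strip())  # drop BEGIN/END lines, base64-decode
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Usage, run in the folder that holds an exported certificate such as server_public_cert.cert:
# with open("server_public_cert.cert", encoding="ascii") as fh:
#     print(sha256_fingerprint(fh.read()))
```

If the printed value differs from the SHA256 line of the keytool listing, the truststore holds a different certificate than the one that was exported.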

Self-Signed Certificates-Based Setup

In the following scenario, the distribution has been extracted into:

c:\ia40\infoarchive

The InfoArchive server and the Gateway/InfoArchive web application are running on the same machine, so localhost works everywhere. If the InfoArchive server and the Gateway/InfoArchive web application run on different machines, use their IP addresses instead.


The InfoArchive server and Gateway/InfoArchive web application must both run in HTTPS mode. You cannot run one in HTTP mode and the other in HTTPS mode.

1. Create the following directories:

> cd c:\ia40\infoarchive
> mkdir config\server\https
> mkdir config\webapp\https

2. Create a self-signed InfoArchive server certificate:

> cd c:\ia40\infoarchive\config\server\https

> keytool -genkey -noprompt -trustcacerts -keyalg RSA -alias IA_SERVER_CERT -dname "CN=localhost, OU=JavaSoft, O=Sun, L=Cupertino, S=California, C=US" -keypass abcdefg -storetype PKCS12 -keystore serverCertStore.p12 -storepass abcdefg

The generated file is named serverCertStore.p12.

3. List the self-signed InfoArchive server certificate:

> cd c:\ia40\infoarchive\config\server\https

> keytool -list -v -keystore serverCertStore.p12 -storepass abcdefg -storetype PKCS12

4. Create the keystore for the InfoArchive server:

> cd c:\ia40\infoarchive\config\server\https

> keytool -importkeystore -deststorepass abcdefg -destkeypass abcdefg -destkeystore serverKeyStore.jks -srckeystore serverCertStore.p12 -srcstoretype PKCS12 -srcstorepass abcdefg -alias IA_SERVER_CERT

5. Export the InfoArchive server certificate:

> cd c:\ia40\infoarchive\config\server\https

> keytool -export -noprompt -rfc -alias IA_SERVER_CERT -file server_public_cert.cert -keystore serverKeyStore.jks -storepass abcdefg -storetype JKS

The generated file is named server_public_cert.cert.

6. Import the InfoArchive server certificate into the Gateway/InfoArchive web application truststore:

> copy c:\ia40\infoarchive\config\server\https\server_public_cert.cert c:\ia40\infoarchive\config\webapp\https

> cd c:\ia40\infoarchive\config\webapp\https


> keytool -import -noprompt -trustcacerts -alias IA_SERVER_CERT -file server_public_cert.cert -keystore gatewayTrustStore.jks -storepass abcdefg

The generated file is named gatewayTrustStore.jks.

7. List the imported InfoArchive server certificate:

> cd c:\ia40\infoarchive\config\webapp\https
> keytool -list -v -keystore gatewayTrustStore.jks

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 1 entries

Alias name: server_public_cert
Creation date: Jun 10, 2016
Entry type: trustedCertEntry

Owner: CN=localhost, OU=JavaSoft, O=Sun, L=Cupertino, ST=California, C=US
Issuer: CN=localhost, OU=JavaSoft, O=Sun, L=Cupertino, ST=California, C=US
Serial number: 357866c0
Valid from: Fri Jun 10 08:32:40 PDT 2016 until: Thu Sep 08 08:32:40 PDT 2016
Certificate fingerprints:
MD5: F1:05:E4:6A:F4:7C:C4:A7:8A:0E:B6:DE:A9:16:DF:AA
SHA1: CC:65:A3:74:85:C0:97:DA:95:AE:13:1F:CA:75:16:CA:66:F2:17:87
SHA256: 22:86:56:6B:16:A1:D5:C1:66:CE:2B:A7:F5:CA:EC:74:45:C5:A5:3E:00:DD:F1:28:8F:F3:F8:7D:2B:00:25:A4
Signature algorithm name: SHA256withRSA
Version: 3

Extensions:

#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 5D 34 8D B8 34 CF 0B A0 64 05 20 93 E4 BA 92 28 ]4..4...d. ....(
0010: CD EB 20 64 .. d
]
]

**************************************************************************************

8. Create the self-signed Gateway/InfoArchive web application certificate:

> cd c:\ia40\infoarchive\config\webapp\https
> keytool -genkey -noprompt -trustcacerts -keyalg RSA -alias IA_GATEWAY_CERT -dname "CN=localhost, OU=JavaSoft, O=Sun, L=Cupertino, S=California, C=US" -keypass abcdefg -storetype PKCS12 -keystore gatewayCertStore.p12 -storepass abcdefg

The generated file is named gatewayCertStore.p12.

9. List the self-signed Gateway/InfoArchive web application certificate:

> cd c:\ia40\infoarchive\config\webapp\https
> keytool -list -v -keystore gatewayCertStore.p12 -storepass abcdefg -storetype PKCS12

10. Create the keystore for Gateway/InfoArchive web application:

> cd c:\ia40\infoarchive\config\webapp\https
> keytool -importkeystore -deststorepass abcdefg -destkeypass abcdefg -destkeystore gatewayKeyStore.jks -srckeystore gatewayCertStore.p12 -srcstoretype PKCS12 -srcstorepass abcdefg -alias IA_GATEWAY_CERT

11. Export the Gateway/InfoArchive web application certificate:

> cd c:\ia40\infoarchive\config\webapp\https
> keytool -export -noprompt -rfc -alias IA_GATEWAY_CERT -file gateway_public_cert.cert -keystore gatewayKeyStore.jks -storepass abcdefg -storetype JKS

The generated file is named gateway_public_cert.cert.

12. Import the Gateway/InfoArchive web application certificate into the Gateway/InfoArchive web application truststore. This step is required because the gateway acts as its own client when it retrieves the group-to-role mapping from the InfoArchive server through the back-end REST API:

> cd c:\ia40\infoarchive\config\webapp\https
> keytool -import -noprompt -trustcacerts -alias gateway_public_cert -file gateway_public_cert.cert -keystore gatewayTrustStore.jks -storepass abcdefg

The certificate is added to the existing gatewayTrustStore.jks file.

13. List the imported Gateway/InfoArchive web application certificate:

> cd c:\ia40\infoarchive\config\webapp\https
> keytool -list -v -keystore gatewayTrustStore.jks

Keystore type: JKS
Keystore provider: SUN

Your keystore contains 2 entries

Alias name: gateway_public_cert
Creation date: Jun 10, 2016
Entry type: trustedCertEntry

Owner: CN=localhost, OU=JavaSoft, O=Sun, L=Cupertino, ST=California, C=US
Issuer: CN=localhost, OU=JavaSoft, O=Sun, L=Cupertino, ST=California, C=US
Serial number: 78b0bb1b
Valid from: Fri Jun 10 08:41:09 PDT 2016 until: Thu Sep 08 08:41:09 PDT 2016
Certificate fingerprints:
MD5: 21:21:12:82:77:83:06:B9:EC:87:93:D3:07:FD:50:22
SHA1: 56:1C:F2:D2:17:75:9A:4F:FB:F3:D9:C3:89:64:7D:29:52:4E:DC:1B
SHA256: 92:3D:F3:E3:83:98:61:6D:34:02:66:6E:2D:07:60:F6:E9:DD:3D:BA:AD:AC:31:1C:91:39:76:85:9A:9F:C0:FD
Signature algorithm name: SHA256withRSA
Version: 3

Extensions:

#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 3C 90 5E 2D CF 7E 2A 49 AB 20 DC E1 E5 2A 1E 71 <.^-..*I. ...*.q
0010: 4A 2A 83 A1 J*..
]
]

**************************************************************************************

Alias name: server_public_cert
Creation date: Jun 10, 2016
Entry type: trustedCertEntry

Owner: CN=localhost, OU=JavaSoft, O=Sun, L=Cupertino, ST=California, C=US
Issuer: CN=localhost, OU=JavaSoft, O=Sun, L=Cupertino, ST=California, C=US
Serial number: 357866c0
Valid from: Fri Jun 10 08:32:40 PDT 2016 until: Thu Sep 08 08:32:40 PDT 2016
Certificate fingerprints:
  MD5:  F1:05:E4:6A:F4:7C:C4:A7:8A:0E:B6:DE:A9:16:DF:AA
  SHA1: CC:65:A3:74:85:C0:97:DA:95:AE:13:1F:CA:75:16:CA:66:F2:17:87
  SHA256: 22:86:56:6B:16:A1:D5:C1:66:CE:2B:A7:F5:CA:EC:74:45:C5:A5:3E:00:DD:F1:28:8F:F3:F8:7D:2B:00:25:A4
  Signature algorithm name: SHA256withRSA
  Version: 3

Extensions:

#1: ObjectId: 2.5.29.14 Criticality=false
SubjectKeyIdentifier [
KeyIdentifier [
0000: 5D 34 8D B8 34 CF 0B A0   64 05 20 93 E4 BA 92 28  ]4..4...d. ....(
0010: CD EB 20 64                                        .. d
]
]

**************************************************************************************
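When this check is scripted, the -list -v output can be reduced to one record per alias, which makes it easy to confirm that both gateway_public_cert and server_public_cert are present as trustedCertEntry entries. The Python sketch below assumes the English-language layout shown above; keytool output is intended for people and may differ between JDK versions and locales. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record for one alias in `keytool -list -v` output.
@dataclass
class KeystoreEntry:
    alias: str
    entry_type: str = ""
    owner: str = ""
    sha256: str = ""

def parse_keytool_list(text):
    """Parse `keytool -list -v` output into one KeystoreEntry per alias."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Alias name:"):
            current = KeystoreEntry(alias=line.split(":", 1)[1].strip())
            entries.append(current)
        elif current is None:
            continue  # header lines such as "Keystore type: JKS"
        elif line.startswith("Entry type:"):
            current.entry_type = line.split(":", 1)[1].strip()
        elif line.startswith("Owner:"):
            current.owner = line.split(":", 1)[1].strip()
        elif line.startswith("SHA256:"):
            current.sha256 = line.split(":", 1)[1].strip()
    return entries

def missing_aliases(entries, expected):
    """Expected aliases not found (JKS stores aliases in lower case, so compare case-insensitively)."""
    found = {e.alias.lower() for e in entries}
    return sorted(a for a in expected if a.lower() not in found)
```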

14. Import the Gateway/InfoArchive web application certificate into the InfoArchive server truststore:

> copy c:\ia40\infoarchive\config\webapp\https\gateway_public_cert.cert c:\ia40\infoarchive\config\server\https
> cd c:\ia40\infoarchive\config\server\https
> keytool -import -noprompt -trustcacerts -alias IA_GATEWAY_CERT -file gateway_public_cert.cert -keystore serverTrustStore.jks -storepass abcdefg

The generated file is named serverTrustStore.jks.
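Step 14 can be scripted as well. This Python sketch copies the exported certificate into the server's https directory and builds the matching keytool -import command, running keytool only when dry_run is False. It assumes the directory layout used throughout this procedure (an install directory such as c:\ia40\infoarchive) and keytool on the PATH; the function name is hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical helper; mirrors the copy, cd, and keytool commands of step 14.
def trust_gateway_cert_on_server(install_dir, storepass, dry_run=True):
    """Copy gateway_public_cert.cert to config/server/https and import it."""
    install = Path(install_dir)
    src = install / "config" / "webapp" / "https" / "gateway_public_cert.cert"
    server_https = install / "config" / "server" / "https"
    dest = Path(shutil.copy(src, server_https))  # same effect as the `copy` command
    args = [
        "keytool", "-import", "-noprompt", "-trustcacerts",
        "-alias", "IA_GATEWAY_CERT",
        "-file", dest.name,
        "-keystore", "serverTrustStore.jks",
        "-storepass", storepass,
    ]
    if not dry_run:
        # cwd plays the role of `cd ...\config\server\https`
        subprocess.run(args, cwd=server_https, check=True)
    return dest, args
```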

15. Configure the InfoArchive server for HTTPS mode. Edit the c:\ia40\infoarchive\config\server\application-ssl.yml file and set its content to:

server:
  ssl:
    key-store: "file:config/server/https/serverKeyStore.jks"
    key-store-password: abcdefg
    key-password: abcdefg
    keyStoreType: JKS
    keyAlias: ia_server_cert
    trust-store: "file:config/server/https/serverTrustStore.jks"
    trust-store-password: abcdefg
    trust-store-type: JKS
    client-auth: want

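Note: The client-auth: want setting is critical because it overrides (masks) the client-auth: need setting packaged inside the JAR file. With need, the server rejects any TLS client that does not present a certificate; with want, it requests a client certificate but still accepts connections without one.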
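The file: locations in application-ssl.yml are relative paths, so they are normally resolved against the server process's working directory (assumed here to be c:\ia40\infoarchive). The Python sketch below checks a server.ssl mapping before startup: it resolves both store paths, reports missing files, and flags any client-auth value other than want. The function names and the working-directory assumption are illustrative only, not part of InfoArchive.

```python
from pathlib import Path

# Values accepted by Spring Boot's server.ssl.client-auth property.
CLIENT_AUTH_VALUES = {"none", "want", "need"}

def store_path(location, working_dir):
    """Resolve a "file:relative/path" store location against the working directory."""
    if not location.startswith("file:"):
        raise ValueError(f"expected a file: location, got {location!r}")
    return Path(working_dir) / location[len("file:"):]

def check_server_ssl(settings, working_dir):
    """Return a list of problems with a server.ssl mapping (empty list: looks fine)."""
    problems = []
    for key in ("key-store", "trust-store"):
        path = store_path(settings[key], working_dir)
        if not path.is_file():
            problems.append(f"{key}: {path} does not exist")
    mode = str(settings.get("client-auth", "none")).lower()
    if mode not in CLIENT_AUTH_VALUES:
        problems.append(f"client-auth: unknown value {mode!r}")
    elif mode != "want":
        problems.append(f"client-auth is {mode!r}; this procedure expects 'want'")
    return problems
```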

16. Run the InfoArchive server in HTTPS mode:

> cd c:\ia40\infoarchive
> .\bin\infoarchive-server-ssl.bat
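Once the server is running, one way to confirm that it answers over HTTPS with the expected certificate is a client that trusts only that certificate. The Python sketch below does this with the standard library; the port, the request path, and the PEM file holding the server's public certificate are assumptions you must supply, because this section does not state them. Since the server uses client-auth: want, the probe can connect without presenting a client certificate.

```python
import http.client
import ssl

def client_context(cafile=None):
    """TLS client context that verifies the server certificate and host name."""
    return ssl.create_default_context(cafile=cafile)

def probe(host, port, cafile, path="/"):
    """GET path over HTTPS and return the status code.

    Raises ssl.SSLError (for example SSLCertVerificationError) if the
    server certificate is not trusted by cafile or does not match host.
    """
    conn = http.client.HTTPSConnection(host, port,
                                       context=client_context(cafile),
                                       timeout=10)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

For example, probe("localhost", server_port, "server_public_cert.pem") uses a hypothetical file name and port variable. The certificates in this procedure use CN=localhost, so connect with that host name. On Python 3.13 and later, create_default_context also enables ssl.VERIFY_X509_STRICT, which may reject some self-signed certificates generated by keytool.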

17. Configure the Gateway/InfoArchive web application for HTTPS mode:

spring:
  application:
    name: infoarchive.gateway
  profiles:
    active: infoarchive.profile.HTTPS,infoarchive.gateway.profile.AUTHENTICATION_IN_MEMORY
  cloud:
    config:
      enabled: false

infoarchive:
  gateway:
    host: localhost
    port: 8080
    contextPath: /
    token:
      secret: secret

server:
  host: ${infoarchive.gateway.host}
  port: ${infoarchive.gateway.port}
  contextPath: ${infoarchive.gateway.contextPath}

zuul:
  routes:
    restapi:
      path: /restapi/**
      sensitiveHeaders: ""
      url: http://localhost:8080/
  addProxyHeaders: true
  host:
    socket-timeout-millis: 60000

---
spring:
  profiles: "infoarchive.profile.HTTPS"

server:
  ssl:
    key-store: "file:config/webapp/https/gatewayKeyStore.jks"
    key-store-password: abcdefg
    key-password: abcdefg
    keyStoreType: JKS
    trust-store: "file:config/webapp/https/gatewayTrustStore.jks"
    trust-store-password: abcdefg
    trust-store-type: JKS

zuul:
  routes:
    restapi:
      path: /restapi/**
      sensitiveHeaders: ""
      url: "https://localhost:8080/"
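The document after --- applies only when the infoarchive.profile.HTTPS profile is active, which the spring.profiles.active line ensures. Its values are layered over the default document, so the restapi route keeps its path and sensitiveHeaders while its url switches to https. The Python sketch below models that layering with a recursive dictionary merge; it is a simplification of Spring Boot's property binding (which also handles relaxed property names and ${...} placeholders), and the function names are hypothetical.

```python
import copy

def deep_merge(base, override):
    """Return base with override applied: nested maps merge key by key, other values replace."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)  # scalars and lists replace outright
    return result

def effective_config(default_doc, profile_docs, active_profiles):
    """Apply each (profile, document) pair whose profile is active, in file order."""
    config = default_doc
    for profile, doc in profile_docs:
        if profile in active_profiles:
            config = deep_merge(config, doc)
    return config
```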

18. Add the system properties to the bin\infoarchive-webapp.bat file:

:
@rem Execute infoarchive-webapp
cd "%APP_HOME%"
"%JAVA_EXE%" -Djavax.net.ssl.trustStore=%APP_HOME%\config\webapp\https\gatewayKeyStore.jks -Djavax.net.ssl.trustStorePassword=abcdefg %DEFAULT_JVM_OPTS% %JAVA_OPTS% %INFOARCHIVE_WEBAPP_OPTS% -jar "%CLASSPATH%" %CMD_LINE_ARGS%
:
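The -D options must appear before -jar: the java launcher passes everything after the JAR path to the application instead of treating it as JVM options. The Python sketch below assembles a command line in that order; the function name and the JAR name used in the usage example are hypothetical, since the script passes the JAR through %CLASSPATH%.

```python
def java_command(java_exe, jar, system_props, jvm_opts=(), app_args=()):
    """Build a java command line with system properties placed before -jar."""
    cmd = [java_exe]
    cmd += [f"-D{name}={value}" for name, value in system_props.items()]
    cmd += list(jvm_opts)
    cmd += ["-jar", jar]
    cmd += list(app_args)  # anything after the JAR path goes to the application's main()
    return cmd
```

For example, java_command("java", "infoarchive-webapp.jar", {"javax.net.ssl.trustStore": r"c:\ia40\infoarchive\config\webapp\https\gatewayKeyStore.jks", "javax.net.ssl.trustStorePassword": "abcdefg"}) yields the two -D options followed by -jar and the JAR path.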

19. Run the Gateway/InfoArchive web application in HTTPS mode:

> cd c:\ia40\infoarchive
> bin\infoarchive-webapp.bat