

SAP Conversion Agent by Itemfield (ContentMaster)

ContentMaster Studio User's Guide

Version 4.0

This product has been renamed as SAP Conversion Agent by Itemfield. Future editions of this manual will reflect the new name, which replaces the name ContentMaster.


Legal Notice

ContentMaster Studio User's Guide

Copyright © 2004-2005 Itemfield Inc. All rights reserved.

Itemfield may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Itemfield, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

The information in this document is subject to change without notice. Complying with all applicable copyright laws is the responsibility of the user. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Itemfield Inc.

SAP AG
http://www.sap.com

Publication Information:

Version: 4.0
Date: January 2006


ContentMaster Studio User's Guide Contents


Contents

1. Designing ContentMaster Data Transformations
  Before You Continue
  Overview of Transformation Architecture
    ContentMaster Components
    Data Holders
    Documents
    ContentMaster Services
  Project Architecture
    Required Project Files
    Additional Project Files
  Workflow for Designing Transformations
    Analyzing the Inputs and Outputs
    Creating and Configuring a Project
    Deploying the Project as a ContentMaster Service
  How to Learn More
    Online Samples
    Contacting Product Support

2. Using ContentMaster Studio
  ContentMaster Studio for Eclipse
  For Detailed Information

3. Parsers
  Creating a Parser
    Using the New Parser Wizard
    Creating a Parser by Editing the IntelliScript
  Running a Parser
  Platform-Independent Parsers
  Parser Component Reference
    Parser
  Input Port Component Reference
    DocList
    FileSearch
    LocalFile
    Text
    URL


4. Document Processors
  Installation
  Defining Document Processors
    Display of Document Processor Output
  Document Processor Quick Reference
  Document Processor Component Reference
    AFPToXML
    ExcelToHtml
    ExcelToTextML
    ExcelToTxt
    ExcelToXml
    ExpandFrameSet
    ExternalCOMPreProcessor
    ExternalJavaPreProcessor
    ExternalPreProcessor
    PdfToTxt_3_00
    PowerpointToHtml
    PowerpointToTextML
    ProcessByTransformers
    ProcessorPipeline
    RtfToTextML
    WordPerfectToTextML
    WordToHtml
    WordToRtf
    WordToTextML
    WordToTxt
    WordToXml
    XmlToExcel
  TextML XML Schema

5. Formats
  Defining the Document Format
  Standard Properties of Formats
  Format Component Reference
    BinaryFormat
    CustomFormat
    HtmlFormat
    RtfFormat
    TextFormat
    XmlFormat
  Delimiters Component Reference
    CommaDelimited
    DelimiterHierarchy
    HL7
    Positional
    PostScript
    RTF
    SGML
    SpaceDelimited
    TabDelimited


  Delimiter Subcomponent Reference
    Delimiter
    EnclosingDelimiters
  Format Preprocessor Component Reference
    HtmlProcessor
    RtfProcessor

6. Data Holders
  XSD Schemas
    About XSD
    Where To Learn XSD
    How to Create XSD Schemas
    Encoding of the XSD Schema
    Included XSD Files
    Namespaces
    Mixed Content
    Unsupported XSD Features
  Adding XSD Schemas to a ContentMaster Project
    Adding an Existing Schema
    Creating a New Schema
    Editing a Schema
    Reloading a Schema after Editing
  Viewing a Schema
    Schema View
    Displaying an XML Sample of a Schema
  Using a Schema to Map Anchors
    IntelliScript Representation of Data Holders
    Mapping Mixed Content
  Generating Valid XML
    Role of XSD in Parsing
    Role of XSD in Serialization
  Variables
    User-Defined Variables
    System Variables
    Mapping Anchors to Variables
    Using Variables in Actions
  Variable Component Reference
    Variable
  Multiple-Occurrence Data Holders

7. Anchors
  Marker and Content Anchors
  Other Anchor Types
  How Anchors and Delimiters Work Together
  Mapping Content Anchors to Data Holders
    Mapping to Variables
    Mapping to Multiple-Occurrence Data Holders
    Mapping to Mixed-Content Elements


  Defining Anchors
    Where to Define Anchors
    Sequence of Anchors
    Select-and-Click Procedure for Marker and Content Anchors
    Drag-and-Drop Procedure for Content Anchors
    Using the IntelliScript to Define Anchors
  Standard Anchor Properties
  How a Parser Searches for Anchors
    Search Phases
    Search Scope and Search Criteria
    Adjusting the Search Phase
    Adjusting the Search Scope
    Adjusting the Search Criteria
    Using XSD Data Types to Narrow the Search Criteria
    Anchors that Contain Nested Anchors
  Anchor Quick Reference
  Anchor Component Reference
    Alternatives
    Content
    DelimitedSections
    EmbeddedParser
    EnclosedGroup
    FindReplaceAnchor
    Group
    HtmlForm
    Marker
    RepeatingGroup
  Searcher Component Reference
    AttributeSearch
    LearnByExample
    NewlineSearch
    OffsetSearch
    PatternSearch
    SegmentSearch
    TextSearch
    TypeSearch
  Anchor Subcomponent Reference
    AddField
    Connect
    ImageClick
    ModifyField
    RemoveField
    SegmentIndex
    SegmentSize
    SubmitAll
    SubmitClick


8. Transformers
  Defining Transformers
    Using Transformers in Anchors
    Sequences of Transformers
    Default Transformers
    Using Transformers as Document Processors
    Using Transformers in Serialization Anchors
    Using Transformers in Actions
    Using Transformers as Runnable Components
  Standard Transformer Properties
  Transformer Quick Reference
  Transformer Component Reference
    AbsURL
    AddEmptyTagsTransformer
    AddString
    BidiConvert
    BigEndianUniToUni
    CDATADecode
    CDATAEncode
    ChangeCase
    CreateGuid
    DateFormat
    Dos96HebToAscii
    EbcdicToAscii
    EncodeAsUrl
    Encoder
    ExternalTransformer
    FormatNumber
    FromBase64Transformer
    FromFloat
    FromInteger
    FromPackDecimal
    FromSignedDecimal
    hebrewBidi
    HebrewDosToWindowsTransformer
    HebrewEBCDICOldCodeToWindows
    hebUniToAscii
    hebUtf8ToAscii
    HtmlEntitiesToASCII
    HtmlProcessor
    InjectFP
    InjectString
    JavaTransformer
    LookupTransformer
    NormalizeClosingTags
    ODBCLookup
    RegularExpression
    RemoveMarginSpace
    RemoveRtfFormatting
    RemoveTags
    Replace
    Resize

        ReverseTransformer
        RtfProcessor
        RtfToASCII
        SubString
        ToBase64Transformer
        ToFloat
        ToInteger
        ToPackDecimal
        ToSignedDecimal
        TransformByParser
        TransformerPipeline
        WestEuroUniToAscii
        XSLTTransformer
    Transformer Subcomponent Reference
        InlineTable
        ODBC_Text_Connection
        XMLLookupTable

9. Actions
    How Actions Work
        Comparison between Actions and Transformers
    Defining Actions
    Standard Action Properties
    Action Quick Reference
    Action Component Reference
        AddEventAction
        AppendListItems
        AppendValues
        CalculateValue
        CombineValues
        CreateList
        DateAdd
        DateDiff
        DownloadFile
        DownloadFileToDataHolder
        DumpValues
        EnsureCondition
        ExcludeItems
        ExternalCOMAction
        JavaScriptFunction
        Map
        ODBCAction
        ResetVisitedPages
        RunMapper
        RunParser
        RunSerializer
        SetValue
        SubmitForm
        SubmitFormGet
        WriteValue
        XSLTMap

    Action Subcomponent Reference
        COMClass
        MSMQOutput
        ODBC_XML_Connection
        OpenURL
        OutputCOM
        OutputDataHolder
        OutputFile
        ResultFile

10. Serializers
    Creating a Serializer from a Parser
        Controlling How the Create Serializer Command Works
        Troubleshooting an Auto-Generated Serializer
    Creating a Serializer by Using the New Serializer Wizard
    Creating a Serializer by Editing the IntelliScript
    Creating a Serializer within a RunSerializer Action
    Running a Serializer
    Serialization Anchors
        Example of Serialization Anchors
        Defining Serialization Anchors
        Sequence of Serialization Anchors
    Standard Serializer Properties
    Serializer Quick Reference
    Serializer Component Reference
        Serializer
    Serialization Anchor Component Reference
        AlternativeSerializers
        ContentSerializer
        DelimitedSectionsSerializer
        EmbeddedSerializer
        GroupSerializer
        RepeatingGroupSerializer
        StringSerializer

11. Mappers
    Creating a Mapper
    Creating a Mapper within a RunMapper Action
    Components Nested within a Mapper
    Mapper Example
    Running a Mapper
    Standard Mapper Properties
    Mapper Quick Reference
    Mapper Component Reference
        Mapper

    Mapping Anchor Component Reference
        AlternativeMappings
        EmbeddedMapper
        GroupMapping
        RepeatingGroupMapping

12. Locators, Keys, and Indexing
    Example of Locators
    Example of Indexing by Key
    Source and Target Properties
        Source Property
        Target Property
    Standard Locator and Key Properties
    Locator and Key Component Quick Reference
    Locator and Key Component Reference
        Key
        Locator
        LocatorByKey
        LocatorByOccurrence

13. Project Properties
    Properties versus Preferences
    Setting the Project Properties
    Info Page
    Authentication Page
    Encoding Page
    External Tools Builders Page
    Namespaces Page
    Output Control Page
    Project References Page
    XML Generation Page

14. Running and Testing Projects
    Color-Coding the Example Source
    Running in ContentMaster Studio
        If the Output File is not Displayed
        Running on Additional Source Documents
    Viewing the Event Log
        Event-Log Properties
        Event Display Preferences
        Understanding the Event Log
        Using Named Components
        Cross-Identifying Events
        Effect of Failure Events
        Opening a ContentMaster Engine Event Log

15. Deploying ContentMaster Services
    Runnable Components
    Deploying a Service
        Setting the ContentMaster Repository Location
        Preparing a Project for Deployment
        Deployment Procedure
        Updating a Deployed Service
        Removing a Deployed Service
    Running a Service

Index

1. Designing ContentMaster Data Transformations

ContentMaster Studio is the design and configuration environment of the ContentMaster system. Using ContentMaster Studio, you can design and implement transformations that operate on any kind of data.

This book, the ContentMaster Studio User's Guide, is a complete learning and reference manual for designing ContentMaster transformations. The book contains:

Instructions for using the ContentMaster Studio tools and windows

Explanations of the ContentMaster concepts

Details on how to use all the ContentMaster components, such as parsers, serializers, transformers, mappers, anchors, and actions

Examples and tips on how to design transformations that work with many different kinds of input and output

Instructions for deploying transformations that you have designed in ContentMaster Studio to the ContentMaster Engine runtime environment

Before You Continue

Before you continue in this book, we recommend that you read or skim the book Getting Started with ContentMaster. The Getting Started book introduces the ContentMaster concepts and working methods.

ContentMaster Studio is hosted in the IBM Eclipse development environment. You should examine the book ContentMaster Studio in Eclipse, to learn about the windows, menus, toolbars, etc., that are available in Eclipse.

Overview of Transformation Architecture

When you construct a ContentMaster transformation, you build it in modular fashion from components of the ContentMaster system. The components are arranged in a hierarchy or tree, which you can view in the IntelliScript editor of ContentMaster Studio.

The components work with input and output documents, and with the data holders that store ContentMaster data.

This section provides a brief overview of the components and terminology that are used in this architecture. For detailed information about each component type, see the following chapters of this book.

ContentMaster Components

Top-Level Components

At the top level of the hierarchy, a ContentMaster transformation can run a parser, serializer, mapper, or transformer. These components are defined as follows:

Parser
A component that converts source documents, which can be in any format, to XML.

Serializer
A component that converts XML documents to output documents, which can be in any format.

Transformer
A component that modifies data. The input and output can be in any format.

Mapper
A component that converts XML documents to a different XML structure or schema.

Of these four component types, parsers and serializers are the most powerful and generally useful. By running a parser and serializer in sequence, for example, you can convert any format to any other format. Using these components, you can perform conversions of unlimited complexity.

As top-level components, transformers are useful for relatively simple data conversions, such as replacing predefined strings. Usually, the input and output documents have the same, non-XML format. Because of this limitation, transformers are more often used as nested components, and not as top-level components (see below).

Please notice the distinction between transformation, which is the generic term for the operations that ContentMaster performs on data, and transformer, which is a specific type of ContentMaster component.

Nested Components

Within a parser, serializer, or mapper component, you can nest components such as:

Formats
Define the overall format of documents, such as the delimiters that ContentMaster should use to interpret the documents.

Document processors
Operate on a document as a whole, performing preliminary conversions before parsing, or final operations after serializing.

Anchors
Define the data in a source document, which a parser should process and extract. The anchors specify how a parser should search for the data, and where it should store the data that it extracts.

Serialization anchors
Define how a serializer should write XML data to an output document. Serialization anchors are the inverse of anchors; an anchor writes data from a source document to XML, whereas a serialization anchor writes data from XML to an output document.

Mapping anchors
Define how a mapper should write XML data to another XML structure or schema. The anchors specify where to find the data in the source XML, and where to write the data in the output XML.

Actions
Perform operations on data in the scope of a data transformation, for example, concatenating strings that a parser has extracted from a source document, summing numbers that a serializer finds in an XML input document, or querying a database for additional data.

Transformers
In addition to their use as top-level components, you can nest transformers within a parser or a serializer. For example, within a parser, you can nest a transformer that modifies the output of the anchors.

As a nested component, a transformer operates on a portion of a document, and not on the complete document.

Indirectly, you can also nest parsers and serializers within each other. For example, within a parser, you can nest an action that runs another parser on a portion of the same document or on a second document.
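
The anchor concept described above can be pictured with a small Python sketch. This is only an illustration of the idea (search for data in a source document, then store what is found in an XML element); it is not ContentMaster code or IntelliScript syntax. The marker strings, the element names, and the function extract_with_anchors are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical anchor definitions for illustration only:
# (marker text to search for, XML element that receives the extracted data).
ANCHORS = [
    ("Name:", "Name"),
    ("Amount:", "Amount"),
]


def extract_with_anchors(source: str, root_name: str = "Record") -> ET.Element:
    """Find each marker in the source and store the rest of its line in an XML element."""
    root = ET.Element(root_name)
    for marker, element_name in ANCHORS:
        position = source.find(marker)
        if position == -1:
            continue  # marker not found; this sketch simply skips it
        start = position + len(marker)
        end = source.find("\n", start)
        if end == -1:
            end = len(source)
        ET.SubElement(root, element_name).text = source[start:end].strip()
    return root


if __name__ == "__main__":
    sample = "Name: Jane Doe\nAmount: 125.50\n"
    print(ET.tostring(extract_with_anchors(sample), encoding="unicode"))
```

A serialization anchor would do the reverse: read the Name and Amount elements from the XML and write them back into an output document.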

Subcomponents

In addition to the main components that are described above, ContentMaster has a large number of subcomponents, which are used for special purposes within the main components. It is also possible to develop custom components, such as custom document processors or custom transformers, to serve special needs.

Data Holders

Data holders are the XML elements, XML attributes, and variables that your transformations use for data storage.

The elements and attributes are defined in XSD schemas, which are standard XML schema definitions. ContentMaster uses XSD to define data holders, to help it process XML input, and to help it construct valid XML output.

The variables are defined in the ContentMaster configuration, using XSD datatypes.

Data holders and XSD schemas are described in Chapter 6 of this book, Data Holders.

Documents

The input and output of a ContentMaster transformation are called documents.

A document can have any size. It can contain any text or binary data. It can be stored or accessed in a file, buffer, stream, database, messaging system, or any other location.

The input and output of a data transformation are called the source document and the output document.

For a parser, the source document can have any format. A special source document is the example source, which is the document that you use to teach the parser how to process other source documents. The output document of a parser has an XML format.

The source document of a serializer is XML, and the output document can have any format. For a mapper, both the source and the output are XML. For a transformer, the source and output can have any format.

XML is the common language that connects ContentMaster transformations. For example, you can run a parser that converts a source document from any format to XML, and a serializer that converts the XML to any output format. By chaining the parser and serializer together, you can convert any input format to any output format.
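
The chaining idea can be sketched in a few lines of Python. This is a conceptual stand-in, not the ContentMaster Engine API: the functions parse_to_xml, serialize_from_xml, and convert are hypothetical, the input format (key=value lines) and the output format (JSON) are arbitrary choices, and the sketch assumes that every key is a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET


def parse_to_xml(source: str) -> str:
    """Stand-in 'parser': converts key=value lines to an XML document."""
    root = ET.Element("Record")
    for line in source.splitlines():
        if "=" not in line:
            continue  # ignore lines that do not look like key=value
        key, value = line.split("=", 1)
        ET.SubElement(root, key.strip()).text = value.strip()
    return ET.tostring(root, encoding="unicode")


def serialize_from_xml(xml_text: str) -> str:
    """Stand-in 'serializer': converts the XML document to JSON text."""
    root = ET.fromstring(xml_text)
    return json.dumps({child.tag: child.text for child in root})


def convert(source: str) -> str:
    """Chain the two steps; XML is the intermediate format between them."""
    return serialize_from_xml(parse_to_xml(source))


if __name__ == "__main__":
    print(convert("name=Jane\ncity=Oslo"))
```

Because the two steps communicate only through XML, either one can be swapped for a different input or output format without changing the other.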

Document formats are described in Chapter 5 of this book, Formats.

ContentMaster Services

To configure a ContentMaster transformation, you use ContentMaster Studio. Then you should deploy the transformation as a ContentMaster service. This lets ContentMaster Engine execute the transformation.

A ContentMaster service can run a parser, serializer, mapper, or a transformer as its top-level component.

For more information, see Chapter 15, Deploying ContentMaster Services.

Project Architecture

A ContentMaster data transformation is stored in a project. Each project has a project folder, which contains the project files.

Required Project Files

There are two required types of files in a project:

CMW file
The CMW file (which has a *.cmw filename) is the main project configuration file. Every project contains exactly one CMW file.

TGP files
A TGP file (which has a *.tgp filename) contains a script, which defines the parsers, serializers, etc. that perform a data transformation. A project can contain one or more TGP files.

The scope of a TGP file is the entire project. This means, for example, that a parser in one TGP file can use a variable that is defined in another TGP file.

You can import or copy the TGP files between projects. You can organize the project components in different TGP files.
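The two file requirements can be checked with a short script. The following is a minimal sketch, not a ContentMaster tool: the function name is hypothetical, and it assumes that the CMW and TGP files sit directly in the top-level project folder.

```python
from pathlib import Path
from typing import List


def check_project_folder(folder: str) -> List[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    root = Path(folder)
    # Assumption: required files are stored directly in the project folder.
    files = [p for p in root.iterdir() if p.is_file()]
    cmw_files = [p for p in files if p.suffix.lower() == ".cmw"]
    tgp_files = [p for p in files if p.suffix.lower() == ".tgp"]

    problems = []
    if len(cmw_files) != 1:
        problems.append(f"expected exactly one CMW file, found {len(cmw_files)}")
    if not tgp_files:
        problems.append("expected at least one TGP file, found none")
    return problems
```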

Additional Project Files

In addition to the CMW and TGP files, a project can contain many other files and folders, for example:

XSD schemas
Most ContentMaster transformations require an XSD schema, which defines the XML elements and attributes that the transformation can use. Parsers, serializers, and mappers use XSD schemas to define the structure of their XML output or input.

The main schema is usually stored in the top-level project folder. If the main schema includes other schema files, they are stored in a subfolder.

Results folder
As you develop and test a project, ContentMaster Studio creates a subfolder, within the project folder, where it stores the parser or serializer output. By default, the name of the subfolder is Results.

When you deploy a service and run it in ContentMaster Engine, you can continue to use the Results folder, or you can instruct the service to store its output in other locations.

Example source document
To design a parser, you typically select and configure the text in an example source document, which is a sample of the input that you want the parser to process.

You can provide the example source as a file, a text string, or any other document type that ContentMaster can process (see Documents above). If the example source is a file, we recommend that you store it in the project folder. This helps keep the example source together with the project.

Test documents
In addition to the example source, you can store other input documents that you use to test a parser or serializer.

Any other desired files
The project folder is a convenient location to store any other files or folders that are associated with a project, for example, documentation or readme files.

Workflow for Designing Transformations

The following procedure is a summary of a typical workflow for using ContentMaster Studio and developing a transformation. The example is a workflow that you might use to configure a new parser or serializer.

The main workflow steps are:

1. Analyze the transformation requirements, such as the required inputs and outputs.

2. Create and configure a ContentMaster project that implements the transformation.

3. Deploy the transformation as a ContentMaster service, which runs in ContentMaster Engine.

The following paragraphs provide more information on the steps.

Analyzing the Inputs and Outputs

The first step in any design or development project is to examine the inputs and outputs carefully, and determine how they are related.

For a parser, some of the analysis steps are to:

Examine whether the document structure is amenable to parsing. Sometimes, a simple step such as converting the document to an alternative format (which you can do by applying a ContentMaster document processor) makes the document much easier to parse.

Plan which data you need to extract from the source document, and where you will insert the extracted data in the XML. In ContentMaster, you implement the data extraction by using the Content anchor.

Analyze the structure of the source document and identify the features you can use to locate the data fields. In ContentMaster, these features translate to Marker anchors, or to various other types of anchors.

Find repetitive or structured features of the documents that may help extract the data. You can implement such features using anchors such as Group, EnclosedGroup, RepeatingGroup, or DelimitedSections.

Decide whether you need to transform any of the data during or after the extraction process. The ContentMaster components that operate on the extracted data are transformers and actions.

Determine whether there are any additional data sources, such as a linked document or a database, that you need to access to prepare the output. You can access such data by using certain anchors, transformers, or actions.
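During the analysis, it can help to prototype the idea of locating a fixed label and then extracting the data next to it. The sketch below is a loose Python analogy to a marker followed by content, and to a repeating structure. It is not how ContentMaster implements anchors, and the function names and sample labels are assumptions.

```python
import re
from typing import List, Optional


def extract_after_marker(document: str, marker: str) -> Optional[str]:
    """Find the marker text and return the content that follows it on the same line."""
    match = re.search(re.escape(marker) + r"[ \t]*([^\r\n]*)", document)
    return match.group(1).strip() if match else None


def extract_repeating(document: str, marker: str) -> List[str]:
    """Collect the content after every occurrence of the marker."""
    pattern = re.escape(marker) + r"[ \t]*([^\r\n]*)"
    return [value.strip() for value in re.findall(pattern, document)]


# Hypothetical sample source: one labeled field and a repeating labeled field.
SAMPLE = "Name: Ann Smith\nItem: pen\nItem: ink\n"
```

If a quick prototype like this cannot find the fields reliably, the source may need preprocessing or a different set of locating features before you configure the parser.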

For a serializer or a mapper, you can invert the steps: plan the data that you need to extract from the source XML, and where to insert it in the output document. It is usually easier to design a serializer or a mapper than a parser because the input data is fully structured XML.

Creating and Configuring a Project

After you analyze the inputs and outputs, it is time to implement the processing steps in ContentMaster. A typical procedure is as follows:

1. Create a new ContentMaster project.

2. Add one or more XSD schemas to the project. The schemas must define the XML elements and attributes with which you will work.

3. Create a parser, serializer, etc., and define its properties.

For a parser, you must define an example source document and a format component. The format component can contain features such as a delimiters definition and transformers.

For a serializer, there are no required properties that you must set, but there are some optional properties such as the extension of the output filename. You can also create a serializer automatically, by asking ContentMaster Studio to invert a parser that you have already created.

The parser or serializer is displayed in the IntelliScript editor of ContentMaster Studio. The example source is displayed in the example pane of the IntelliScript editor.

4. Configure the subcomponents of the parser or serializer.

For a parser, the main components are anchors. You can define the anchors by select-and-click or drag-and-drop procedures, or you can edit the IntelliScript. ContentMaster Studio helps you do this by highlighting and color-coding the anchors in the example source.

For a serializer, the main components are serialization anchors, which you can create in the IntelliScript.

5. Use the ContentMaster Studio tools to test and execute the data transformation. Correct any configuration errors that you detect during the testing.

Deploying the Project as a ContentMaster Service

When the parser or serializer is operating correctly, use the Deploy command of ContentMaster Studio to deploy it as a ContentMaster service, which can run in ContentMaster Engine.

How to Learn More

As you continue in this book, you will learn the details of all the ContentMaster concepts and procedures. The chapters are organized more or less according to the workflow steps, which are outlined above.

Chapter: Explains how to

2. Using ContentMaster Studio: Use ContentMaster Studio in the IBM Eclipse development environment.
3. Parsers: Create a parser. Configure parser features such as the example source.
4. Document Processors: Configure components that pre-process a document before parsing, or post-process a document after serializing.
5. Formats: Configure the format component of a parser. Define format components such as delimiters and default transformers.
6. Data Holders: Use XSD schemas to define element and attribute data holders. Define variables.
7. Anchors: Define the anchors that a parser uses to extract data from a source document.
8. Transformers: Use transformers to modify the data.
9. Actions: Use actions to operate on the data.
10. Serializers: Create a serializer. Define the serialization anchors that write the serialized data to an output document.
11. Mappers: Create a mapper. Define the mapping anchors that transfer data between the input and output XML.
12. Locators, Keys, and Indexing: Construct data transformations that retrieve, match, and store records by sequential, non-sequential, or keyed criteria.
13. Project Properties: Configure the properties of a project, such as the input and output encoding and the XML validation features.
14. Running and Testing Projects: Test and troubleshoot a data transformation.
15. Deploying ContentMaster Services: Deploy a data transformation as a service that runs in ContentMaster Engine.

Online Samples

As you use this book, you can view online samples that illustrate many ContentMaster features. The samples are ContentMaster projects, which are located in the Samples folder under your main ContentMaster installation folder (by default, c:\Program Files\SAP\ContentMaster\samples). If you do not have a Samples folder, run the ContentMaster setup again and select the option to install the samples.

To view the samples, you should import the projects to ContentMaster Studio. For instructions on how to do this, see the book ContentMaster Studio in Eclipse.

In addition to the project samples, you can find sample program code for features such as custom processors and transformers.

Contacting Product Support

If you experience any problems or have questions about ContentMaster, you should first check the documentation for information.

If you do not find the information that you need, please contact SAP support. If you have a question about a particular project, please send a zip file containing:

The project folder

Any additional required files, such as the example source document

The execution log file
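If you assemble such a zip file regularly, a small script can do it. The following is a minimal sketch using only the Python standard library. The function name, the "extra" folder name inside the archive, and all paths that you pass in are assumptions. The location of the execution log is not specified here, so you supply it yourself in the list of extra files.

```python
import zipfile
from pathlib import Path
from typing import List


def build_support_zip(project_folder: str, extra_files: List[str], zip_path: str) -> str:
    """Zip the whole project folder plus any extra files (for example, a log file).

    Write the zip outside the project folder, so the archive does not include itself.
    """
    project = Path(project_folder).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Store project files under a top-level folder named after the project.
        for path in sorted(project.rglob("*")):
            if path.is_file():
                arcname = (Path(project.name) / path.relative_to(project)).as_posix()
                archive.write(path, arcname)
        # Store the additional files under a separate "extra" folder.
        for extra in extra_files:
            archive.write(extra, "extra/" + Path(extra).name)
    return zip_path
```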

2. Using ContentMaster Studio

ContentMaster Studio is the design and configuration environment of ContentMaster. You use it to develop and edit ContentMaster projects.

If you have done the exercises in the book Getting Started with ContentMaster, you already have experience using ContentMaster Studio. The exercises teach many aspects of the ContentMaster Studio operation.

ContentMaster Studio for Eclipse

ContentMaster Studio runs as a plug-in module, which is hosted in the IBM Eclipse development environment.

Eclipse is a versatile platform, which IBM has designed to support both Java development and plug-in development tools. ContentMaster Studio works seamlessly within Eclipse, letting you develop ContentMaster projects for all purposes.

The Eclipse platform is supplied at no additional cost with the ContentMaster software. You do not need any previous experience with Eclipse in order to use ContentMaster Studio.

Figure: ContentMaster Studio for Eclipse

For Detailed Information

For detailed information and complete instructions on using ContentMaster Studio within the Eclipse environment, please see the manual ContentMaster Studio in Eclipse. That manual explains how to use all the ContentMaster Studio features, such as the views, editors, menus, and toolbars.

The ContentMaster Studio in Eclipse manual does not discuss the functional components that you can insert in ContentMaster projects, such as parsers, serializers, data holders, anchors, transformers, etc. For that information, please read the following chapters of this book.

3. Parsers

Parsers are the ContentMaster components that convert a source document to XML.

The output of a parser is always XML. The input can have any format, for example, text, HTML, Word, PDF, or HL7. The input can even be an XML document, which the parser processes as string data (this is different from a serializer or a mapper, which treats its input as an XML object).

This chapter provides basic instructions on how to create and run a parser. The details, such as how to support specific document formats, how to define the anchors that process the text of a source document, etc., are in the succeeding chapters.

For tutorial exercises on how to define and run a parser, see the book Getting Started with ContentMaster.

Creating a Parser

You can create a parser by either of the following methods:

By using the New Parser wizard, or

By editing the IntelliScript and inserting a Parser component

Using the New Parser Wizard

The easiest way to create a parser is by using the New Parser wizard. The wizard lets you set up the basic parser configuration, with typical options.

After you finish the wizard, you can edit the parser properties as required. Nested within the parser, you can insert components such as document processors, anchors, transformers, and actions, which perform the parsing operations. The following chapters of this book explain how to do this.

Opening the Wizard

To create a new project that contains a parser, choose File > New > Project on the ContentMaster Studio menu. In the left pane of the New Project window, select ContentMaster. In the right pane, select Parser Project.

To create a new parser in an existing project, choose File > New > Parser on the menu.

On each page of the wizard, enter the desired options. Click the Next button to advance to the following page. At any stage, you may click Finish to skip the subsequent pages and accept the defaults for those pages.

Wizard Options

The New Parser wizard prompts you for options such as the following:

Project name
An identifier for the project.

Project contents
The storage location of the project folder. The default is the Eclipse workspace folder.

Parser name
A name for the parser.

Script name
A name for a TGP script file, where the wizard stores the parser definition.

Schema file path
(Optional) The name of an XSD schema, which defines the data holders where the parser will store its output.

If you omit this step, you can add an XSD schema to the project afterwards, by right-clicking in the ContentMaster Explorer view.

Source type, source path
Select the example source document, which you will use to configure the parser.

The example source should illustrate all the features (or as many features as possible) that you expect the parser to process in production documents.

Choose the example source carefully. After you configure a parser, it may be difficult to change the example source.

You may select the following types of example source:

File: Browse to a file on the local computer or network.

URL: Specify the URL of a document on a network or web site.

Text: Type a text string, which the parser will use as an example source. You might use this option, for example, for short source documents that consist of a single text line.

None: Do not use an example source. In that case, you must configure the anchors completely in the IntelliScript, instead of parsing by example.

Content type
Select the content type of the source documents, for example, ASCII or Binary. If you don't find the exact document type, you can refine your choice later by editing the IntelliScript.

Document preprocessor
(Optional) Select a document processor, which converts the source documents to a type that is amenable for parsing.

The wizard suggests processors that seem appropriate for the content type. For example, if you select a Microsoft Word example source, the wizard suggests processors that convert Word documents to text, HTML, or XML formats.

Afterwards, you can edit the IntelliScript and select among the full set of processors. For more information, see Chapter 4, Document Processors.

Format
Select the format of the source documents, for example, tab-delimited or HTML.

The wizard suggests formats that seem appropriate for the content type. If you selected a document processor, the format is that of the processor output, rather than the original source document.

After you complete the wizard, you can edit the IntelliScript and select among the full range of formats. For information, see Chapter 5, Formats.

Completing the Parser Configuration

After you have entered the wizard options, click Finish to create the parser. The parser is displayed in the ContentMaster Explorer view and the Component view of ContentMaster Studio.

Figure: ContentMaster Explorer view

Figure: Component view

To complete the parser configuration:

1. Display the parser in the IntelliScript editor. You can do this by double-clicking the parser name in the Component view, or by double-clicking the TGP script file in the ContentMaster Explorer.

Figure: IntelliScript editor

2. Under the contains line, add a sequence of nested anchors and actions. For detailed instructions, see the following chapters of this book.

3. Run and test the parser (see Running a Parser below), and modify the IntelliScript as required.

Creating a Parser by Editing the IntelliScript

Instead of using the New Parser wizard, you can create a parser by editing the IntelliScript directly. The result is identical to a parser created by using the wizard.

1. At the top (global) level of the IntelliScript, select the three-dots (...) symbol. Press Enter and type a name for the parser.

2. To the right of the name, press Enter. Select a Parser component from the drop-down list.

3. Expand the tree under the Parser component. Assign its properties, such as the example_source and the format. For an explanation of the properties, see the Parser Component Reference below.

4. Under the contains line, add a sequence of nested anchors and actions. For detailed instructions, see the following chapters of this book.

5. Run and test the parser (see Running a Parser below), and modify the IntelliScript as required.

Running a Parser

To run a parser in ContentMaster Studio, follow this procedure. For additional information, see Chapter 14, Running and Testing Projects.

1. In the IntelliScript editor or in the Component view, right-click the parser that you want to run, and choose Set as Startup Component.

Alternatively, you can set the startup component in the Run > Run command.

2. On the ContentMaster Studio menu, choose the Run > Run command or the Run > Run MyParser command (where MyParser is the name of the parser that you have set as the startup component).

3. After a few seconds, ContentMaster Studio displays the Events view. Examine it for any warnings, failures, or errors.

Figure: Events view

4. To display the parsing results, double-click the file Results\output.xml in the ContentMaster Explorer view.

If you are using Windows XP SP2 or higher, Internet Explorer may display a warning about active content in the XML results file. You can safely ignore the warning.

Platform-Independent Parsers

ContentMaster runs on both Windows and Unix systems. Most parser features run equally well on both platforms.

There are a few exceptions to this rule, however. If you plan to run a parser on both Windows and Unix, here are a few tips that can help ensure platform independence.

Document Processors

Use document processors that do not have platform-specific system requirements. For information, see Chapter 4, Document Processors.

For example, use the ExcelToXml processor instead of ExcelToHTML. The former is platform-independent; the latter requires that Microsoft Excel be installed on the computer.

Custom Components

ContentMaster supports custom document processors, transformers, and actions. You should use platform-independent versions of the custom components, such as:

ExternalJavaPreProcessor (programmed in Java)

ExternalPreProcessor or ExternalTransformer (programmed in C++ and compiled for both Windows and Unix)

Do not use ExternalCOMPreProcessor and ExternalCOMAction components, which are supported only on Windows.

Newline Markers

When defining Marker anchors, avoid platform-specific newline sequences such as \r\n (a carriage return character followed by a newline character, which is commonly used in Windows).

Instead, configure a Marker with the built-in NewlineSearch component, which searches for both the \r\n sequence and the \n or \r character alone.
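The idea of a platform-independent newline search can be illustrated with a regular expression. This Python sketch is only an analogy, not the NewlineSearch implementation. It matches the two-character Windows line break (carriage return plus line feed) as well as either character alone, and the names NEWLINE and split_lines are assumptions.

```python
import re
from typing import List

# Try the two-character sequence first, so that "\r\n" counts as one line break
# rather than two; then fall back to a lone "\n" or a lone "\r".
NEWLINE = re.compile(r"\r\n|\n|\r")


def split_lines(text: str) -> List[str]:
    """Split text at any of the supported line-break conventions."""
    return NEWLINE.split(text)
```

The order of the alternatives matters: if the single characters came first, a Windows line break would be split into two matches with an empty line between them.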

Encoding

Confirm that the input, output, and working encoding are supported on the platforms. For lists of the supported encodings, see Chapter 13, Project Properties.

File Paths

Use relative (as opposed to absolute) file paths. Remember that file paths on Unix are case-sensitive.

Parser Component Reference

The main Parser component is documented in this section. For subcomponents such as input ports, see the following sections of this chapter.

Parser

A Parser is a component that converts a source document to XML.

A Parser contains many nested components. Directly under the contains line of the Parser, you can nest anchors and actions. In other locations, under various Parser properties, you can assign components such as formats, delimiters, document processors, and transformers. For detailed information about all these components, see the following chapters of this book.

Example

For an example of a parser that processes tab-delimited text documents, and a complete description of that parser, see the chapter Basic Parsing Techniques in the book Getting Started with ContentMaster.

Basic Properties

example_source
The example source document, which you use to configure the parser operation. The document should be representative of the source documents that the parser will process. (For a discussion of the kinds of documents that ContentMaster can process, see Documents in Chapter 1, Designing ContentMaster Data Transformations.)

The value of the property is an input port such as LocalFile (a file on your computer) or Text (a text string).

When you create a new Parser, you can assign this property in the New Parser wizard. Alternatively, you can edit the property in the IntelliScript.

Nested within this property, you can assign a preprocessor, which converts the source documents to a format that the parser can accept (see Chapter 4, Document Processors).

If you leave the example_source property blank, the parser does not use an example source. In that case, you must configure the anchors completely in the IntelliScript, instead of learning by example.

format
This property lets you specify the format of the source document, such as whether the document contains text, HTML, or binary code; the delimiters that separate data fields in the document; a format preprocessor that ContentMaster should apply to the document; and transformers that the parser should apply by default to all content anchors in the parser.

When you create a new Parser, you can assign this property in the New Parser wizard. You can customize the format by editing the IntelliScript.

For details of the available format components, see Chapter 5, Formats.

Advanced Properties

reject_recurring_pages
If selected, the parser does not parse the same page twice in the same execution. This is useful, for example, if a parser is following the links on a web site, and you want to prevent it from parsing duplicate links to the same page.

The ResetVisitedPages action resets the history list and allows a parser to process a page again, even if reject_recurring_pages is selected (see Chapter 9, Actions). You might do this, for example, if you want to post different input data to the same web page.
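The behavior described for reject_recurring_pages resembles a "visited set" that is consulted before each page is processed. The following Python sketch shows that general technique only. The class and method names are hypothetical, and nothing here reflects how ContentMaster stores its history list internally.

```python
class PageHistory:
    """Track which pages were already parsed during one execution."""

    def __init__(self, reject_recurring_pages: bool = True):
        self.reject_recurring_pages = reject_recurring_pages
        self._visited = set()

    def should_parse(self, url: str) -> bool:
        """Return False for a page seen before, if rejection is enabled."""
        if self.reject_recurring_pages and url in self._visited:
            return False
        self._visited.add(url)
        return True

    def reset_visited_pages(self) -> None:
        """Clear the history, so that a page can be processed again."""
        self._visited.clear()
```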

no_initial_phase
If selected, the parser runs without an initial phase. Components that are configured to run in the initial phase run in the main phase, instead (see How a Parser Searches for Anchors in Chapter 7, Anchors).

sources_to_extract
A specification of the source documents that the parser should process.

The value of the property is an input port such as LocalFile (a file on your computer) or Text (a text string). To specify multiple sources, you can assign the value DocList (a list of files) or FileSearch (a wildcard search for files).

If you assign sources_to_extract, and you run the parser in ContentMaster Studio, the parser processes the specified documents. If you leave sources_to_extract blank, the parser processes the example_source.

When you deploy a parser as a service, an application that runs the service can override sources_to_extract. See the ContentMaster Engine Developer's Guide for details.

serialization_mode
This property specifies how the Create Serializer command should process the portions of the example source that the parser does not output to XML, when you create a serializer from a parser. For a full explanation, see Controlling How the Create Serializer Command Works in Chapter 10, Serializers.

The possible values of the serialization_mode are:

Full
The Create Serializer command copies the non-XML text to the serializer configuration.

Outline
The Create Serializer command copies only the delimiters of the non-XML text to the serializer configuration. Under the Outline option, you can select the use_markers option. This causes the Create Serializer command to copy the content of the Marker anchors but only the delimiters of other non-XML text.

name
A name that you assign to the parser. The name is used in the event log.

remark
A comment describing the parser.

example_values
This property contains simulated values (ExampleValue components) that another parser might pass to this parser. The property is useful when designing a parser that is to be activated by another parser. ContentMaster uses the property only when it learns the example source; it ignores the property when it parses a source document.

Under this property, specify the data holders (see Chapter 6, Data Holders) that the main parser passes to this parser, and their simulated values.

The following properties are useful in situations where the parser must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

Online Samples

In the Samples folder, you can find many online samples of parsers.

For online tutorial exercises on creating parsers, see Getting Started with ContentMaster.

Input Port Component Reference

An input port is a component that specifies an input to ContentMaster, such as a source document. The input can be a document that is stored on the local computer, on a network, or in a string. The input can also be a list of documents.

In a Parser, the values of the example_source and sources_to_extract properties are input ports.

DocList

A document list.

This input port is used, for example, in the sources_to_extract property of a Parser. It lets you specify multiple source documents that the parser should process.

Within this component, you can nest multiple input ports such as FileSearch, LocalFile, or Text, each of which specifies a single file.

FileSearch

The criteria for a file search.

This input port is used, for example, in the sources_to_extract property of a Parser. It lets you specify source documents using wildcards.

Basic Properties

directory
The folder to be searched.

wildcard
The search criterion. You may use * as a wildcard character. For example, *.txt finds all text files. The default is *.*, which finds all files in the directory.

Advanced Properties

recursive
If selected, the search includes subfolders of the specified directory.

pre_processor
The name of a preprocessor that the parser should apply to the files (see Chapter 4, Document Processors).
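The combined effect of the directory, wildcard, and recursive properties can be approximated in Python as follows. This is a sketch of the general technique, not the FileSearch implementation. The function name is hypothetical, and mapping *.* to "all files" is an assumption based on the default described above (Python's fnmatch would otherwise skip names that contain no dot).

```python
import fnmatch
import os
from typing import List


def file_search(directory: str, wildcard: str = "*.*", recursive: bool = False) -> List[str]:
    """Return the sorted paths of files under `directory` whose names match `wildcard`."""
    # Assumption: treat the default *.* as "every file", including names without a dot.
    pattern = "*" if wildcard == "*.*" else wildcard
    matches = []
    for folder, subfolders, files in os.walk(directory):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(folder, name))
        if not recursive:
            subfolders.clear()  # stop os.walk from descending into subfolders
    return sorted(matches)
```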

LocalFile

A file on the local computer.

This input port is used, for example, in the example_source and sources_to_extract properties of a Parser.

Basic Properties

file_name
Browse to the file.

Advanced Properties

simulated_url
A URL that ContentMaster should assign to the file. This property instructs ContentMaster to treat the file as if it were located on a web server. If the file contains relative links, ContentMaster resolves the links relative to the URL.

pre_processor
The name of a preprocessor that the parser should apply to the files (see Chapter 4, Document Processors).
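Resolving a relative link against a base URL, as described for simulated_url, follows the standard URL resolution rules. The Python sketch below shows those rules using the standard library's urljoin function. The wrapper function name and the sample URLs are assumptions, and the sketch does not claim that ContentMaster resolves every edge case in the same way.

```python
from urllib.parse import urljoin


def resolve_link(simulated_url: str, relative_link: str) -> str:
    """Resolve a relative link as if the document were located at simulated_url."""
    return urljoin(simulated_url, relative_link)


# Hypothetical base URL assigned to a local file.
BASE = "http://www.example.com/docs/index.html"
```

With this base, the link images/logo.gif resolves to http://www.example.com/docs/images/logo.gif, and ../about.html resolves to http://www.example.com/about.html.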

Text

A text string.

This input port is used, for example, in the example_source and sources_to_extract properties of a Parser.

Basic Properties

quote
The text string.

Advanced Properties

simulated_url
A URL that ContentMaster should assign to the string. This property instructs ContentMaster to treat the string as if it were a file located on a web server. If the string contains relative links, ContentMaster resolves the links relative to the URL.

pre_processor
The name of a preprocessor that the parser should apply to the text (see Chapter 4, Document Processors).

size
A static size for the text buffer. This property is typically used when working with binary sources. The default is -1, which means that the buffer is dynamically sized.

URL

The URL of a document that is available on a web server.

This input port is used, for example, in the example_source and sources_to_extract properties of a Parser.

Basic Properties

stable_url
The URL address, for example, http://www.example.com/index.html.

Advanced Properties

post_data
Data that the parser should post to the URL. To determine the correct format of the data string, you can use the technique described in the SubmitForm action.

retries
If the parser cannot access the URL on the first attempt, the number of retries that it performs before reporting a failure (default = 0).


seconds_to_wait
The number of seconds to wait between retries (default = 60).

pre_processor
The name of a preprocessor that the parser should apply to the files (see Chapter 4, Document Processors).
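The interaction of the retries and seconds_to_wait properties can be pictured as a simple loop: one initial attempt, followed by up to retries further attempts with a pause between them. The following Java sketch is an assumption about that behavior, not ContentMaster code. The class name, the method name, and the use of a Callable to stand in for the URL access are all hypothetical.

```java
import java.util.concurrent.Callable;

// Hypothetical illustration of retry behavior; not part of ContentMaster.
final class RetrySketch {

    // Calls fetch once, then up to `retries` more times, sleeping between attempts.
    static <T> T fetchWithRetries(Callable<T> fetch, int retries, long secondsToWait)
            throws Exception {
        int attempts = Math.max(0, retries) + 1; // the first attempt is not a retry
        Exception last = null;
        for (int attempt = 1; attempt <= attempts; attempt++) {
            try {
                return fetch.call();
            } catch (Exception e) {
                last = e;
                if (attempt < attempts) {
                    Thread.sleep(secondsToWait * 1000L);
                }
            }
        }
        throw last; // every attempt failed: report the last failure
    }
}
```

With the default retries = 0, the loop runs once and a failure is reported immediately.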


ContentMaster Studio User's Guide 4. Document Processors


Document Processors

Document processors are components that convert an entire document from one format to another format that is better suited to processing.

You can use a document processor as a preprocessor, which converts the format of a source document prior to parsing. For example, if the source document is in the PDF format, you might apply the PdfToTxt_3_00 processor. This converts the source document to text, which is much easier to parse than the binary PDF format.

Do not confuse document processors with format preprocessors, which are described in Chapter 5, Formats.

Installation

The document processors are supplied in an optional setup component. If you wish to use the processors, be sure to select the option to install the Document Processors when you run the ContentMaster setup.

Additional processors, which are not documented in this chapter, may be available. For information about processors that work with PostScript, Microsoft Project, and other document types, please contact SAP support.

Defining Document Processors

When you use the New Parser wizard, you are prompted to define a preprocessor. The wizard assigns the document processor that you select to the pre_processor property of the example source. Unless you specify otherwise (for example, in the sources_to_extract property of the parser), the parser applies the preprocessor to all source documents.

In the IntelliScript, you can assign document processors in the pre_processor property of an input port. Input ports are used in locations such as the example_source and sources_to_extract properties of parsers (see the Input Port Component Reference in Chapter 3, Parsers).



Display of Document Processor Output

If you assign a document processor to the example source, the example pane of the IntelliScript editor displays the processor output.

Document Processor Quick Reference

AFPToXML
Converts the IBM Advanced Function Presentation print-stream format to XML.

ExcelToHtml
Converts Microsoft Excel documents to HTML.

ExcelToTextML
Converts Microsoft Excel files to the TextML XML schema.

ExcelToTxt
Converts Microsoft Excel documents to plain text.

ExcelToXml
Converts Microsoft Excel documents to XML.

ExpandFrameSet
Opens an HTML frameset, letting a parser run on the content of the frames.

ExternalCOMPreProcessor
Runs a custom document processor, implemented as a COM DLL.

ExternalJavaPreProcessor
Runs a custom document processor, implemented in Java.

ExternalPreProcessor
Runs a custom document processor, implemented as a C++ DLL.

PdfToTxt_3_00
Converts Adobe Acrobat (PDF) documents to plain text.

PowerpointToHtml
Converts Microsoft PowerPoint documents to HTML.

PowerpointToTextML
Converts Microsoft PowerPoint presentations to the TextML XML schema.

ProcessByTransformers
Runs transformers as document processors.

ProcessorPipeline
Runs a sequence of document processors on a single document.

RtfToTextML
Converts RTF files to the TextML XML schema.


WordPerfectToTextML
Converts Corel WordPerfect documents to the TextML XML schema.

WordToHtml
Converts Microsoft Word documents to HTML.

WordToRtf
Converts Microsoft Word documents to RTF.

WordToTextML
Converts Microsoft Word files to the TextML XML schema.

WordToTxt
Converts Microsoft Word documents to plain text.

WordToXml
Converts Microsoft Word documents to XML.

XmlToExcel
Converts XML documents to Microsoft Excel.

Document Processor Component Reference

This section describes the document processor components, which are available in ContentMaster.

AFPToXML

This document processor converts the IBM Advanced Function Presentation print-stream format to XML. The output is in the UTF-8 encoding.

ExcelToHtml

This document processor converts Microsoft Excel documents to HTML.

The processor uses the Excel save-as-HTML feature to perform the conversion. Therefore, it operates only on a Microsoft Windows platform where Excel (version 97 or higher) is installed. Due to Excel limitations, the processor does not support multithreading.

ExcelToTextML

This document processor converts Microsoft Excel files (version 97 or higher) to the TextML XML schema.

The processor requires a Java Runtime Environment (see the system requirements in the ContentMaster Administrator's Guide). You do not need to install Microsoft Excel on the computer.


ExcelToTxt

This document processor converts Microsoft Excel documents to plain text.

The processor uses the Excel save-as-text feature to perform the conversion. Therefore, it operates only on a Microsoft Windows platform where Excel (version 97 or higher) is installed. Due to Excel limitations, the processor does not support multithreading.

ExcelToXml

This document processor converts Microsoft Excel documents (version 97 or higher) to XML in the UTF-8 encoding.

The processor requires a Java Runtime Environment version 1.4 or higher. It does not operate with a version 1.3 JRE (see the system requirements in the ContentMaster Administrator's Guide). You do not need to install Microsoft Excel on the computer.

ExpandFrameSet

This document processor opens all the frames of an HTML document.

This processor is appropriate if the source document of a parser is an HTML frameset. The parser runs on the content of all the frames.

ExternalCOMPreProcessor

This component lets you run a custom document processor.

Because this component uses the Microsoft COM architecture to activate the processor, it runs only on Microsoft Windows platforms.

For other ways to implement a custom processor, see ExternalJavaPreProcessor and ExternalPreProcessor.

Despite the name of this component, you can use it to run either a preprocessor or a postprocessor.

Creating a Custom COM Processor

You should program the custom processor as an ActiveX DLL, containing the following function:

Function pre_process(ByVal in_file As String) As String

Here, in_file is the content of the source document. The function returns the processed text.

Register the DLL on the ContentMaster computer.

You may then use the DLL in the ExternalCOMPreProcessor component.


Optionally, you can add the processor to the drop-down component list that ContentMaster displays (for instructions, see the chapter about Using the IntelliScript Editor in the book ContentMaster Studio in Eclipse).

Basic Properties

ProgID
The ProgID of the class.

ExternalJavaPreProcessor

This component lets you run a custom document processor, which is implemented in Java.

The processor requires a Java Runtime Environment (see the system requirements in the ContentMaster Administrator's Guide).

For other ways to implement a custom processor, see ExternalCOMPreProcessor and ExternalPreProcessor.

Despite the name of this component, you can use it to run either a preprocessor or a postprocessor.

There is another way to implement a Java document processor, which lets the IntelliScript pass custom properties to the processor. For information, see the chapter on External Components in the ContentMaster Engine Developer's Guide.

Creating a Custom Java Processor

To implement the custom document processor:

1. Create a new Java project and package, for example, named MyJavaPreprocessor.

2. Create a class, for example, named JavaDemoPreprocessor.

3. In the class, define a method having the following syntax. The method can have any name.

public static String main(String input_file, String output_file)

The input_file parameter is the path of the source document, on which the processor should operate. The output_file parameter is the path of a temporary file, where the processor should write its output.

The function should return an appropriate extension, which ContentMaster appends to the name of the temporary file. For example, if the output of the processor is XML, the function should return the string "xml".

4. Create a jar file containing the class.

5. Store the jar file in the externLibs\user subfolder of your ContentMaster installation folder.

You may then use the jar file in the ExternalJavaPreProcessor component.


Optionally, you can add the processor to the drop-down component list that ContentMaster displays (for instructions, see the chapter about Using the IntelliScript Editor in the book ContentMaster Studio in Eclipse).

Example

The following sample is the source code of a processor that repairs numeric values by removing commas that lie between digits. You can copy the sample code into your implementation and edit it as required.

package MyJavaPreprocessor;

import java.io.*;

public class JavaDemoPreprocessor {

    private static final int MAX_SIZE = 4096;

    public static String main(String input_file, String output_file) {
        try {
            // Read the entire source document into memory, so that a comma
            // can always be compared with the bytes on both sides of it.
            FileInputStream in = new FileInputStream(input_file);
            ByteArrayOutputStream source = new ByteArrayOutputStream();
            byte[] chunk = new byte[MAX_SIZE];
            int bytes_read;
            while ((bytes_read = in.read(chunk)) != -1) {
                source.write(chunk, 0, bytes_read);
            }
            in.close();
            byte[] in_buf = source.toByteArray();

            OutputStream out = new BufferedOutputStream(new FileOutputStream(output_file));
            for (int i = 0; i < in_buf.length; i++) {
                // Drop a comma only if it lies between two digits.
                boolean between_digits = in_buf[i] == ','
                        && i > 0 && i + 1 < in_buf.length
                        && Character.isDigit((char) in_buf[i - 1])
                        && Character.isDigit((char) in_buf[i + 1]);
                if (!between_digits) {
                    out.write(in_buf[i]);
                }
            }
            out.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Return the output file extension type.
        return "txt";
    }
}

Basic Properties

jclass
The path of the Java class, for example, MyJavaPreprocessor/JavaDemoPreprocessor.

jmethod
The method to run, for example, main.

Online Sample

For an online sample of the Java code, which is similar to the above example, see the following file in the ContentMaster installation folder:

Samples\SDK\ExternalPreprocessor\External_JavaPreprocessor.java
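Before you package a processor in a jar file, you may want to exercise it outside ContentMaster. The following Java sketch shows one possible test harness. It is an assumption-laden illustration, not a description of how ContentMaster loads the class: the UpperCaseProcessor and ProcessorHarness classes are hypothetical, and the mapping of the slashes in jclass to package separators is assumed. The harness writes the text to a temporary input file, invokes the named static method by reflection, and reads back the output file.

```java
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical processor with the documented signature; it upper-cases the document.
final class UpperCaseProcessor {
    public static String run(String input_file, String output_file) throws Exception {
        String text = new String(Files.readAllBytes(Paths.get(input_file)), StandardCharsets.UTF_8);
        Files.write(Paths.get(output_file), text.toUpperCase().getBytes(StandardCharsets.UTF_8));
        return "txt"; // extension of the output
    }
}

// Hypothetical test harness; not part of ContentMaster.
final class ProcessorHarness {

    // Invokes a static (String, String) -> String method named by jclass and jmethod.
    // Assumption: the slashes in jclass correspond to package separators.
    static String invoke(String jclass, String jmethod, Path input, Path output) throws Exception {
        Class<?> type = Class.forName(jclass.replace('/', '.'));
        Method method = type.getMethod(jmethod, String.class, String.class);
        return (String) method.invoke(null, input.toString(), output.toString());
    }

    // Runs the processor on the given text and returns what it wrote to the output file.
    static String process(String jclass, String jmethod, String text) throws Exception {
        Path input = Files.createTempFile("cm_in", ".tmp");
        Path output = Files.createTempFile("cm_out", ".tmp");
        try {
            Files.write(input, text.getBytes(StandardCharsets.UTF_8));
            invoke(jclass, jmethod, input, output);
            return new String(Files.readAllBytes(output), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(input);
            Files.deleteIfExists(output);
        }
    }
}
```

To test your own processor, you would pass its class path and method name instead of the hypothetical UpperCaseProcessor and run.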

ExternalPreProcessor

This component lets you run a custom document processor, which is implemented as a C++ DLL.

For other ways to implement a custom processor, see ExternalCOMPreProcessor and ExternalJavaPreProcessor.

Despite the name of this component, you can use it to run either a preprocessor or a postprocessor.

There is another way to implement a C++ document processor, which lets the IntelliScript pass custom properties to the processor. For information, see the chapter on External Components in the ContentMaster Engine Developer's Guide.

Creating a Custom Processor

To create a processor that is suitable to run with the ExternalPreProcessor component, follow these steps.

The instructions are for the Microsoft Visual C++ compiler, running on a Microsoft Windows platform. For compilation instructions on non-Windows platforms, please contact SAP support.

1. Copy the online-sample file External_Preprocessor.cpp. You can find the file under the ContentMaster installation folder, at the location:

Samples\SDK\ExternalPreprocessor\External_Preprocessor.cpp

2. Using the Visual C++ compiler, create a Win32 dynamic-link library project, and insert the C++ file into the project.


3. The C++ file contains the following function, which implements the preprocessing:

__declspec(dllexport) bool process_buffer(istream& in, ostream& out)

In the sample implementation, the function repairs numeric values, removing commas between the values. Replace the sample code with your implementation.

4. Compile the DLL.

5. Store the DLL in the externLibs\user subfolder of your ContentMaster installation folder.

You may then use the DLL in the ExternalPreProcessor component.

Optionally, you can add the processor to the drop-down component list that ContentMaster displays (for instructions, see the chapter about Using the IntelliScript Editor in the book ContentMaster Studio in Eclipse).

Basic Properties

import_dll
Browse to the custom processor DLL in the externLibs\user folder.

PdfToTxt_3_00

This document processor converts Adobe Acrobat (PDF) files to text in the UTF-8 encoding.

The processor does not require any Adobe Acrobat software on the computer.

In the New Parser wizard, this processor is called PDF to Unicode (UTF-8).

If you open a project that was created in a previous ContentMaster version, you may observe that it uses the older PdfToTxt_2_02 processor, which generates ASCII output. PdfToTxt_2_02 is supplied for backwards compatibility only; in new projects, you should use PdfToTxt_3_00.

PowerpointToHtml

This document processor converts Microsoft PowerPoint documents to HTML.

The processor uses the PowerPoint save-as-HTML feature to perform the conversion. Therefore, it operates only on a Microsoft Windows platform where PowerPoint (version 97 or higher) is installed. Due to PowerPoint limitations, the processor does not support multithreading.


PowerpointToTextML

This document processor converts Microsoft PowerPoint (*.ppt) presentations (version 97 or higher) to the TextML XML schema.

The processor requires a Java Runtime Environment (see the system requirements in the ContentMaster Administrator's Guide). You do not need to install PowerPoint on the computer.

ProcessByTransformers

This component lets you run transformers as document processors.

The component runs a transformer or a sequence of transformers on the entire document (instead of the normal transformer usage, which is to run on the text retrieved by an anchor). The parser then operates on the output of the transformers.

ContentMaster offers a large number of transformers (see Chapter 8, Transformers). Hence, the ProcessByTransformers component greatly expands the set of processing operations that you can apply to a document.

Basic Properties

transformers
The sequence of transformers that the component should run.

ProcessorPipeline

This component lets you run a sequence of document processors on a document. The parser runs on the output of the sequence.

Within this component, enter the sequence of processors.
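Conceptually, a pipeline passes the output of each processor to the next one, and the parser sees only the final result. The following Java sketch illustrates that idea with plain string functions. It is not ContentMaster code. The PipelineSketch class is hypothetical, and real document processors operate on documents rather than on in-memory strings.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration of a processor sequence; not part of ContentMaster.
final class PipelineSketch {

    // Applies each processor in order; the output of one is the input of the next.
    static String run(List<UnaryOperator<String>> processors, String document) {
        String current = document;
        for (UnaryOperator<String> processor : processors) {
            current = processor.apply(current);
        }
        return current;
    }
}
```

The order of the sequence matters. For example, a processor that converts a document to text must run before a processor that cleans up the text.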

RtfToTextML

This document processor converts RTF files to the TextML XML schema. The output is in the UTF-8 encoding.

WordPerfectToTextML

This document processor converts Corel WordPerfect documents to the TextML XML schema. The output is in the UTF-8 encoding.

WordPerfect does not need to be installed on the computer.


WordToHtml

This document processor converts Microsoft Word documents to HTML.

The processor uses the Word save-as-HTML feature to perform the conversion. Therefore, it operates only on a Microsoft Windows platform where Word (version 97 or higher) is installed. Due to Word limitations, the processor does not support multithreading.

Online Sample

For a tutorial exercise illustrating how to use this processor, see the chapter Parsing Word and HTML Documents in Getting Started with ContentMaster.

WordToRtf

This document processor converts Microsoft Word documents to RTF.

The processor uses the Word save-as-RTF feature to perform the conversion. Therefore, it operates only on a Microsoft Windows platform where Word (version 97 or higher) is installed. Due to Word limitations, the processor does not support multithreading.

WordToTextML

This document processor converts Microsoft Word files (version 97 or higher) to the TextML XML schema.

The processor requires a Java Runtime Environment (see the system requirements in the ContentMaster Administrator's Guide). You do not need to install Microsoft Word on the computer.

WordToTxt

This document processor converts Microsoft Word documents to plain text.

The processor uses the Word save-as-text feature to perform the conversion. Therefore, it operates only on a Microsoft Windows platform where Word (version 97 or higher) is installed. Due to Word limitations, the processor does not support multithreading.

The output is encoded according to the system code page.

WordToXml

This document processor converts Microsoft Word documents (version 97 or higher) to XML in the UTF-8 encoding.


The processor requires a Java Runtime Environment (see the system requirements in the ContentMaster Administrator's Guide). You do not need to install Microsoft Word on the computer.

In ContentMaster 3.2, this processor generated XML in the ISO-8859-1 encoding. If you upgrade a ContentMaster 3.2 project that uses the processor, you may need to edit the input and working encodings (see Chapter 13, Project Properties).

XmlToExcel

This document processor converts XML documents to Microsoft Excel format.

The processor requires a Java Runtime Environment (see the system requirements in the ContentMaster Administrator's Guide). You do not need to install Microsoft Excel on the computer.

TextML XML Schema

Some of the document processors convert documents to an XML vocabulary called TextML. This is a simple XML vocabulary for saving document content without layout.

The following is a sample TextML file.

<?xml version="1.0" encoding="UTF-8"?>
<document>
  <docinfo>
    <title>TextML Sample</title>
    <author>Tex Tomiller</author>
    <company>Acme Gizmos, Inc.</company>
    <modified>2004-03-14T14:39:00</modified>
    <created>2004-03-12T09:15:00</created>
    <last_author>Tex Tomiller</last_author>
    <word_count>16</word_count>
    <char_count>105</char_count>
    <version>2</version>
  </docinfo>
  <docbody>
    <p>This is a sample of the TextML XML vocabulary.</p>
    <p>TextML saves document content without layout information.</p>
  </docbody>
</document>

You can find the TextML schema (textML.xsd) in the doc subfolder of the ContentMaster installation folder.
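Because TextML is ordinary XML, you can also inspect processor output with any XML tool. As a minimal sketch, the following Java code uses the standard JAXP DOM parser to collect the text of the p elements from a well-formed TextML document. The TextMlSketch class is hypothetical, and only the element name p is taken from the sample in this section.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical illustration of reading TextML; not part of ContentMaster.
final class TextMlSketch {

    // Returns the text of every <p> element, in document order.
    static List<String> paragraphs(String textMl) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(textMl.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("p");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }
}
```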


ContentMaster Studio User's Guide 5. Formats


Formats

When you define a parser, you should specify the format of the documents that the parser should process. For example, you can select TextFormat or HtmlFormat.

The following components are nested within the format:

Component             Description
Delimiters            A hierarchy of characters or strings that organize the information in the document, such as newlines and tabs.
Format preprocessor   A component that cleans up the source before the parser starts searching for anchors.
Default transformers  A list of transformations that the parser applies to the output of each anchor.

This chapter describes the formats, delimiters, and format preprocessors that are available for your use. The subject of default transformers is discussed briefly here; for the details, see Chapter 8, Transformers.

Defining the Document Format

When you use the New Parser wizard, you are prompted to define the basic document format. This assigns the format with its default properties. You can customize the format by editing the Parser/format property in the IntelliScript.

The format is a component that has properties of its own, such as the delimiters, format preprocessor, and default transformers. By configuring the properties, you can support an extraordinarily broad range of source documents.



The supported format components are:

BinaryFormat
CustomFormat
HtmlFormat
RtfFormat
TextFormat
XmlFormat

For the details of these formats, see the Format Component Reference below.

Standard Properties of Formats

The format components have a standard set of properties, which are explained below.

Basic Properties

delimiters
A hierarchy of characters or strings that organize the information in the document, such as newlines, spaces, tabs, commas, or vertical bars. You can also use a regular expression (a wildcard pattern) to define the delimiters.

The delimiter concept is applicable both to rigidly structured documents, which use delimiter characters to separate the data fields, and to loosely structured text or HTML documents, which can use delimiters such as newlines and syntactic markup.

The delimiter concept also encompasses positionally-structured data, where the fields are located at fixed offsets from one another.

The value of this property is a delimiters component, which contains a predefined list of delimiters. For example, the TabDelimited component defines the newline and tab characters as delimiters. Some delimiters components, such as TabDelimited or DelimiterHierarchy, let you edit the list of delimiters. The Positional delimiters component defines an offset-based structure.

For a description of the delimiters components, see the Delimiters Component Reference below.


pre_processor
An optional format preprocessor, which converts the source to a format that the parser can process.

The format preprocessor acts on the source after any document processor that you may have defined (see Chapter 4, Document Processors). The purpose of the format preprocessor is to clean up whitespace or markup, before the parser starts to search for anchors.

By default, several of the predefined formats are configured with the HtmlProcessor format preprocessor, which collapses multiple whitespace characters to a single space.

For a detailed description of the available preprocessors, see the Format Preprocessor Component Reference below.

default_transformers
A list of transformers that the parser applies in sequence to the output of each anchor. The purpose of the transformers is typically to clean up the output and remove markup codes.

For example, the RemoveTags transformer is typically one of the default transformers of an HTML parser. The transformer removes HTML tags from the output of each anchor.

For a detailed description of the transformers, see Chapter 8, Transformers.

Advanced Properties

name
A name that you assign to the format. The name is used in the event log.

remark
A comment describing the format.

Format Component Reference

This section documents the format components that you can assign to the format property of a Parser.

BinaryFormat

This format is suitable for parsing binary files. It is also suitable for text files, which you want to treat as a buffer of binary bytes.

Properties

See Standard Properties of Formats above.

By default, the delimiters property has a value of Positional. The pre_processor and default_transformers properties are empty.


CustomFormat

This is a generic format, which you can use to process any type of source document. You must define the delimiters, default transformers, etc., yourself.

Example

A source document has the following typical structure:

Ron Lehrer && 547329876:27
Evelyn Kern && 9875424: 53

Each line of the document is a record containing a person's name, ID number, and age. The fields are separated by the symbols && and :. The fields contain multiple space characters at random locations.

One way to parse this document is by using a CustomFormat. In the delimiters property of the format, you might assign a DelimiterHierarchy containing the symbols:

newline
&&
:

In the default_transformers property, you might assign the HtmlProcessor, which removes the extra spaces from the output.
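To make the role of the three delimiters concrete, the following Java sketch splits the same kind of record by newline, then by &&, then by :, and collapses the extra spaces in each field. It only imitates the outcome of the format described in this example. It is not how ContentMaster parses, and the CustomFormatSketch class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the delimiter hierarchy newline, &&, :
final class CustomFormatSketch {

    // Returns one {name, id, age} array per well-formed line of the source.
    static List<String[]> parse(String source) {
        List<String[]> records = new ArrayList<>();
        for (String line : source.split("\\r?\\n")) {
            String[] nameAndRest = line.split("&&", 2);
            if (nameAndRest.length < 2) {
                continue; // no && on this line: not a record
            }
            String[] idAndAge = nameAndRest[1].split(":", 2);
            if (idAndAge.length < 2) {
                continue; // no : after the &&: not a record
            }
            records.add(new String[] {
                clean(nameAndRest[0]), clean(idAndAge[0]), clean(idAndAge[1])
            });
        }
        return records;
    }

    // Collapses runs of whitespace to one space and trims the ends.
    static String clean(String field) {
        return field.replaceAll("\\s+", " ").trim();
    }
}
```

The clean method plays the part that the default transformers play in the example: it removes the random extra spaces from each extracted field.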

Properties

See Standard Properties of Formats above.

By default, the delimiters, pre_processor, and default_transformers properties are empty. You should configure them yourself.

HtmlFormat

This format is suitable for parsing HTML files.


This format is also recommended for processing Microsoft Office documents. For this purpose, you should assign a document processor such as WordToHtml or ExcelToHtml, which converts the Office document to HTML.

Properties

See Standard Properties of Formats above.

By default, the delimiters property has a value of SGML, which recognizes the HTML delimiters such as < and >. The pre_processor is HtmlProcessor. The default_transformers are:

RemoveTags: removes HTML tags from the output

HtmlEntitiesToASCII: converts HTML entities such as &lt; and &quot; to their plain text equivalents (< and ", respectively)

HtmlProcessor: normalizes the whitespace, changing any sequence of tabs, newlines, and spaces to a single space character

RemoveMarginSpace: removes leading and trailing space
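The combined effect of these four default transformers on the output of an anchor can be approximated with a few string operations. The following Java sketch is an illustration only. The DefaultTransformersSketch class is hypothetical, the regular expressions are simplifications, and only a handful of entities are handled, whereas the real transformers may behave differently in edge cases.

```java
// Hypothetical approximation of the HtmlFormat default transformers.
final class DefaultTransformersSketch {

    static String apply(String anchorOutput) {
        // RemoveTags: strip anything that looks like a tag.
        String s = anchorOutput.replaceAll("<[^>]*>", "");
        // HtmlEntitiesToASCII: decode a few common entities (&amp; last,
        // so that text such as &amp;lt; is not decoded twice).
        s = s.replace("&lt;", "<").replace("&gt;", ">")
             .replace("&quot;", "\"").replace("&amp;", "&");
        // HtmlProcessor: collapse tabs, newlines, and spaces to one space.
        s = s.replaceAll("[\\t\\r\\n ]+", " ");
        // RemoveMarginSpace: remove leading and trailing space.
        return s.trim();
    }
}
```

As with the default transformers themselves, the steps run in sequence, and each step operates on the output of the previous one.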

RtfFormat

This format is suitable for parsing RTF files.

Properties

See Standard Properties of Formats above.

By default, the delimiters property has a value of RTF, which recognizes the standard RTF delimiter characters such as \. The pre_processor is RtfProcessor. The default_transformers are:

RtfToASCII: removes RTF control words from the output

RemoveRtfFormatting: converts HTML entity codes such as &lt; and &quot; to their plain text equivalents (< and ", respectively).

HtmlProcessor: normalizes the whitespace, changing any sequence of tabs, newlines, and spaces to a single space character

RemoveMarginSpace: removes leading and trailing space

TextFormat

This format is suitable for parsing text files.

In combination with a document processor, this format is also suitable for processing other types of documents. For example, you can use the PdfToTxt_3_00, WordToTxt, or ExcelToTxt processor to process Adobe Acrobat, Microsoft Word, or Microsoft Excel documents with this format.


Properties

See Standard Properties of Formats above.

By default, the delimiters property has a value of DelimiterHierarchy, which lets you define your own set of delimiters. The pre_processor is empty. The default_transformers are:

HtmlProcessor: normalizes the whitespace, changing any sequence of tabs, newlines, and spaces to a single space character

RemoveMarginSpace: removes leading and trailing space

XmlFormat

This format is suitable for parsing XML files.

Parsing an XML file means converting an XML source document to an XML output document. To do this, ContentMaster treats the source XML as ordinary text. You can define delimiters, anchors, etc. just as you do for any other kind of source document.

This is different from serialization, where an XML source document is converted to non-XML output. In serialization, ContentMaster uses the XSD schema and the formal XML syntax rules to interpret the source document (see Chapter 10, Serializers).

Properties

See Standard Properties of Formats above.

By default, the delimiters property has a value of SGML, which recognizes the XML delimiters such as < and >. The pre_processor is HtmlProcessor. The default_transformers are:

RemoveTags: removes XML tags from the output

HtmlEntitiesToASCII: converts XML entities such as &lt; and &gt; to their plain text equivalents (< and >, respectively).

HtmlProcessor: normalizes the whitespace, changing any sequence of tabs, newlines, and spaces to a single space character

RemoveMarginSpace: removes leading and trailing space

Delimiters Component Reference

This section documents the delimiters components, which you can assign to the delimiters property of a format.

ContentMaster uses the delimiters for purposes such as determining the search criteria of Content anchors. Specifically, this applies to Content anchors that are configured with the LearnByExample option (which is the default).


For example, suppose you configure a format with the TabDelimited delimiters component. This defines a hierarchy using the following characters as delimiters:

Newline
Tab

You might define a Content anchor that is located two tab characters after the preceding Marker anchor in the example source, like this:

MARKER<tab>abc<tab>CONTENT

When ContentMaster processes a source document, it searches for the Content two tabs after the Marker.

In a second example, you might define a Content anchor that is located three newlines and one tab after a Marker anchor in the example source:

MARKER
abc<tab>def
ghi<tab>jkl<tab>mnop
pqrst<tab>CONTENT

Within the intermediate lines, the tabs are not counted because the newlines are higher in the hierarchy.
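The counting rule in these two examples can be sketched in Python. This is only an illustration of the behavior described above, not ContentMaster's implementation; the function name `find_after` and its arguments are hypothetical, and the sample strings are written out under the assumption that the second example spans four lines.

```python
def find_after(text, marker, newlines, tabs):
    """Return the text that follows `marker` by `newlines` newlines and
    then `tabs` tabs, reading up to the next delimiter.

    Newline is higher in the hierarchy than tab, so tabs on the skipped
    lines are never counted.
    """
    pos = text.index(marker) + len(marker)
    # Skip whole lines first (the higher-level delimiter).
    for _ in range(newlines):
        pos = text.index("\n", pos) + 1
    # Then count tabs, but only within the line that was reached.
    line_end = text.find("\n", pos)
    if line_end == -1:
        line_end = len(text)
    for _ in range(tabs):
        tab = text.find("\t", pos, line_end)
        if tab == -1:
            raise ValueError("not enough tabs on the target line")
        pos = tab + 1
    # The content runs to the next tab or to the end of the line.
    stop = text.find("\t", pos, line_end)
    return text[pos:line_end if stop == -1 else stop]


first = "MARKER\tabc\tCONTENT"
second = "MARKER\nabc\tdef\nghi\tjkl\tmnop\npqrst\tCONTENT"
print(find_after(first, "MARKER", newlines=0, tabs=2))
print(find_after(second, "MARKER", newlines=3, tabs=1))
```

In the second call, the three tabs on the intermediate lines play no part in the search, which is the point of ranking newline above tab.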

Editing the Delimiter Hierarchy

Many of the delimiters components, such as TabDelimited or CommaDelimited, display a predefined hierarchy of delimiters, which you can edit as required.

The DelimiterHierarchy component does not have a predefined hierarchy. You can insert whatever delimiters you need.

Some delimiters components, such as SGML or PostScript, have a built-in hierarchy, which you cannot edit.

CommaDelimited

This delimiters component defines the following delimiter hierarchy:

Newline
Comma

CommaDelimited is suitable, for example, if each line of a text file contains a record, and each record contains data fields separated by commas.

You can add additional delimiters or edit the predefined hierarchy. The procedure is the same as for the DelimiterHierarchy component.

Example

In the example source document, suppose that a Content anchor follows a Marker anchor by two lines. In the third line, there are three commas (plus any other text) before the Content anchor, like this:

MARKERabc
def, ghij
abc, def,ghi,CONTENT

If you assign the CommaDelimited component, the parser learns from the example source that the Content anchor always follows the Marker by two newlines and three commas. In another source document, the parser will successfully find the following Content anchor:

MARKERxyz, uvw, rst

,,,CONTENT

DelimiterHierarchy

This delimiters component lets you define a custom delimiter hierarchy.

Under DelimiterHierarchy, you can add any number of delimiters. For each delimiter, select either Delimiter (located between anchors) or EnclosingDelimiters (a pair of delimiters that surround an anchor).

Example

In the example source document, suppose that the anchors are separated by commas and surrounded by brackets, like this:

MARKER,,[CONTENT]

You might define a DelimiterHierarchy that contains:

comma (defined as a Delimiter component)
[] (defined as an EnclosingDelimiters component)

From this example, the parser learns that the Content anchor follows the Marker by two commas and is surrounded by brackets. In another source document, the parser will successfully find the following Content anchor:

MARKER,abc,def[CONTENT]
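As a rough sketch, the rule learned in this example (two commas, then a bracketed value) can be expressed as a regular expression in Python. The function name `content_after` is hypothetical, and the regular expression is a stand-in chosen for illustration; it is not how ContentMaster itself stores or evaluates the hierarchy.

```python
import re


def content_after(text, marker, commas):
    """Find the bracketed value that follows `marker` by `commas` commas."""
    pattern = (
        re.escape(marker)
        + r"(?:[^,]*,){%d}" % commas   # skip exactly this many commas
        + r"[^\[]*"                    # any text before the opening bracket
        + r"\[([^\]]*)\]"              # the value enclosed in [ ]
    )
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(content_after("MARKER,,[CONTENT]", "MARKER", 2))
print(content_after("MARKER,abc,def[CONTENT]", "MARKER", 2))
```

Both calls return the same value because only the number of commas and the enclosing brackets matter, not the text between them.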

Online Sample

For an online sample, see Samples\Projects\EDI\EDI.cmw. The sample uses a DelimiterHierarchy to define the newline and * characters as delimiters, in an EDI source document.

HL7

This delimiters component defines the delimiter hierarchy that is used for parsing HL7 messages:

newline
vertical bar (|)
caret (^) or tab

You can add additional delimiters or edit the predefined hierarchy. The procedure is the same as for the DelimiterHierarchy component.

The HL7 protocol permits a message to define its own delimiters. You can parse the delimiter declaration of an HL7 message and create a dynamic delimiter definition. To do this:

1. Use Content anchors to retrieve the delimiter characters from the HL7 message header. Store the characters in variables.

2. Add Delimiter components under the HL7 component.

3. To each Delimiter component, assign TextSearch.

4. Under the TextSearch component, assign one of the variables to the text property.
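The idea behind the steps above, reading the delimiters from the message header and then using them to split the rest of the message, can be sketched in Python. The sketch relies on the HL7 v2 convention that the MSH segment declares the field separator in its fourth character and the component, repetition, escape, and subcomponent characters in the four characters that follow. The function names and the sample message are hypothetical, and the code does not use any ContentMaster API.

```python
def hl7_delimiters(message):
    """Read the delimiter declaration from the MSH segment."""
    if not message.startswith("MSH"):
        raise ValueError("message does not start with an MSH segment")
    field = message[3]
    component, repetition, escape, subcomponent = message[4:8]
    return {
        "field": field,
        "component": component,
        "repetition": repetition,
        "escape": escape,
        "subcomponent": subcomponent,
    }


def split_segments(message):
    """Split a message into segments, and each segment into fields."""
    field = hl7_delimiters(message)["field"]
    # Segments normally end with a carriage return; accept newlines too.
    segments = [s for s in message.replace("\n", "\r").split("\r") if s]
    return [segment.split(field) for segment in segments]


sample = "MSH|^~\\&|LAB|HOSP\rPID|1||12345^^^MRN\r"
delims = hl7_delimiters(sample)
pid = split_segments(sample)[1]
print(delims["field"], delims["component"])
print(pid[3].split(delims["component"]))
```

Storing the delimiters in a dictionary plays the role of the variables in step 1; the splitting code never hard-codes | or ^.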

Online Sample

For a tutorial exercise illustrating HL7 parsing, see the chapter Defining an HL7 Parser in Getting Started with ContentMaster.

Positional

This delimiters component specifies that the parser should interpret the source document without using delimiters. Instead, it should locate each anchor by counting the characters from the beginning of the search scope (see Chapter 7, Anchors, for an explanation of search scope).

Example

In the example source document, suppose that a Content anchor follows a Marker anchor by five characters (including spaces, tabs, etc.), like this:

MARKERab cdCONTENTefg

If you assign the Positional component, the parser learns from the example source that the Content anchor always follows the Marker by five characters, and that it is seven characters long. In another source document, the parser will successfully find the following Content anchor:

MARKERd<tab>cbaCONTENTzy,xwv
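Positional lookup amounts to slicing by offset and length. The following Python sketch applies the offset (5) and length (7) from the example to both source strings; the function name `positional` is hypothetical and the code only illustrates the counting, with `<tab>` written as a real tab character.

```python
def positional(text, marker, offset, length):
    """Return `length` characters starting `offset` characters after `marker`."""
    start = text.index(marker) + len(marker) + offset
    return text[start:start + length]


print(positional("MARKERab cdCONTENTefg", "MARKER", offset=5, length=7))
print(positional("MARKERd\tcbaCONTENTzy,xwv", "MARKER", offset=5, length=7))
```

The tab and the comma in the second string are treated like any other character, because no delimiters are in effect.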

Online Sample

For a tutorial exercise illustrating the Positional format, see the chapter Positional Parsing of a PDF Document in Getting Started with ContentMaster.

Using Positional Parsing Together with Delimiters

You cannot add delimiters to the Positional component.

Sometimes, you may wish to define a parser that uses delimiters to locate some anchors, and uses a positional definition for other anchors. To do this, you should select one of the other delimiters components (not Positional). To define the location of an anchor positionally, you can assign the OffsetSearch option in the anchor properties. For details, see Chapter 7, Anchors.

PostScript

This delimiters component defines a delimiter hierarchy that is used for parsing Adobe PostScript documents.

You cannot edit the delimiter hierarchy of the PostScript component.

RTF

This delimiters component defines a delimiter hierarchy that is used for parsing RTF documents.

You cannot edit the delimiter hierarchy of the RTF component.

SGML

This delimiters component defines a delimiter hierarchy that is used for parsing SGML documents. It is recommended for parsing HTML and XML, which are derivatives of SGML.

You cannot edit the delimiter hierarchy of the SGML component.

SpaceDelimited

This delimiters component defines the following delimiter hierarchy:

Newline
String of one or more space characters

SpaceDelimited is suitable, for example, if each line of a text file contains a record, and each record contains data fields separated by spaces.

You can add additional delimiters or edit the predefined hierarchy. The procedure is the same as for the DelimiterHierarchy component.

Example

In the example source document, suppose that a Content anchor follows a Marker anchor by two lines. In the third line, there are two single-space characters and one string containing multiple spaces before the Content anchor, like this:

MARKERabc
def
abc def   ghi CONTENT

If you assign the SpaceDelimited component, the parser learns from the example source that the Content anchor always follows the Marker by two lines and three strings of spaces. In another source document, the parser will successfully find the following Content anchor:

MARKERxyz

ghi def abc CONTENT

TabDelimited

This delimiters component defines the following delimiter hierarchy:

Newline
Tab

TabDelimited is suitable, for example, if each line of a text file contains a record, and each record contains data fields separated by tabs.

You can add additional delimiters or edit the predefined hierarchy. The procedure is the same as for the DelimiterHierarchy component.

Example

In the example source document, suppose that a Content anchor follows a Marker anchor by two lines. In the third line, there are three tab characters (plus any other text) before the Content anchor, like this:

MARKERabc
def
abc<tab> de,f<tab>ghi<tab>CONTENT

If you assign the TabDelimited component, the parser learns from the example source that the Content anchor always follows the Marker by two lines and three tabs. In another source document, the parser will successfully find the following Content anchor:

MARKERxyz

<tab><tab><tab>CONTENT

Online Sample

For a tutorial exercise illustrating tab-delimited parsing, see the chapter Basic Parsing Techniques in Getting Started with ContentMaster.

Delimiter Subcomponent Reference

This section documents subcomponents that are used within delimiters components (see the Delimiters Component Reference).

Delimiter

This subcomponent defines a delimiter character or string, which separates anchors. You can add Delimiter subcomponents within a delimiter hierarchy (see, for example, the DelimiterHierarchy component).

Example

The TabDelimited component contains two Delimiter subcomponents. The first uses NewlineSearch to define the newline character as a delimiter. The second uses a TextSearch to define the tab character as a delimiter. (The tab is graphically represented as a « character.)

The SpaceDelimited component also contains two Delimiter subcomponents. The first is identical to that of TabDelimited. The second uses a PatternSearch to define any string of one or more spaces as a delimiter. (The regular expression [ ]+ means "one or more space characters".)
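The effect of the [ ]+ pattern can be checked with Python's `re` module, which accepts the same expression. This is only a demonstration of the regular expression; the function name `split_on_spaces` is hypothetical and the code is unrelated to how PatternSearch is implemented.

```python
import re


def split_on_spaces(line, pattern=r"[ ]+"):
    """Split a line wherever the pattern matches a run of spaces."""
    return re.split(pattern, line)


# One run of several spaces counts as a single delimiter.
print(split_on_spaces("abc def   ghi CONTENT"))
```

A plain text search for a single space would instead report three delimiters between "def" and "ghi", which is why a pattern is used here rather than a fixed string.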

Basic Properties

Search
The delimiter definition. The value is one of the following searcher components (see the Searcher Component Reference in Chapter 7, Anchors):

Subcomponent    Description
NewlineSearch   The delimiter is a newline
PatternSearch   Uses a regular expression to define the delimiter
TextSearch      An explicit character or string, or a string that you retrieve dynamically from the source document

EnclosingDelimiters

This subcomponent defines a pair of delimiter characters or strings, which surround anchors. You can add EnclosingDelimiters subcomponents within a delimiter hierarchy (see, for example, the DelimiterHierarchy component).

The component is useful, for example, to define the { } delimiters that surround blocks of C program code.

Example

See the DelimiterHierarchy component.

Basic Properties

opening
The opening delimiter.

closing
The closing delimiter.

escape_sequence
A prefix in the source document, such as a backslash character \, which causes the parser to ignore an instance of the opening or closing delimiter.
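The interplay of the three properties can be sketched in Python as follows. The function `enclosed` is hypothetical and deliberately simple: it treats a delimiter as escaped whenever the escape sequence immediately precedes it, and it does not handle nesting or an escaped escape sequence. How ContentMaster treats those cases is not described here.

```python
def enclosed(text, opening, closing, escape="\\"):
    """Return the text between the first unescaped `opening` delimiter
    and the next unescaped `closing` delimiter, or None if not found."""

    def is_escaped(pos):
        # A delimiter is ignored when the escape sequence directly precedes it.
        return (
            bool(escape)
            and pos >= len(escape)
            and text[pos - len(escape):pos] == escape
        )

    def find_unescaped(token, start):
        pos = text.find(token, start)
        while pos != -1 and is_escaped(pos):
            pos = text.find(token, pos + 1)
        return pos

    start = find_unescaped(opening, 0)
    if start == -1:
        return None
    end = find_unescaped(closing, start + len(opening))
    if end == -1:
        return None
    return text[start + len(opening):end]


print(enclosed("MARKER,,[CONTENT]", "[", "]"))
print(enclosed("a \\{ b { c \\} d } e", "{", "}"))
```

In the second call the escaped braces are skipped, so the result runs from the second opening brace to the last closing brace.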

Format Preprocessor Component Reference

This section documents the format preprocessor components, which you can assign to the pre_processor property of a format (see the Format Component Reference above).

Do not confuse format preprocessors with document processors (see Chapter 4, Document Processors). The differences are as follows:

You can assign a document processor to the pre_processor property of an input port (under the example_source or sources_to_extract property of the parser). You can assign a format preprocessor only to the pre_processor property of a format.

ContentMaster runs a document processor on the source document before it performs any other operations. The example pane of the IntelliScript editor displays the output of the document processor.

ContentMaster runs a format preprocessor on the text that is displayed in the example pane, before it searches for anchors. The output of the format preprocessor is not displayed.

HtmlProcessor

This format preprocessor (which is also available as a transformer, see Chapter 8, Transformers) normalizes whitespace according to HTML conventions. It converts any sequence of tabs, line breaks, and space characters to a single space character.

You can use this preprocessor to normalize whitespace in any type of text. It is not restricted to HTML documents.
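The whitespace normalization described above is equivalent to a one-line regular expression substitution. The Python sketch below also shows the effect of removing leading and trailing space afterwards, as the RemoveMarginSpace transformer is described as doing. Both function names are hypothetical, and the code only mirrors the documented behavior.

```python
import re


def normalize_whitespace(text):
    """Collapse any run of tabs, line breaks, and spaces to one space."""
    return re.sub(r"[ \t\r\n]+", " ", text)


def remove_margin_space(text):
    """Remove leading and trailing space."""
    return text.strip(" ")


raw = "  Name:\t\tRon \n\n Lehrer  "
step1 = normalize_whitespace(raw)
step2 = remove_margin_space(step1)
print(repr(step1))
print(repr(step2))
```

Normalization alone leaves a single space at each end of the string, which is why the two operations are commonly chained.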

RtfProcessor

This format preprocessor normalizes the code of RTF files. It is also available as a transformer (see Chapter 8, Transformers).

Data Holders

A data holder is an object that has one of the following types:

An XML element

An XML attribute

A variable

Data holders of the first two types (XML elements and attributes) are used for permanent storage. A parser, for example, stores its output in data holders of these types.

Data holders of the third type (variables) are used for temporary storage. For example, a parser can store data that it extracts from a source document in a variable, for further processing before creating the output.

A common feature of all data holders is that they have XSD data types. In the case of elements and attributes, the data holders are defined in an XSD schema, which you must supply. Variables are defined in an internal schema, which you may customize by adding user-defined variables.

This chapter explains how to create data holders and how to use data holders and XSD in ContentMaster.

XSD Schemas

When you create a parser or serializer, you must supply one or more XSD schemas that define the structure of the XML. The schema defines the data holders (of the element and attribute types) that the parser or serializer can use.

You must add the schema to your project. You can then map the content of a document to elements and attributes that are defined in the schema.

About XSD

XSD is the commonly-used name for XML Schema, which is the industry-standard language for XML schema definitions. XSD originally stood for XML schema description, but this term is not used in the official XML Schema standard. The schema files typically have *.xsd filenames.

The XSD standard is maintained by the World Wide Web Consortium (http://www.w3.org). Since the standard was first published in 2001, XSD has rapidly replaced earlier schema description languages, such as DTD and XDR.

The following is a simple example of an XSD schema:

<?xml version="1.0" encoding="Windows-1252"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="First" minOccurs="0" type="xs:string"/>
              <xs:element name="Last" minOccurs="0" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Id" minOccurs="0" type="xs:string"/>
        <xs:element name="Age" minOccurs="0" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="gender" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

The schema defines the elements and attributes that can occur in an XML document. The syntax lets a schema author specify the hierarchy and sequence of elements, whether the elements are mandatory or optional, their data types, their possible values, and many other features.

The above sample schema defines an XML structure such as the following:

<Person gender="M">
  <Name>
    <First>Ron</First>
    <Last>Lehrer</Last>
  </Name>
  <Id>547329876</Id>
  <Age>27</Age>
</Person>

If you trace through the schema, you can observe the correspondence between definitions such as

<xs:element name="Person">

or

<xs:attribute name="gender" type="xs:string"/>

and the elements and attributes of the XML.

The elements and attributes have XSD data types, such as xs:complexType (for elements that contain nested elements) or xs:string (meaning that they contain string data). The elements have many other properties, such as their required sequence and the minimum number of times that they must occur in an XML document (minOccurs).
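The correspondence between schema definitions and XML names can be traced programmatically. The Python sketch below uses the standard library's ElementTree to list the names declared in a copy of the sample schema and the names used in a matching XML document. It is a name check only, not schema validation (it ignores nesting, sequence, and data types), and the helper names are hypothetical. The XML declaration is omitted from the embedded schema text for simplicity.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="First" minOccurs="0" type="xs:string"/>
              <xs:element name="Last" minOccurs="0" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Id" minOccurs="0" type="xs:string"/>
        <xs:element name="Age" minOccurs="0" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="gender" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

DOC = """<Person gender="M">
  <Name><First>Ron</First><Last>Lehrer</Last></Name>
  <Id>547329876</Id>
  <Age>27</Age>
</Person>"""


def declared_names(schema_text):
    """Names of all xs:element and xs:attribute declarations."""
    root = ET.fromstring(schema_text)
    elements = {e.get("name") for e in root.iter(XS + "element")}
    attributes = {a.get("name") for a in root.iter(XS + "attribute")}
    return elements, attributes


def used_names(xml_text):
    """Names of all elements and attributes that occur in a document."""
    root = ET.fromstring(xml_text)
    elements = {e.tag for e in root.iter()}
    attributes = {name for e in root.iter() for name in e.attrib}
    return elements, attributes


declared_elements, declared_attributes = declared_names(SCHEMA)
used_elements, used_attributes = used_names(DOC)
print(sorted(used_elements - declared_elements))      # undeclared elements
print(sorted(used_attributes - declared_attributes))  # undeclared attributes
```

Because XML names are case-sensitive, an element spelled ID in the document would be reported as undeclared against a schema that declares Id.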

Where To Learn XSD

For a brief explanation of the XSD syntax, see the chapter on Creating an XSD Schema in Getting Started with ContentMaster.

For comprehensive information, we recommend the following sources:

http://www.w3.org
The web site of the World Wide Web Consortium, which created and maintains the XML Schema standard.

http://www.w3schools.com
See this site for an excellent tutorial introduction to XSD.

How to Create XSD Schemas

You can create a schema in any XSD editor, for example, Altova's XML Spy (http://www.altova.com).

Typically, an XSD editor has a user-friendly interface, which lets you create and edit schemas even if you don't know the XSD syntax. Some editors also let you convert existing DTD or XDR schemas to XSD, or create an XSD schema from a sample XML file.

If you know the XSD syntax, you can also edit XSD schemas in a text editor such as Notepad.

ContentMaster Studio uses the Xerces 2.2 XML parser to validate XSD schemas. In rare cases, due to implementation differences between XML parsers, a schema that passes validation in ContentMaster may fail in other tools, or vice versa.

Encoding of the XSD Schema

You should save the schema in one of the supported input encodings (see Chapter 13, Project Properties).

The schema encoding must be compatible with the work encoding, which you use for text that you enter in the IntelliScript. This means that:

The schema encoding is identical to the work encoding,

or

Every character in the schema has an equivalent in the work encoding. For example, if the schema uses the UTF-8 encoding, and the work encoding is Windows-1252, the schema must not contain Unicode characters that have no Windows-1252 equivalent.
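One way to check the second condition before adding a schema to a project is to try encoding each character in the work encoding. The Python sketch below does this for Windows-1252, which Python names `cp1252`. The function name `incompatible_chars` is hypothetical, and the check is an external aid; it is not a ContentMaster feature.

```python
def incompatible_chars(schema_text, work_encoding="cp1252"):
    """Return the characters that have no equivalent in the work encoding."""
    bad = []
    for ch in schema_text:
        try:
            ch.encode(work_encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


print(incompatible_chars('<xs:element name="café"/>'))
print(incompatible_chars('<xs:element name="a\u2192b"/>'))
```

The accented letter in the first string exists in Windows-1252, so the list is empty. The arrow character in the second string does not, so it is reported.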

When you add a schema from an external location to a project, ContentMaster translates the project copy of the schema to the work encoding.

Included XSD Files

An XSD file can reference additional XSD files. This feature lets you maintain a large schema in a modular fashion.

Namespaces

If you plan to work with XML namespaces, you should assign the targetNamespace attribute of the schema. In ContentMaster, you can edit the alias that is assigned to the namespace (see Chapter 13, Project Properties).

Mixed Content

ContentMaster supports XML elements that have mixed content (both character data and nested elements). You may use the mixed attribute in a schema.

ContentMaster distinguishes between character data before and after each element (see Mapping Mixed Content below).

Unsupported XSD Features

The current ContentMaster version does not support certain uses of XSD features. The following table lists the limitations, as of the time this book was written.

We are working to remove the limitations. If you need any of the unsupported features, please check with SAP support for possibly updated information.

XSD feature                     Description of limitation
redefine                        Redefining types and groups is not supported and should not be used.
unique, key, keyref             Identity constraints may be used in a schema, but are ignored.
nil, nillable                   Nil elements may be used in a schema, but are ignored.
group (in place of an entity)   Model groups in place of XML entities, for example, <city>Montr<c:eacute/>al</city>, are not supported and should not be used. Other uses of groups are supported.
long, unsignedLong              Data holders having the XSD type long or unsignedLong currently support integers with absolute values up to 2147483647. Larger values are not supported and may give incorrect results.

Adding XSD Schemas to a ContentMaster Project

You must add at least one XSD schema to every ContentMaster parser, serializer, or mapper project. The schema specifies what XML structure the project needs to process.

You can store the schema at any network location. ContentMaster copies the schema to the project folder.

If a schema includes other schemas, you should add the main schema. When you do this, ContentMaster adds the included schemas automatically.

Adding an Existing Schema

To add an XSD schema to a project:

1. In the ContentMaster Explorer view, right-click the XSD node of a project and choose Add File.

2. Browse to the XSD file, which can be at any network location. If the XSD file is not in your project folder, ContentMaster copies the file to the project folder.

3. The XSD folder of the ContentMaster Explorer displays the schema file. If the schema references any other XSD files, the Include folder of the ContentMaster Explorer displays their names.

ContentMaster Explorer view

4. Optionally, if the schema defines a target namespace, you can edit the namespace alias in the ContentMaster project properties. For instructions, see Chapter 13, Project Properties.

Creating a New Schema

To create a new schema file in ContentMaster Studio:

1. In the ContentMaster Explorer, right-click the XSD node and choose New > XSD.

2. The ContentMaster Explorer displays the new file with a default name such as untitled1.xsd. You should enter a meaningful name immediately when you create the file. To prevent errors in the references that a project creates to the schema, ContentMaster Studio does not allow you to change the name of an existing schema file.

Editing a Schema

If you double-click an XSD file in the ContentMaster Explorer, ContentMaster Studio opens it for editing.

By default, ContentMaster Studio provides a simple, text-based schema editor.

Alternatively, you can configure ContentMaster Studio to open an editor of your choice, for example, XML Spy or Notepad. To do this, see the chapter on ContentMaster Studio Preferences in the book ContentMaster Studio in Eclipse.

Reloading a Schema after Editing

If you edit or modify an existing schema, ContentMaster Studio may prompt you to reload the schemas. This ensures that the edited schema is available throughout the project.

You can reload the schemas at any time by right-clicking the XSD node in the ContentMaster Explorer, and choosing Reload XSDs.

Validating Data Holders

If you edit a schema that belongs to an existing project, data holders referenced in the project may become invalid. For example, an anchor may reference an element that no longer exists in the schema.

To identify any problems of this sort, run the ContentMaster > Validate command. Any validation errors are displayed in the Tasks view.

Viewing a Schema

In the Schema view of ContentMaster Studio, you can view the elements and attributes of all schemas that belong to a project.

To help visualize a schema, you can generate a sample XML document that illustrates the schema elements.

Schema View

To display the content of a schema, use the Schema view. The view displays a heading for each namespace that is defined in the project. Under the heading, it displays the elements and attributes that are defined in the schema.

The namespace display includes:

Three default entries for the World Wide Web Consortium schema namespaces (beginning with http://www.w3.org). In most projects, you can ignore these entries.

The ContentMaster variables namespace, which is used to define variables that you can use in your project (see Variables below).

An entry for each target namespace that is defined in the schemas that you have added to the project.

If you add one or more schemas that do not define a target namespace, they are displayed under the no target namespace heading.

Schema view

Displaying an XML Sample of a Schema

You can generate and display a sample XML file that illustrates a schema. To do this, right-click the schema in the ContentMaster Explorer, and choose Create Example XML.

The example illustrates features such as:

The data type of an element or attribute: "a" for string data, "1" for integer data, "1.1" for floating point, etc.

The multiplicity of an element. If an element can occur more than once, the example displays it more than once.

Using a Schema to Map Anchors

When you define a parser, you must map Content anchors to output data holders. When you define a serializer, you must map input data holders to ContentSerializer serialization anchors.

When you edit the data_holder property of an anchor, ContentMaster displays a Schema view. You should select the appropriate data holder.

Alternatively, when you configure a parser, you can drag text from the example source to a data holder in the Schema view. ContentMaster Studio creates a Content anchor, which maps the example text to the data holder.

IntelliScript Representation of Data Holders

In the IntelliScript, data holders are identified by a modified XPath expression, such as:

data_holder = /Person/*s/Name/*s/First

Do not try to type this value. If you wish to modify the mapping, select the data_holder property and press Enter. This opens a Schema view, where you can select the new mapping.

The ContentMaster XPath syntax is slightly different from the standard XPath syntax, which is Person/Name/First. ContentMaster inserts *s, *c, and *a, which refer to the XSD terms sequence, choice, and all. The modifications resolve ambiguities when ContentMaster uses XSD to construct XML output.
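For reading purposes, a modified expression can be reduced to its standard XPath form by dropping the inserted markers. The Python sketch below assumes that the markers always appear as separate path steps, as they do in the example; the function name `to_standard_xpath` is hypothetical. The reverse direction is not attempted, because it requires the schema to know which compositor applies at each level.

```python
MARKERS = {"*s", "*c", "*a"}  # sequence, choice, all


def to_standard_xpath(cm_path):
    """Drop the *s, *c, and *a steps from a ContentMaster data holder path."""
    steps = [step for step in cm_path.split("/") if step not in MARKERS]
    return "/".join(steps)


print(to_standard_xpath("/Person/*s/Name/*s/First"))
```

The result keeps the leading slash of the original expression, so it is the absolute form of Person/Name/First.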

Mapping Mixed Content

If the schema supports mixed content, ContentMaster considers each element to have before and after data holders. For example, consider the following mixed content:

<Deal>
  We are pleased to offer you a price of
  <Price>34</Price>
  dollars. This is a special price for
  <Partner>
    <Name>Acme Gizmos, Inc.</Name>
    <ID>98765</ID>
  </Partner>
  valid only until December 31.
</Deal>

ContentMaster considers this structure to contain data holders in the following locations:

Immediately after the <Deal> tag, before any of the sub-elements.

Before the Price element

The Price element

After the Price element

Before the Partner element

The Partner/Name and Partner/ID elements

After the Partner element

Immediately before the </Deal> tag, after all the sub-elements.

You might map the text "We are pleased to offer you a price of" to the data holder before the Price element. You might map "dollars. " to the data holder after Price, and "This is a special price for " to the data holder before Partner.
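To see where the character data of this example sits relative to the elements, the following Python sketch walks the structure with the standard library's ElementTree. ElementTree exposes the text inside an element as `text` and the text that follows an element as `tail`. Note the difference from the model described above: ElementTree has a single text node between Price and Partner, whereas ContentMaster offers separate after-Price and before-Partner data holders. The function name `text_slots` is hypothetical.

```python
import xml.etree.ElementTree as ET

DEAL = """<Deal>We are pleased to offer you a price of
<Price>34</Price>dollars. This is a special price for
<Partner><Name>Acme Gizmos, Inc.</Name><ID>98765</ID></Partner>
valid only until December 31.
</Deal>"""


def text_slots(xml_text):
    """List the character data around each child of the root element."""
    root = ET.fromstring(xml_text)
    slots = [("start of " + root.tag, (root.text or "").strip())]
    for child in root:
        slots.append(("element " + child.tag, (child.text or "").strip()))
        slots.append(("after " + child.tag, (child.tail or "").strip()))
    return slots


slots = text_slots(DEAL)
for location, text in slots:
    print(location, "->", repr(text))
```

The Partner element itself has no character data of its own in this sample, so its slot is an empty string; its content is in the nested Name and ID elements.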

The Schema view displays the mixed-content data holders.

The IntelliScript displays the mixed content in a representation such as:

data_holder = /Deal/*s/Price/$text_before

Generating Valid XML

By default, ContentMaster generates XML that is valid according to the XSD schema that you have defined.

The ContentMaster approach to validation differs from the conventional approach, which is used in most XML applications. In the conventional approach, an application generates an XML document, and then checks that the output conforms to a schema. The schema is applied after the generation, when the XML document already exists.

In the ContentMaster approach, the schema is used as a guide while the XML is being generated. The schema is applied during the generation, and not afterwards. This approach helps ContentMaster data transformations to succeed, by ensuring the validity of the output at the relevant locations as the transformation proceeds.

The following section, describing the Role of XSD in Parsing, illustrates the approach.

Role of XSD in Parsing

This section explains some of the ways in which a parser uses XSD to ensure that it outputs valid XML.

The discussion presents examples of the behavior. For an explanation of the parameters that control the behavior, see XML Generation in Chapter 13, Project Properties.

Sequence of Elements

When ContentMaster runs a parser, it organizes the output in the sequence that is required by the XSD schema.

For example, a schema may require that a LastName element precede a FirstName element. ContentMaster creates the output in the locations defined by the schema, even if the anchors that produce the output are defined in the opposite sequence.
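The effect can be pictured as sorting the extracted values by their position in the schema. The Python sketch below is a simplified illustration with a flat list of element names; the names `SCHEMA_ORDER` and `order_by_schema` and the sample values are hypothetical, and the code says nothing about how ContentMaster builds its output internally.

```python
# Order in which the schema's sequence declares the elements.
SCHEMA_ORDER = ["LastName", "FirstName"]


def order_by_schema(extracted, schema_order):
    """Arrange (element name, value) pairs in the order the schema requires."""
    rank = {name: index for index, name in enumerate(schema_order)}
    # sorted() is stable, so repeated elements keep their extraction order.
    return sorted(extracted, key=lambda pair: rank[pair[0]])


# The anchors found FirstName before LastName in the source document.
found = [("FirstName", "Ron"), ("LastName", "Lehrer")]
print(order_by_schema(found, SCHEMA_ORDER))
```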

Number of Occurrences

A parser may attempt to insert multiple instances of an element in the output XML. ContentMaster uses the schema to determine whether the new instances should be appended or should overwrite the existing elements. The parser deletes any excess elements, beyond those that the schema permits, and writes warnings in the event log.

In another example, suppose the schema defines an element without specifying a minOccurs or maxOccurs attribute. According to the XSD standard, the default minOccurs and maxOccurs values are 1, which means that the element must occur exactly once in the parser output. If the element is missing from the output, the parser can add it.

For more information, see Multiple-Occurrence Data Holders below.

Missing or Empty Elements

In the project properties, you can configure whether a parser should insert empty elements to comply with an XSD schema.

XSD Data Types

ContentMaster ensures that the text it stores in a data holder has the required XSD type. For example, if a Content anchor retrieves the string "oranges 5 for a dollar", and the XSD type of the data holder is xs:integer, the anchor stores only the integer 5 in the data holder.

For more information, see Using XSD Data Types to Narrow the Search Criteria in Chapter 7, Anchors.
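As a rough analogy for the xs:integer example, the following Python sketch keeps only the first integer found in the retrieved text. The coerce_to_integer function and its regular expression are assumptions made for illustration; the matching rules that ContentMaster actually applies may differ.

```python
import re


def coerce_to_integer(text):
    """Return the first optionally signed run of digits in text as an int, or None."""
    match = re.search(r"[+-]?\d+", text)
    return int(match.group()) if match else None


value = coerce_to_integer("oranges 5 for a dollar")
print(value)  # 5
```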

Role of XSD in Serialization

A serializer checks that its input is valid according to the XML schema. If the input is invalid, the serializer attempts to correct the error and process the source anyway.

For example, if there are more occurrences of an element than the schema permits, the serializer may ignore the excess elements and process the valid ones. It writes appropriate warnings in the event log.

Variables

Variables are temporary data holders, which you can use in place of XML elements or attributes. Variables are useful, for example, if you need to store a value temporarily during the operation of a parser, but you don't need to output the value in the XML.

For example, suppose you want a parser to read two content anchors and concatenate their values. You might map each content anchor to a user-defined variable. You can then use an action to concatenate the variables and output the result to an XML element.

In addition to the user-defined variables, ContentMaster has several pre-defined system variables. The system variables are used, for example, to store information that is needed for certain actions.


User-Defined Variables

To define a variable, use the ContentMaster > Insert > Variable command on the menu, or add a Variable component at the global level of the IntelliScript. Select the XSD data type that the variable can store, such as xs:string or xs:integer.

The variable is displayed under the Variables namespace in the Schema view.

System Variables

Several system variables are defined in every ContentMaster project. The following paragraphs describe the variables and the ways in which they are used.

Variables Used to Access Source Documents

Several of the system variables store data that actions can use when they access source documents. For example, the RunParser action can use:

VarLinkURL
A file path or a URL address.

VarPostData
A string containing form data that should be submitted.

The following variables are used in the SubmitForm and SubmitFormGet actions:

VarFormAction
The URL to which a form should be posted (the action attribute of the HTML <form> element).


VarFormData
A string containing the form data that should be submitted. This is a multiple-occurrence variable (see Multiple-Occurrence Data Holders). Each occurrence is a complete instance of the form data.

Read-Only Access Variables

The following variables are read-only. A data transformation can use them to revisit a source document.

VarRequestedURL
The path or URL of the source document that a parser is processing.

VarCurrentURL
The path or URL of the current file that a parser is processing.

Usually, this is the same as VarRequestedURL. If the parser is configured with certain preprocessors, VarCurrentURL may point to a temporary file rather than the original source document. VarRequestedURL always points to the source document.

VarCurrentPost
The form data that a parser submitted in order to retrieve the current page.

Read-Only System Time Variables

VarSystem is a read-only variable that returns system information. It is a structure (represented in the XPath format), which contains several nested variables:

VarSystem/ExecStartTime/Year
VarSystem/ExecStartTime/Month
VarSystem/ExecStartTime/MonthName
VarSystem/ExecStartTime/Day
VarSystem/ExecStartTime/DayName
VarSystem/ExecStartTime/Hour
VarSystem/ExecStartTime/Minute
VarSystem/ExecStartTime/Second
VarSystem/ExecStartTime/Millisecond

The nested variables store the year, month, day, etc., when the parser began execution. The variables are useful, for example, if you want a parser to insert a timestamp in its output.
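To show how such fields could be combined into a timestamp, here is a Python sketch that derives the same set of field names from a start time. It is only an analogy: the exec_start_time_fields function is hypothetical, and the value formats that ContentMaster stores in these variables (for example, numeric or text, padded or not) are not specified here.

```python
from datetime import datetime


def exec_start_time_fields(start):
    """Split a start time into fields named after the VarSystem/ExecStartTime variables."""
    return {
        "Year": start.year,
        "Month": start.month,
        "MonthName": start.strftime("%B"),  # locale-dependent month name
        "Day": start.day,
        "DayName": start.strftime("%A"),    # locale-dependent day name
        "Hour": start.hour,
        "Minute": start.minute,
        "Second": start.second,
        "Millisecond": start.microsecond // 1000,
    }


fields = exec_start_time_fields(datetime(2003, 10, 27, 14, 5, 9, 250000))
timestamp = "{Year:04d}-{Month:02d}-{Day:02d}T{Hour:02d}:{Minute:02d}:{Second:02d}".format(**fields)
print(timestamp)  # 2003-10-27T14:05:09
```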

Mapping Anchors to Variables

You can map a Content anchor to a variable, in the same way that you map to any other data holder.

Do not map an anchor to a read-only system variable.


Using Variables in Actions

Variables are often used as the input of actions. You can use a variable in the same way as you use other data holders. For instructions, see Chapter 9, Actions.

Variable Component Reference

This section documents the Variable component, which you can add to a project.

Variable

A Variable component is a user-defined variable.

You can use variables for temporary storage, in the same locations that you use an XML element or attribute. For example, you can map a Content anchor to a variable, and you can use a variable as the input of an action.

Variables have XSD data types. They are displayed in the Schema view and in the IntelliScript. You can define a variable only at the top (global) level of the IntelliScript.

Basic Properties

val_type
The data type that the variable can store. Select one of the standard XSD data types.

Advanced Properties

list
Select this option to create a multiple-occurrence variable (see Multiple-Occurrence Data Holders).

Multiple-Occurrence Data Holders

In an XSD schema, you can use the maxOccurs attribute to set the maximum number of times that sibling elements can occur in an XML document. Likewise, you can define a variable that can occur either once or multiple times. An element or variable that can occur only once is called a single-occurrence data holder. An element or variable that can occur more than once is called a multiple-occurrence data holder.

Single- and multiple-occurrence data holders behave differently when ContentMaster stores data in them, for example, when you map Content anchors to a data holder.

In a single-occurrence data holder, each assignment overwrites the preceding assignment.


In a multiple-occurrence data holder, each assignment generates a new occurrence of the data holder.

To understand this, suppose that your XSD schema defines an XML element called FirstName. If maxOccurs = 1, this is a single-occurrence data holder. If a parser maps more than one Content anchor to the FirstName element, the output contains only the final mapping.

For example, suppose the source document is a list of first names, each of which is a Content anchor that is mapped to FirstName:

Jack Jennie Larissa

The output contains only the final mapping:

<FirstName>Larissa</FirstName>

Now suppose that maxOccurs = unbounded. This is a multiple-occurrence data holder. If you map multiple Content anchors to the element, the parser generates a list of names. The output is:

<FirstName>Jack</FirstName>
<FirstName>Jennie</FirstName>
<FirstName>Larissa</FirstName>

The same principle applies to variables. If you map multiple anchors to a multiple-occurrence variable, each anchor generates a new occurrence of the variable. This is useful, for example, to prepare input for the AppendListItems and CombineValues actions, which concatenate the occurrences.
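The difference between the two kinds of data holder can be sketched in Python. The DataHolder class below is a hypothetical model written for this illustration; it is not part of ContentMaster.

```python
class DataHolder:
    """Toy model of a data holder; 'multiple' plays the role of maxOccurs > 1."""

    def __init__(self, name, multiple):
        self.name = name
        self.multiple = multiple
        self.values = []

    def assign(self, value):
        if self.multiple:
            self.values.append(value)  # each assignment adds a new occurrence
        else:
            self.values = [value]      # each assignment overwrites the previous one

    def to_xml(self):
        return "".join(f"<{self.name}>{value}</{self.name}>" for value in self.values)


single = DataHolder("FirstName", multiple=False)
multi = DataHolder("FirstName", multiple=True)
for first_name in ["Jack", "Jennie", "Larissa"]:
    single.assign(first_name)
    multi.assign(first_name)

print(single.to_xml())  # <FirstName>Larissa</FirstName>
print(multi.to_xml())
```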

Attributes

An XML attribute is always a single-occurrence data holder. An attribute cannot be multiple-occurrence because XML does not permit the same attribute to appear more than once in the same element.

An attribute can have an XSD list type, which is a space-separated list. The names attribute in the following element is an example:

<Countries names="USA Canada Mexico"/>

ContentMaster treats the attribute as a single-occurrence data holder with an XSD list type (for an example of how to use this feature, see Using XSD Data Types to Narrow the Search Criteria in Chapter 7, Anchors).

Indexing

By default, ContentMaster accesses multiple-occurrence data holders sequentially. You can access a multiple-occurrence data holder non-sequentially by using the indexing feature. For information, see Chapter 12, Locators, Keys, and Indexing.

ContentMaster Studio User's Guide 7. Anchors

Anchors

Anchors are the components that let a parser hook into specific locations in a source document, for the purpose of finding data and storing it in data holders. An anchor is a signpost that you place in a document, indicating the position of the data.

This chapter explains the different types of anchors and how you can use them in parsers.

Marker and Content Anchors

The most commonly used anchors are called Marker and Content anchors. These anchors are often used as a pair: a Marker anchor labels a location in a document, and a Content anchor retrieves text from the location.

To understand these anchors, imagine a printed questionnaire. The first line typically asks for the person's last name and first name, with each label followed by a blank space to receive the information. In ContentMaster terminology, the printed labels Last Name and First Name are Marker anchors, and the blank spaces are Content anchors. The anchors provide a means to home in on the data, for the purpose of extracting it from the source document.

Other Anchor Types

In addition to Marker and Content anchors, ContentMaster provides many other anchor types, which you can use to parse documents in different ways. For example, Group and RepeatingGroup anchors help you specify the organization of the data fields. An Alternatives anchor lets you specify multiple kinds of data that might occur at a particular location in a source document.

How Anchors and Delimiters Work Together

In ContentMaster Studio, you define the anchors in the example source document. The parser learns how to parse the document by examining the anchors and the delimiters that separate them. (For an explanation of delimiters, see Chapter 5, Formats.)

For example, suppose you have specified that your document uses a tab-delimited format. A line in the example source reads:

First name:<tab>Ron

where <tab> is a tab character.

You can define First name: as a Marker anchor. You can define Ron as a Content anchor. The parser learns from these definitions that it should search a source document for the string First name:. It should then skip over a single tab delimiter and retrieve the text that follows the tab.

Suppose you run the parser on another source document, which contains the following text:

First name:<tab>Jack

The parser finds the anchors as above and retrieves the text Jack.

Now suppose that the source document reads:

First name:<tab>Jack<tab>Age:<tab>34

The parser still retrieves the text Jack, rather than Jack<tab>Age:<tab>34. This works because you have defined the tab character as a delimiter. ContentMaster understands that the Content anchor starts after the first tab and ends before the second tab. Of course, you might define some additional anchors that retrieve Jack's age, which is 34.

The above examples describe the default behavior of the anchors and delimiters. The anchors have many properties that let you override the default. For instance, you can define a Content anchor that skips over tabs, even in a tab-delimited format. Please see the Anchor Component Reference for details.
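The default behavior described in these examples can be approximated by a few lines of Python. This is a simplified sketch under stated assumptions: the extract_after_marker function is hypothetical, it handles a single-character tab delimiter only, and it ignores the many anchor properties that can change the behavior.

```python
def extract_after_marker(source, marker, delimiter="\t"):
    """Find marker, skip one delimiter, and return the text up to the next delimiter."""
    start = source.find(marker)
    if start == -1:
        return None  # the Marker anchor was not found
    rest = source[start + len(marker):]
    if not rest.startswith(delimiter):
        return None  # the expected delimiter does not follow the marker
    rest = rest[len(delimiter):]
    end = rest.find(delimiter)
    return rest if end == -1 else rest[:end]


print(extract_after_marker("First name:\tJack", "First name:"))            # Jack
print(extract_after_marker("First name:\tJack\tAge:\t34", "First name:"))  # Jack
print(extract_after_marker("First name:\tJack\tAge:\t34", "Age:"))         # 34
```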

Mapping Content Anchors to Data Holders

A Content anchor stores the text that it extracts from a source document in a data holder. For example, you might configure a Content anchor to store its results in an XML element called FirstName. If the Content anchor retrieves the text Jack, the parser would produce the following output:

<FirstName>Jack</FirstName>

More precisely, you might specify that the anchor should store the retrieved text at the path /Person/*s/FirstName, which refers to the XSD schema (see Chapter 6, Data Holders). The actual parser output would be:

<Person>
  <FirstName>Jack</FirstName>
</Person>


On the other hand, suppose that the XSD schema defines FirstName as an attribute of the Person element. You might map the Content anchor to /Person/@FirstName. The output would be:

<Person FirstName="Jack" />

You must map to a data holder that has an appropriate data type. For example, do not map Jack to an XML element that has an XSD integer data type, or to an XML element that has a complex data type containing nested elements. For an exception to this rule, see Using XSD Data Types to Narrow the Search Criteria, below.

You do not actually type the path (/Person/*s/FirstName, etc.) in ContentMaster Studio. When you edit a property whose value is a data holder, ContentMaster Studio displays a Schema view, where you can select the data holder. ContentMaster Studio displays the path for you in the IntelliScript.

Mapping to Variables

You can map an anchor to a data holder that is an XML element, an XML attribute, or a variable. The variable option is useful if you want to use the data in a subsequent processing step, but you do not want the raw data to be included in the parser output.

For example, suppose you want to extract several numbers from a source document and output their sum in the XML. You don't want the individual numbers in the output. You can map the Content anchors that retrieve the numbers to variables, and use a CalculateValue action to compute and output the sum (see Chapter 9, Actions).

You might also map to a variable that you use in a subsequent anchor, for example, to define a dynamic search text for a Marker anchor.
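The summing example can be pictured as follows. The Python sketch only mimics the idea of holding intermediate values outside the output; the sum_to_xml function and the Total element name are hypothetical, and the code does not represent the CalculateValue action itself.

```python
def sum_to_xml(retrieved_texts):
    """Hold each retrieved number in a temporary variable and output only the sum."""
    variables = []  # stand-in for variables that never appear in the output
    for text in retrieved_texts:
        variables.append(int(text))
    total = sum(variables)
    return f"<Total>{total}</Total>"


output = sum_to_xml(["12", "30", "8"])
print(output)  # <Total>50</Total>
```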

Mapping to Multiple-Occurrence Data Holders

If you map Content anchors to a single-occurrence data holder, then each assignment of the data holder overwrites the previous assignment.

If you map to a multiple-occurrence data holder, then each assignment generates a new occurrence of the element. For example, if each Content anchor retrieves a person's name, the output is a list of names:

<FirstName>Jack</FirstName>
<FirstName>Jennie</FirstName>
<FirstName>Larissa</FirstName>

For more information, see Multiple-Occurrence Data Holders in Chapter 6, Data Holders.


Mapping to Mixed-Content Elements

If an XML element can contain mixed content (both character data and nested elements), the Schema view displays before and after data holders for the elements. This lets you map a Content anchor to character data that is located before or after a particular nested element (see Using a Schema to Map Anchors in Chapter 6, Data Holders).

Defining Anchors

When you define a Parser component, you must add a sequence of anchors. The parser operates by searching for the anchors in the source document and by running the operations that you have configured the anchors to perform.

Where to Define Anchors

In the IntelliScript, the anchors are nested within a Parser configuration.

Under the "contains" line of a Parser, you can nest a sequence of anchors.

If you press Enter at the indicated location, ContentMaster displays a drop-down list, which includes the anchors (and other components) that you can add.

After you add the anchors, the IntelliScript displays the sequence of anchors. In addition, ContentMaster highlights the anchors in the example source.


Some types of anchors can contain nested anchors. For example, you can nest anchors within an Alternatives or RepeatingGroup anchor. For details, see the Anchor Component Reference.

Sequence of Anchors

The sequence of the anchors should be the sequence of text in the source document.

For example, suppose that the source document is:

First Name: Ron
Last Name: Lehrer

Assuming that you define First Name and Last Name as Marker anchors, and that you define Ron and Lehrer as Content anchors, the required sequence of anchors in the parser configuration is:

Anchor   Text in the source document
Marker   First Name
Content  Ron
Marker   Last Name
Content  Lehrer

Exception: Variable Source Sequence

Some source documents may have a variable sequence. For example, suppose that the source document may have either of the following structures:

First Name: Ron
Last Name: Lehrer


or

Last Name: Lehrer
First Name: Ron

In such cases, you can use the marking property to change the search scope of the anchors (see How a Parser Searches for Anchors).

Select-and-Click Procedure for Marker and Content Anchors

You can add Marker and Content anchors by a select-and-click procedure. For instructions on how to do this, see Getting Started with ContentMaster.

In brief:

1. Select the anchor text in the example source file.

2. Right-click and choose the option to Insert Marker or Insert Content. This opens the New Element window for a Marker or Content anchor, respectively.

3. In the IntelliScript editor or in the IntelliScript Assistant view, set the anchor properties. For a detailed explanation of the properties, see the Anchor Component Reference below.

Drag-and-Drop Procedure for Content Anchors

You can define Content anchors by the following drag-and-drop procedure:

1. In the example pane, select the anchor text.

2. Drag the text to a data holder in the Schema view.

3. This creates a Content anchor, which is mapped to the data holder. Edit the IntelliScript and set the other anchor properties.

You can also drag and drop from the example pane to the IntelliScript pane. For example, you can drag to the text property of an anchor that is defined with the TextSearch option.


Using the IntelliScript to Define Anchors

You can create any type of anchor by editing the IntelliScript. The procedure is identical to editing any other component in the IntelliScript:

1. At the anchor location, press Enter.

2. Select or type the anchor name.

3. Press Enter again to confirm your selection.

4. Edit the anchor properties.

Standard Anchor Properties

In this section, we review certain properties that are found in many anchors. For additional properties that are specific to particular anchors, see the Anchor Component Reference.

name
A name that you assign to the anchor. ContentMaster includes the name in the event log. This can help you find an event that was caused by the particular anchor.

remark
A comment describing the anchor.

disabled
If selected, the parser ignores the anchor. This is useful for testing and debugging, or for making minor modifications in a parser without deleting the existing anchors.

Disabling an anchor disables all its nested components (nested anchors, transformers, etc.).

optional
By default, if an anchor fails, the parent component (such as a Parser in which the anchor is nested) fails. If you select the optional property, the parent component does not fail.

You can select the optional property to define an anchor that may or may not exist in a source document. If the anchor does not exist, the Parser continues.

If the anchor is nested within a Group anchor, the optional property keeps the Group from failing. If the anchor is in a RepeatingGroup, the property keeps an iteration of the RepeatingGroup from failing.

direction
The direction in which ContentMaster searches for the anchor, within the search scope (see How a Parser Searches for Anchors below). If direction = forward, the parser finds the first instance of the anchor within the search scope. If direction = backward, the parser finds the last instance.


For example, suppose the search scope for a Marker anchor contains five instances of the word ContentMaster. If direction = forward, the parser finds the first instance of ContentMaster. If direction = backward, it finds the last instance.

For a Marker anchor, you can modify this behavior by using the count property. For example, if direction = backward and count = 2, the parser finds the second to last instance.

marking
Specifies whether an anchor should be used as a reference point to find the succeeding anchor. The options are full (places a reference point before and after the current anchor), begin position (before only), end position (after only), and none (neither).

You can use this property to control the search scope for the succeeding anchor. For an explanation and examples, see How a Parser Searches for Anchors.

phase
The processing phase during which ContentMaster should search for the anchor (initial, main, or final). For an explanation, see How a Parser Searches for Anchors.

no_initial_phase
This property applies to components that have nested anchors. If the property is selected, the anchor has no initial phase. This overrides the option phase = initial in the immediately nested anchors, and changes it to main.
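The interaction of the direction and count properties, as described above for a Marker anchor, can be sketched in Python. The find_marker function is a hypothetical model; in particular, treating count with direction = forward as "the n-th instance from the start" is an assumption of this sketch.

```python
def find_marker(scope, text, direction="forward", count=1):
    """Return the offset of the count-th instance of text in scope, or -1 if absent."""
    positions = []
    start = scope.find(text)
    while start != -1:
        positions.append(start)
        start = scope.find(text, start + len(text))
    if direction == "backward":
        positions.reverse()  # count from the end of the search scope
    return positions[count - 1] if 0 < count <= len(positions) else -1


scope = "ContentMaster, ContentMaster, ContentMaster"
print(find_marker(scope, "ContentMaster"))                                 # 0
print(find_marker(scope, "ContentMaster", direction="backward"))           # 30
print(find_marker(scope, "ContentMaster", direction="backward", count=2))  # 15
```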

How a Parser Searches for Anchors

To design a parser correctly, it is important that you understand how ContentMaster searches for the anchors in the parser configuration. There are three main concepts:

Search phase

Search scope

Search criteria

This section explains the concepts, and how you can control each of them by setting the anchor properties.


Search Phases

ContentMaster searches for a sequence of anchors in three phases:

Initial

Main

Final

By default, all Marker anchors are in the initial phase and all Content anchors are in the main phase. This means that ContentMaster first finds the Marker anchors, and it locates the Content anchors between them.

To understand this, consider a parser that processes the following source document:

First name: Ron Last name: Lehrer

Suppose you have defined the anchors in the following way, with default anchor properties:

Anchor   Text in the source document  Phase
Marker   First name:                  Initial
Content  Ron                          Main
Marker   Last name:                   Initial
Content  Lehrer                       Main

In the initial phase, ContentMaster searches for the marker anchors:

It searches for the First name: marker.

It searches for the Last name: marker at a location that follows the First name: marker.

In the main phase, ContentMaster searches for the content anchors:

It searches for the Ron anchor at a location between the First name: and Last name: anchors.

It searches for the Lehrer anchor at a location after the Last name: anchor.

Nested Phases

Anchors that have nested anchors, such as Group, have nested phases. For example, if a Group anchor runs in the parser's main phase, a Marker anchor that is nested in the Group runs in a nested initial phase. The nested initial phase is part of the parser's main phase, but it is before the other anchors in the Group.


Another example is a RepeatingGroup anchor, which searches both for separators and for nested anchors. In order to identify the nested anchors correctly, it searches for the separators before it searches for the nested anchors.

Search Scope and Search Criteria

The Search Phases example illustrates the concepts of search scope and search criteria. The search scope is the portion of a document in which ContentMaster searches for an anchor. The search criteria are the rules by which ContentMaster finds the anchor within the search scope.

In the initial phase, ContentMaster starts searching for the First name: marker at the beginning of the document. The search scope for this anchor is the entire document.

There is a single search criterion: the anchor must contain the text First name:.

The search scope for the Last name: anchor starts at the end of First name:, and extends to the end of the document. The search criterion is that the anchor must contain the text Last name:.

In the main phase, the parser interpolates the content anchors between the marker anchors. The search scope for the Ron content anchor extends from the end of the First name: anchor to the beginning of the Last name: anchor. Assuming that the parser uses a space-delimited format, the search criteria are to retrieve all the text in the search scope, after the leading space character and before the second space character.

The search scope for the Lehrer content anchor is from the end of Last name: to the end of the document. The search criteria are similar to those for the Ron anchor.

Let's add this analysis to the anchor table. The table now describes the complete method by which the parser finds the anchors.

Anchor   Text in the source document  Phase    Search scope                               Search criteria
Marker   First name:                  Initial  Entire document                            Text = First name:
Content  Ron                          Main     End of First name: to start of Last name:  After the leading space, before the next space
Marker   Last name:                   Initial  End of First name: to end of document      Text = Last name:
Content  Lehrer                       Main     End of Last name: to end of document       After the leading space, before the next space
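The two-phase method summarized in the table can be sketched in Python. The parse_two_phase function is a hypothetical simplification: it handles only an alternating Marker, Content sequence, and it uses whitespace splitting as a stand-in for the space-delimited format.

```python
def parse_two_phase(source, markers):
    """Locate all markers first, then take the content from the scope after each one."""
    # Initial phase: each marker is sought after the end of the previous marker.
    spans = []
    cursor = 0
    for marker in markers:
        start = source.find(marker, cursor)
        if start == -1:
            raise ValueError(f"marker not found: {marker!r}")
        cursor = start + len(marker)
        spans.append((start, cursor))

    # Main phase: each content scope runs from the end of its marker to the
    # start of the next marker, or to the end of the document.
    contents = []
    for index, (_, marker_end) in enumerate(spans):
        scope_end = spans[index + 1][0] if index + 1 < len(spans) else len(source)
        words = source[marker_end:scope_end].split()
        contents.append(words[0] if words else "")
    return contents


result = parse_two_phase("First name: Ron Last name: Lehrer", ["First name:", "Last name:"])
print(result)  # ['Ron', 'Lehrer']
```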


Adjusting the Search Phase

By assigning the phase property of an anchor, you can change the phase in which ContentMaster searches for the anchor.

Consider the following source document:

CONTENT<10 characters>MARKER

In this example, the Marker anchor is located 10 characters after the Content anchor.

By default, ContentMaster searches for the Marker in the initial phase, and it searches for the Content in the main phase. This won't work here, because ContentMaster cannot find the Marker unless it has already found the Content!

The solution is to change the phase property of one of the anchors. You can change the Content to the initial phase, or the Marker to the main phase. In either case, ContentMaster finds the anchors.

Adjusting the Search Scope

There are two ways to adjust the search scope for an anchor:

By setting the phase property of the anchor or the surrounding anchors

By setting the marking property of the surrounding anchors

Phase Property

If a Content anchor lies between two Marker anchors, then by default, the search scope for the Content is the segment between the Marker anchors.

If you change all the anchors to the same phase, the search scope of the Content is no longer bounded by the second Marker; it is from the end of the first Marker to the end of the document.

As an example, consider the following source document:

Tree Fig Date<tab>October 27, 2003 (pruned)
Tree Date Palm Date April 27, 2003<tab>(planted)

The example assumes that the source document has a loose structure, containing varying numbers of spaces, tabs, or other symbols interspersed in the text, so we cannot easily use the spaces and tabs as delimiters. An example like this might arise in parsing word-processor documents.

We can parse this document using a RepeatingGroup anchor, which contains nested Marker and Content anchors. The Marker anchors are the strings Tree and Date. The Content anchors are everything between the markers, including the spaces and tabs.

The problem in parsing this document is in the second iteration of the RepeatingGroup, which parses the second line. If we leave the Marker anchors in the initial phase, ContentMaster incorrectly considers the first instance of the word Date to be a marker. In the main phase, it fails to find the content Date Palm because the search scope is between the two markers, and there is no text between the markers.

A possible solution is to move the Date marker to the main phase, and to define the Content anchor (Date Palm) using an expression that searches for a tree name of one or two words. In the initial phase of the RepeatingGroup, ContentMaster finds the Tree marker. In the main phase, it finds the Date Palm content followed by the Date marker.

With the new phase setting, we have changed the search scope for the tree name. The scope is now from the Tree marker to the end of the iteration, and ContentMaster finds the Date Palm successfully.
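The contrast between the two settings can be reproduced with a Python sketch. It is an analogy only: the function names and the regular expression are assumptions of this illustration, standing in for the anchors and for the one-or-two-word expression mentioned above.

```python
import re

LINES = [
    "Tree Fig Date\tOctober 27, 2003 (pruned)",
    "Tree Date Palm Date April 27, 2003\t(planted)",
]


def name_between_markers(line):
    """Both markers are located first; the content is whatever lies between them."""
    tree_end = line.find("Tree") + len("Tree")
    date_start = line.find("Date", tree_end)  # first Date after Tree
    return line[tree_end:date_start].strip()


# Tree marker, then a name of one or two words, then the Date marker.
NAME_THEN_DATE = re.compile(r"Tree\s+(\w+(?:\s+\w+)?)\s+Date\b")


def name_then_marker(line):
    """The tree name is sought before the Date marker, so it may itself contain Date."""
    match = NAME_THEN_DATE.search(line)
    return match.group(1) if match else None


print([name_between_markers(line) for line in LINES])  # ['Fig', '']
print([name_then_marker(line) for line in LINES])      # ['Fig', 'Date Palm']
```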

Marking Property

Consider the following source-document structure:

MARKER%%%CONTENT A^^^CONTENT B

Let's suppose that the sequence of Content A and Content B varies among the source documents. In some documents, Content B precedes Content A.

In that case, the search criteria are:

Content A and Content B both follow the Marker anchor.

Content A begins with %%%, and Content B begins with ^^^.

By default, the search scope for Content A is from the end of the Marker to the end of the document. The search scope for Content B is from the end of Content A to the end of the document. This won't work because in some source documents, Content A and Content B are reversed.

The solution is to change the search scope for Content B. You can do this by setting the marking property of Content A. The marking property specifies where ContentMaster should place the reference points, which determine the start and end of the search scope.

The default setting is marking = full, which means that ContentMaster places reference points before and after each anchor. The search scope for Content B begins at the last reference point, which is the one following Content A. This leads to incorrect parsing, as we have seen.

You need to prevent ContentMaster from placing reference points around Content A. You can do that by setting the marking property of Content A to none. As a result, the search scope for Content B starts at the end of the Marker. This allows ContentMaster to find Content B, even if it precedes Content A.

The following table describes all four possible values of the marking property. The Result column assumes that you assign the marking value to Content A in the above example.

marking property: full
Explanation: ContentMaster places reference marks at the beginning and end of the current anchor. This is the default behavior.
Result: ContentMaster seeks the next anchor after the end of the current anchor (Content B follows Content A).

marking property: begin position
Explanation: ContentMaster places a reference mark only at the start of the current anchor.
Result: ContentMaster seeks the next anchor after the start of the current anchor (Content B overlaps or follows Content A).

marking property: end position
Explanation: ContentMaster places a reference mark only at the end of the current anchor.
Result: ContentMaster seeks the next anchor after the end of the current anchor (Content B follows Content A).

marking property: none
Explanation: ContentMaster does not place any reference marks at the current anchor.
Result: ContentMaster seeks the next anchor after the end of the preceding anchor (Content B follows Marker, without regard to Content A).

There are a few circumstances where you must use an anchor that marks a reference point. An example is the separator of a RepeatingGroup. If the separator does not mark, it does nothing. ContentMaster Studio displays a warning if you attempt to use a non-marking anchor in a location where marking is required.
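The effect of the marking property on the search scope can be illustrated with a short Python sketch. This is not ContentMaster code. The function next_scope_start, the offset-based model, and the spelling of the marking values as plain strings are assumptions made only for illustration.

```python
def next_scope_start(prev_ref: int, anchor_start: int, anchor_end: int, marking: str) -> int:
    """Return the offset at which the search for the next anchor begins.

    prev_ref is the last reference point placed before the current anchor.
    """
    if marking in ("full", "end position"):
        return anchor_end      # a reference point follows the anchor
    if marking == "begin position":
        return anchor_start    # a reference point only at the start
    if marking == "none":
        return prev_ref        # no reference points, so the scope is unchanged
    raise ValueError(f"unknown marking value: {marking}")


# A source document in which Content B precedes Content A.
source = "MARKER^^^CONTENT B%%%CONTENT A"
marker_end = len("MARKER")
a_start = source.index("%%%")
a_end = len(source)

for marking in ("full", "none"):
    start = next_scope_start(marker_end, a_start, a_end, marking)
    found = source.find("^^^", start) != -1
    print(f"marking = {marking}: Content B found = {found}")
```

With marking = full, the search for Content B starts after Content A and finds nothing. With marking = none, it starts at the end of the Marker and succeeds.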

Online Samples

In the ContentMaster Samples folder, open Projects\Marking_Mode\Marking_Mode.cmw. The sample demonstrates the use of the marking property to alter the search scope for a Content anchor.

For a second example, see Projects\NonMarker\NonMarker.cmw. This sample uses the marking = none option, permitting two Content anchors to overlap. The sample also illustrates the use of direction = backward to search from the end of the scope.

Adjusting the Search Criteria

ContentMaster can search for anchors according to a large number of search criteria, for example:

According to the delimiter locations, which ContentMaster learns from the example source

According to the positional offset (number of characters) from a preceding reference point

By searching for particular text

By searching for a pattern (a regular expression, which is an enhanced wildcard search)

By searching for a specified data type

By searching for an attribute value

You can combine these search criteria in almost any way. For example, you might specify that a Content anchor begins two tabs after a Marker anchor, and that it is 10 characters long. If you do this, you are using a delimiter criterion to define the beginning of the Content anchor, and an offset criterion to define the end.
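As a rough illustration of combining a delimiter criterion with an offset criterion, the following Python sketch locates content that begins two tabs after a marker and is 10 characters long. The function content_after_tabs and the sample line are hypothetical. They do not correspond to any ContentMaster component.

```python
def content_after_tabs(text: str, marker: str, tabs: int, length: int) -> str:
    """Skip `tabs` tab characters after `marker`, then take `length` characters."""
    pos = text.index(marker) + len(marker)   # end of the marker
    for _ in range(tabs):                    # delimiter criterion: count tabs
        pos = text.index("\t", pos) + 1
    return text[pos:pos + length]            # offset criterion: fixed length


line = "ID:\tskip\t0123456789trailing"
print(content_after_tabs(line, "ID:", tabs=2, length=10))
```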

The components that perform these searches are called searcher components. Their use is described in the Anchor Component Reference (especially in the Content and Marker anchors). The details of the searcher components are explained in the Searcher Component Reference.

Using XSD Data Types to Narrow the Search Criteria

In addition to the other search criteria, ContentMaster searches for a Content anchor according to the XSD type of its data holder.

For example, suppose that the search scope of a Content anchor is the following string.

The students' grades were 81, 56, and 95, respectively.

Further suppose that you define no other search criteria for the anchor. If you map the anchor to a data holder that has a type of xs:string, the anchor retrieves the entire string.

If the data holder has a type of xs:integer, ContentMaster searches for the first substring that matches the data type. Assuming that you configure the anchor with direction = forward, the anchor retrieves the integer 81. If direction = backward (in other words, the search is from the end of the search scope), the anchor retrieves 95.

Now suppose the data holder has a type of xs:integer, and the schema restricts the data holder to values less than 60. ContentMaster searches for an integer that conforms with the restriction, and returns 56.
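The three cases above can be imitated in Python. The sketch below only mimics the described behavior. The function find_integer and its max_exclusive parameter are assumptions, and the sketch says nothing about how ContentMaster implements the type search internally.

```python
import re

SCOPE = "The students' grades were 81, 56, and 95, respectively."


def find_integer(scope, direction="forward", max_exclusive=None):
    """Return the first integer in the search direction that satisfies the restriction."""
    candidates = [int(m.group()) for m in re.finditer(r"[+-]?\d+", scope)]
    if max_exclusive is not None:
        # Imitates a schema restriction to values less than max_exclusive.
        candidates = [n for n in candidates if n < max_exclusive]
    if not candidates:
        return None
    return candidates[0] if direction == "forward" else candidates[-1]


print(find_integer(SCOPE))                        # 81
print(find_integer(SCOPE, direction="backward"))  # 95
print(find_integer(SCOPE, max_exclusive=60))      # 56
```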

XSD Types in Combination with Other Search Criteria

You can combine an XSD-type criterion with other search criteria. In the above example, suppose you configure the Content anchor to search for the regular expression

[",.*,"]

which searches for two commas, separated by any characters other than a newline. The search finds the substring

, 56,

If the XSD type of the data holder is xs:integer, the anchor retrieves 56.
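A comparable two-step search can be sketched in Python. Note one assumption: in Python's re module, .* is greedy and would match up to the last comma in the string, so the sketch uses the lazy form .*? to reproduce the substring described above. How the ContentMaster pattern search resolves this is not covered here.

```python
import re

SCOPE = "The students' grades were 81, 56, and 95, respectively."

# Step 1: the pattern criterion narrows the text to the region between two commas.
region = re.search(r",.*?,", SCOPE).group()

# Step 2: the type criterion extracts the integer from that region.
value = int(re.search(r"\d+", region).group())

print(repr(region))  # ', 56,'
print(value)         # 56
```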

List Types

A data holder can have an XSD list type, which is a space-separated list. ContentMaster filters the text retrieved by the Content anchor to match the XSD types of the list items.

Suppose, for example, that the schema defines an attribute called grades, which is a list of xs:integer items. If you map the above Content anchor to grades, the anchor returns a list of the integers in the string, or 81 56 95. If the grades attribute belongs to an element called Students, the XML output is:

<Students grades="81 56 95" />

If you define the Content anchor with direction = backward, the list is reversed:

<Students grades="95 56 81" />
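The list behavior can be imitated with a few lines of Python. The function grades_element is a hypothetical helper written for this illustration. It collects every integer in the scope and writes the integers as a space-separated attribute value.

```python
import re
from xml.sax.saxutils import quoteattr

SCOPE = "The students' grades were 81, 56, and 95, respectively."


def grades_element(scope, direction="forward"):
    """Build a Students element whose grades attribute lists the integers found."""
    items = re.findall(r"\d+", scope)
    if direction == "backward":
        items.reverse()
    return f"<Students grades={quoteattr(' '.join(items))} />"


print(grades_element(SCOPE))
print(grades_element(SCOPE, direction="backward"))
```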

Decimal Type

If a data holder has the xs:decimal type, ContentMaster assumes that the decimal separator is a period. If your locale setting uses a comma as the decimal separator, an xs:decimal search may fail.

Type Search with Closing Marker

If a Content anchor has a closing_marker property, but does not have an opening_marker, ContentMaster returns the substring that is closest to the closing_marker, which matches the XSD type of the data holder.

In the above example, if you define the word respectively as the closing_marker, and the data holder has a type of xs:integer, the anchor retrieves 95.
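In Python terms, this amounts to taking the last integer that occurs before the closing marker. The helper integer_before below is a hypothetical name used only in this sketch.

```python
import re

SCOPE = "The students' grades were 81, 56, and 95, respectively."


def integer_before(scope, closing_marker):
    """Return the integer that lies closest to, and before, the closing marker."""
    end = scope.find(closing_marker)
    if end == -1:
        return None                      # the closing marker was not found
    matches = re.findall(r"\d+", scope[:end])
    return int(matches[-1]) if matches else None


print(integer_before(SCOPE, "respectively"))  # 95
```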

Online Sample

In the ContentMaster Samples folder, open Projects\Pattern\Pattern.cmw.

The sample is a parser containing a single Content anchor. The anchor is mapped to an XML element, which the XSD schema restricts to a pattern (an xs:pattern element, which uses a regular expression to define the acceptable character sequences). The anchor outputs the portion of the source document that matches the pattern.

Anchors that Contain Nested Anchors

An interesting question is how a parser searches for an anchor that has nested anchors, such as a Group anchor.

ContentMaster does not search for a Group, and then search for the nested anchors. Rather, it searches (within the search scope) for the nested anchors. The extent of the Group is defined by the nested anchors that ContentMaster finds.

For example, suppose a parser has the following sequence of anchors. We assume that the anchors have default phase, marking, and optional properties.

Marker A
Group
    Marker B
    Content C
    Marker D
Marker E

ContentMaster searches first for Marker A and Marker E. The search scope of the Group is the region between Marker A and Marker E.

Within the search scope of the Group, ContentMaster then searches for Marker B and Marker D. The region between these markers is the search scope for Content C.

Within the latter search scope, ContentMaster searches for Content C.

You can view these relationships in the example pane of ContentMaster Studio. The example pane highlights the nested anchors, helping you visualize the extent of the Group.
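The narrowing of the search scope from the outer markers to the inner markers can be sketched in Python as follows. The helper between and the sample text are invented for illustration. The marker strings A:, :E, B[, and ]D stand in for Marker A, Marker E, Marker B, and Marker D.

```python
def between(text, left, right, start=0, end=None):
    """Return the (start, end) offsets of the region between two markers."""
    end = len(text) if end is None else end
    lo = text.index(left, start, end) + len(left)
    hi = text.index(right, lo, end)
    return lo, hi


source = "A: intro B[ value ]D outro :E"

# Outer markers define the search scope of the Group.
g_lo, g_hi = between(source, "A:", ":E")

# Inner markers, searched only within the Group scope, define the scope of Content C.
c_lo, c_hi = between(source, "B[", "]D", g_lo, g_hi)

content_c = source[c_lo:c_hi].strip()
print(content_c)  # value
```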

Anchor Quick Reference

The following table briefly describes the anchors that ContentMaster supports. For complete information, see the Anchor Component Reference below.

We have attempted to categorize the anchors according to their main characteristics or purpose. Within each category, the anchors are listed in alphabetical order.

Simple Anchors

The anchors in this category are used to define simple text elements in a document.

Content
Retrieves text from a specified location in a source document and stores the text in a data holder.

Marker
Defines a reference point in the source text, which the parser uses to search for other anchors.

Grouping Anchors

These anchors group a set of nested anchors together.

DelimitedSections
Defines sections of a document, which are delimited by a separator.

EnclosedGroup
Defines a bounded segment of the source document.

Group
Binds a set of anchors together for processing as a unit.

RepeatingGroup
Parses a repetitive section of a document.

Other Anchors

Alternatives
Specifies alternative anchors that may exist at a particular location in a source document.

EmbeddedParser
Activates a secondary parser that runs on a segment of the source document.

FindReplaceAnchor
Marks text for replacement (used with the TransformByParser transformer).

HtmlForm
Defines an HTML form. The anchor submits the form to a web server and runs a secondary parser on the server response.

Anchor Component Reference

This section describes the anchor components that are available in ContentMaster.

Alternatives

The Alternatives anchor lets you define a set of alternative, nested anchors. You can define a criterion for which alternative the parser should accept. Only the accepted anchor affects the parser output. The other anchors (whether failed or successful) have no effect on the parser output.

Example

Suppose you are parsing a document in which the date can appear in either of the following patterns:

21/10/03
October 21, 2003

To process this content, you can define an Alternatives anchor that contains two Content anchors, which store their output in different XML elements. Each XML element is constrained to accept one of the date patterns. The Alternatives anchor is configured with selector = ScriptOrder.

When the parser runs the Alternatives anchor, it tests the first Content anchor. If the date matches the pattern of the first anchor, the Content anchor succeeds. If the date does not match the pattern, the Content anchor fails, and the Alternatives anchor tests the second Content anchor. In this way, the parser can process both date patterns.
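The ScriptOrder logic is essentially a first-match loop over the alternatives. The Python sketch below imitates it for the two date patterns. The element names NumericDate and LongDate and the regular expressions are assumptions chosen for the example. They are not taken from a ContentMaster schema.

```python
import re

# Alternatives in script order: (hypothetical element name, pattern it accepts).
ALTERNATIVES = [
    ("NumericDate", re.compile(r"\d{2}/\d{2}/\d{2}")),
    ("LongDate", re.compile(r"[A-Z][a-z]+ \d{1,2}, \d{4}")),
]


def parse_date(scope):
    """Accept the first alternative that succeeds; fail if none succeeds."""
    for element, pattern in ALTERNATIVES:
        match = pattern.search(scope)
        if match:
            return element, match.group()
    return None


print(parse_date("21/10/03"))
print(parse_date("October 21, 2003"))
```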

How to Define

Add an Alternatives anchor by editing the IntelliScript. Nested within the Alternatives anchor, add the alternative anchors.

Basic Properties

selector
The criterion for deciding which alternative to accept. The options are:

selector property: ScriptOrder
Explanation: ContentMaster tests the nested anchors in the sequence that they are defined in the IntelliScript. It accepts the first nested anchor that succeeds. If all the nested anchors fail, the Alternatives anchor fails.

selector property: DocumentOrder
Explanation: ContentMaster tests all the nested anchors. It accepts either the first or last successful nested anchor, according to the locations of the anchors in the source document. If all the nested anchors fail, the Alternatives anchor fails.

selector property: NameSwitch
Explanation: ContentMaster searches for the nested anchor whose name property is specified in a data holder (select from a Schema view). It ignores the other nested anchors. If the named nested anchor fails, the Alternatives anchor fails.

Advanced Properties

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

marking

phase

Using Alternatives to Select a Secondary Parser

You can use an Alternatives anchor to control which of several secondary parsers processes a document. The main parser can use this feature to process source documents of multiple types.

For example, suppose that the home page of a newspaper web site has links to news articles. Following each link, the article is labeled News, Business, or Sports. You want to parse the articles, using a different parser for each type, like this:

<a href="PrincessWeds.html">Norwegian Princess Weds</a> - News
<a href="BanksMerge.html">Local Banks to Merge</a> - Business
<a href="HomeTeamWins.html">Bears Trounce Antelopes</a> - Sports

One way to do this is as follows:

1. The main parser retrieves the filename of an article and stores it in a variable.

2. The main parser contains an Alternatives anchor, which is configured with the DocumentOrder option.

3. The Alternatives anchor contains nested Group anchors.

4. Each Group anchor is configured with a Marker anchor and a RunParser action, as follows:

- The first Group contains a Marker that searches for the string News. The Group is configured with a RunParser action, which runs a secondary parser called NewsParser.

- The second Group contains a Marker that searches for Business and runs BusinessParser.

- The third Group contains a Marker that searches for Sports and runs SportsParser.

The Alternatives anchor tests all three Group anchors. It accepts the Group containing the first Marker that occurs after the filename. The Group runs the appropriate parser on the file.
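The procedure above can be approximated in Python. The sketch below is only an analogy. The names PARSERS, choose_parser, and dispatch are hypothetical, and the secondary parsers are represented by their names rather than being run. For each link, it picks the label that occurs first after the filename, which corresponds to the DocumentOrder selection.

```python
import re

PARSERS = {"News": "NewsParser", "Business": "BusinessParser", "Sports": "SportsParser"}
LINK = re.compile(r'<a href="([^"]+)">')


def choose_parser(text_after_link):
    """Return the parser whose label occurs first in the text (document order)."""
    hits = [(text_after_link.find(label), parser)
            for label, parser in PARSERS.items() if label in text_after_link]
    return min(hits)[1] if hits else None


def dispatch(page):
    """Pair each article filename with the parser selected for it."""
    return [(m.group(1), choose_parser(page[m.end():])) for m in LINK.finditer(page)]


PAGE = (
    '<a href="PrincessWeds.html">Norwegian Princess Weds</a> - News\n'
    '<a href="BanksMerge.html">Local Banks to Merge</a> - Business\n'
    '<a href="HomeTeamWins.html">Bears Trounce Antelopes</a> - Sports\n'
)

for filename, parser in dispatch(PAGE):
    print(filename, "->", parser)
```

A headline that happens to contain one of the labels would mislead this simple sketch. A real parser would restrict the search scope, for example to the text following the closing tag of the link.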

Online Sample

In the ContentMaster Samples folder, open Projects\Alternatives\Alternatives.cmw. The sample uses Alternatives anchors to parse different name and date formats that may exist in a source document.

Content

A Content anchor retrieves text from the source document. It stores the retrieved text in a data holder.

Example

For many examples and exercises using this anchor, see Getting Started with ContentMaster.

How to Define

You can create a Content anchor by the select-and-click approach, by the drag-and-drop approach, or by editing the IntelliScript directly. For instructions, see Defining Anchors above.

Basic Properties

opening_marker
A searcher component, which labels the start of a region, in which ContentMaster should search for the Content anchor.

Defining this property is similar to defining a Group, which contains a Marker followed by a Content anchor.

The possible property values are NewlineSearch, PatternSearch, OffsetSearch, and TextSearch. For example, a NewlineSearch means that ContentMaster should search for the anchor after a newline character. A TextSearch means to search after a specified text string. For details, see the Searcher Component Reference below.

closing_marker
A searcher component, which labels the end of a region, in which ContentMaster should search for the Content anchor.

Defining this property is similar to defining a Group, which contains a Content anchor followed by a Marker. Defining both opening_marker and closing_marker is similar to defining a Group that contains a Marker Content Marker sequence.

The property values are the same as for opening_marker.

value
Specifies a searcher component, which searches for the text retrieved by the Content anchor (see the Searcher Component Reference below).

The search is between opening_marker and closing_marker. If opening_marker is not defined, the search is from the preceding anchor or reference point. If closing_marker is not defined, the search is to the end of the document (or to the end of a region defined in another way, such as the end of a Group). For more information about the search algorithm, see How a Parser Searches for Anchors.

The options are:

value: (empty)
Explanation: The Content anchor retrieves the entire search scope.

value: AttributeSearch
Explanation: The Content anchor retrieves the value from an expression of the type AttributeName = ... This is useful, for example, to retrieve attribute values from an XML or HTML source document.

value: LearnByExample
Explanation: The parser learns what text to retrieve according to the parser format and the example source. For example, if the parser has a tab-delimited format, it counts the number of tabs from the start of the search scope to the example text. It retrieves the text between the corresponding tabs in the source document.

value: PatternSearch
Explanation: The Content anchor retrieves the first text that matches a specified regular expression.

value: TypeSearch
Explanation: The Content anchor retrieves the first text that matches a specified XSD data type.

data_holder
A data holder where the anchor should store the retrieved text (select from a Schema view).

In addition to the searcher components, ContentMaster uses the XSD type of the data_holder as a search criterion. For information, see Using XSD Data Types to Narrow the Search Criteria above.
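The way opening_marker and closing_marker bound the search can be pictured with the following Python sketch. It models both markers as plain text strings (comparable to a TextSearch) and an empty value property, so the whole bounded region is retrieved. The function content_between is a hypothetical helper, not a ContentMaster API.

```python
def content_between(scope, opening_marker=None, closing_marker=None):
    """Return the text between the markers, or None if a defined marker is missing."""
    start = 0                                  # no opening marker: start of the scope
    if opening_marker is not None:
        i = scope.find(opening_marker)
        if i == -1:
            return None
        start = i + len(opening_marker)
    end = len(scope)                           # no closing marker: end of the scope
    if closing_marker is not None:
        end = scope.find(closing_marker, start)
        if end == -1:
            return None
    return scope[start:end]


print(content_between("Name: Ron; Age: 30", "Name: ", ";"))  # Ron
print(content_between("Name: Ron", opening_marker="Name: "))  # Ron
```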

Advanced Properties

allow_empty_values
If selected, the Content anchor can be empty. The data_holder is assigned an empty value.

This can occur, for example, if the anchor is configured with value = LearnByExample and there is nothing between the delimiters. It can also occur if there is nothing between the opening_marker and the closing_marker.

If allow_empty_values is not selected in these situations, the anchor fails.

ignore_default_transformers
This property applies to anchors that retrieve content from the source document. If selected, the anchor does not apply the parser's default transformers to the content (see Chapter 8, Transformers).

transformers
This property applies to anchors that retrieve content from the source document. The property is a sequence of transformers that ContentMaster should apply to the retrieved text (see Chapter 8, Transformers).

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

direction

marking

phase

The direction property has multiple effects in a Content anchor. If direction = backward:

ContentMaster searches backwards from the end of the search scope for the opening_marker and closing_marker (opening_marker still precedes closing_marker, however).

The searcher component searches backward from the end of the search scope.

If the searcher component is LearnByExample, it counts the delimiters backward from the end of the search scope.

Online Sample

In the ContentMaster Samples folder, open Projects\Content\Content.cmw. The sample illustrates several uses of the opening_marker, closing_marker, and value properties to configure Content anchors.

DelimitedSections

The DelimitedSections anchor is used for parsing sectioned data, in which the sections are delimited by a separator.

Within the DelimitedSections, you should nest other anchors. Each nested anchor is responsible for parsing a single section.

Example

An employee resume form contains several sections, each of which is preceded by a line of hyphens:

----------------------------
Jane Palmer
Employee ID 123456
----------------------------
Professional Experience...
----------------------------
Education...

You can define the sectioned region as a DelimitedSections anchor, with the line of hyphens as the separator. Because the line of hyphens precedes each section, you should define the separator_position as before.

Within the DelimitedSections anchor, nest three Group anchors. The first Group parses the Jane Palmer section, the second Group parses the Professional Experience section, and so forth.

Optional Sections

In the above example, suppose that the second section, Professional Experience, is missing from some source documents. Its separator (the line of hyphens) is always present.

----------------------------
Jane Palmer
Employee ID 123456
----------------------------
----------------------------
Education...

To handle this situation, you should configure the DelimitedSections in the following way:

In the second Group anchor, select the optional property. This means that if the Group fails, it should not cause the DelimitedSections to fail.

In the DelimitedSections anchor, set using_placeholders = always. This means that the anchor should look for the separator of the optional section, even if the section itself is missing.

Now suppose that if the Professional Experience section is missing, its separator is also missing.

----------------------------
Jane Palmer
Employee ID 123456
----------------------------
Education...

In this case, you should configure the DelimitedSections as follows:

In the second Group anchor, select the optional property.

In the DelimitedSections anchor, set using_placeholders = never. This means that the anchor should not look for the separator of a missing section.

How to Define

Add a DelimitedSections anchor by editing the IntelliScript. Nested within the DelimitedSections anchor, add a sequence of anchors that parse the sections.

Basic Properties

separator_position
Position of the separator relative to the sections.

The following table explains the values. The examples assume that the separator is a vertical-line character ( | ).

separator_position: before
Explanation: There is a separator before each section (including the first section).
Example: |1|2|3|4

separator_position: after
Explanation: There is a separator after each section (including the last section).
Example: 1|2|3|4|

separator_position: between
Explanation: There is a separator between the successive sections (not before the first section and not after the last section).
Example: 1|2|3|4

separator_position: around
Explanation: There are separators before and after each section (including the first and last sections).
Example: |1|2|3|4|
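The four separator_position values can be compared side by side with a small Python function. The function split_sections is a hypothetical illustration that treats the separator as a literal string. It is not how ContentMaster evaluates a separator anchor.

```python
def split_sections(text, sep="|", position="between"):
    """Split text into sections according to where the separators are placed."""
    parts = text.split(sep)
    if position in ("before", "around"):
        if not parts or parts[0] != "":
            raise ValueError("expected a separator before the first section")
        parts = parts[1:]
    if position in ("after", "around"):
        if not parts or parts[-1] != "":
            raise ValueError("expected a separator after the last section")
        parts = parts[:-1]
    return parts


for position, example in [("before", "|1|2|3|4"), ("after", "1|2|3|4|"),
                          ("between", "1|2|3|4"), ("around", "|1|2|3|4|")]:
    print(position, split_sections(example, position=position))
```

Each of the four calls yields the same four sections, which shows that the property changes only where the parser expects the separators, not the content of the sections.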

using_placeholders
This property specifies whether the DelimitedSections should look for the separator of an optional section that is missing from the source document.

The following table explains the values. The examples assume that the source document has the structure |1|2|3|4. The examples illustrate the structure if sections 2 and 4 are missing.

using_placeholders: always
Explanation: The separator of a missing section always exists.
Example: |1||3|

using_placeholders: never
Explanation: The separator of a missing section never exists.
Example: |1|3

using_placeholders: when necessary
Explanation: The separator of a missing internal section always exists. The separator of a missing terminal section never exists.
Example: |1||3
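To make the three layouts concrete, the Python sketch below generates the expected source structure for each using_placeholders value, assuming separator_position = before. The function render and the use of None for a missing section are assumptions of this sketch.

```python
def render(sections, mode, sep="|"):
    """Build the source layout; None marks a section that is missing."""
    present = [i for i, s in enumerate(sections) if s is not None]
    last_present = present[-1] if present else -1
    out = []
    for i, section in enumerate(sections):
        if section is not None:
            out.append(sep + section)
        elif mode == "always":
            out.append(sep)                     # placeholder separator is kept
        elif mode == "when necessary" and i < last_present:
            out.append(sep)                     # kept only for internal sections
    return "".join(out)


sections = ["1", None, "3", None]               # sections 2 and 4 are missing
for mode in ("always", "never", "when necessary"):
    print(mode, render(sections, mode))
```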

separator
An anchor (typically a Marker) that delimits the sections.

Advanced Properties

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

marking

phase

Online Sample

In the ContentMaster Samples folder, open Projects\DelimitedSections\DelimitedSections.cmw. The sample illustrates a DelimitedSections anchor, which parses sections that are separated by a | symbol. Each section is parsed by a single Content anchor.

EmbeddedParser

The EmbeddedParser anchor uses a secondary parser to parse its search scope.

It is permitted for a parser to call itself recursively.

Example

A document is tab-delimited, except for one section that is comma-delimited.

To parse the document, you can define a main parser that uses the TabDelimited format. Define another parser, which uses the CommaDelimited format. Use an EmbeddedParser anchor to run the second parser within the execution of the first parser.

How to Define

You can define an EmbeddedParser by editing the IntelliScript.

Basic Properties

parser
The name of the secondary parser, which must be defined in the same project.

schema_connections
Connects the output of the secondary parser to the output of the main parser. The property contains a list of Connect subcomponents, which specify the correspondence between data holders in the output of the two parsers (see the Anchor Subcomponent Reference).

Advanced Properties

source_transformers
A sequence of transformers, which the parser should apply to the search scope before the secondary parser processes it.

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

marking

phase

Online Sample

In the ContentMaster Samples folder, open Projects\EmbeddedParser\EmbeddedParser.cmw. The sample uses a main parser to determine the location of an address. It then runs an EmbeddedParser, which parses the address.

EnclosedGroup

The EnclosedGroup anchor lets you define a bounded region that contains nested anchors.

The boundaries are specified by two anchors: an opening anchor and a closing anchor. In the case of nested boundaries (such as parentheses or HTML tags), the EnclosedGroup finds the matching boundaries.

An EnclosedGroup is similar to a Content anchor with an opening_marker and closing_marker. However:

The Content anchor retrieves the entire content between the markers, without further parsing.

The EnclosedGroup lets you further parse the content between the markers.

Example

You can define an HTML table as an EnclosedGroup, with the <table> and </table> tags as the opening and closing markers. The nested anchors parse the content of the table.

Suppose the <table> element contains a nested <table> element (that is, the nested table is located within a cell of the parent table). The EnclosedGroup anchor matches the parent <table> tag with the parent </table> tag. It does not match the parent <table> tag with the nested </table> tag, which would be a misidentification of the table.
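Matching an opening boundary with the correct closing boundary is commonly done by counting the nesting depth. The Python sketch below shows that general technique. It is offered as an analogy for the behavior described above, not as a description of the EnclosedGroup implementation. The function find_enclosed is a hypothetical name, and the sketch matches only the literal tags given.

```python
import re


def find_enclosed(text, opening="<table>", closing="</table>"):
    """Return the text between the first opening tag and its matching closing tag."""
    token = re.compile(f"{re.escape(opening)}|{re.escape(closing)}")
    depth = 0
    start = None
    for m in token.finditer(text):
        if m.group() == opening:
            if depth == 0:
                start = m.end()          # content begins after the parent opening tag
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:               # this closing tag matches the parent
                return text[start:m.start()]
    return None


html = ("<table><tr><td>outer "
        "<table><tr><td>inner</td></tr></table>"
        "</td></tr></table>")
print(find_enclosed(html))
```

The returned region still contains the complete nested table, so nested anchors could go on to parse it.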

For a tutorial exercise that uses this anchor, see the chapter Parsing Word and HTML Documents in Getting Started with ContentMaster.

How to Define

You can define an EnclosedGroup anchor by editing the IntelliScript. Add the nested anchors that parse the content.

Basic Properties

opening
The opening anchor of the EnclosedGroup (typically a Marker anchor).

closing
The closing anchor of the EnclosedGroup (typically a Marker anchor).

Advanced Properties

The following properties are useful in situations where the anchor must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

marking

phase

no_initial_phase

FindReplaceAnchor

This anchor is intended for use within a parser that is activated by the TransformByParser transformer. The anchor marks text in the source, and specifies a replacement for the text. When the parsing is done, the TransformByParser transformer uses the markings to modify the text.

FindReplaceAnchor identifies the text to replace in the following way:

If FindReplaceAnchor does not contain any nested anchors, it replaces the complete text within its search scope. For example, if FindReplaceAnchor is between two markers, it marks the text between the markers.

If FindReplaceAnchor contains nested anchors, it replaces the text spanned by the nested anchors. For example, if FindReplaceAnchor contains a marker, it replaces the marker. If it contains two markers, it replaces the segment from the first marker to the second (including the markers themselves).

You can configure the anchor with a static replacement string, or with a string that the parser retrieved dynamically from the source document.

Example

You have a text document, to which you want to add line numbers.

1. Create a parser, and add a RepeatingGroup to it.

2. Within the RepeatingGroup, add a FindReplaceAnchor.

3. Within the FindReplaceAnchor, add a Marker anchor, and set its search property to NewlineSearch. This causes the FindReplaceAnchor to mark every newline in the document.

4. Configure the RepeatingGroup to store its current_iteration in a variable. Set the replace_with property of the FindReplaceAnchor to the variable.

5. Add a global instance (not within the parser) of a TransformByParser transformer. Set the parser property of TransformByParser to the parser.

6. Run the TransformByParser instance as a stand-alone transformer (see Using Transformers as Runnable Components in Chapter 8, Transformers).

7. The transformer outputs a modified version of the original file, which contains line numbers. You can find the output in the Results folder of the project.
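For readers who think in code, the following Python sketch performs a comparable find-and-replace with an iteration counter. It is a loose analogy only. The function number_lines is hypothetical, and the sketch keeps each newline and writes the counter in front of it, which is an assumption about the desired output rather than a statement of what the ContentMaster transformer produces.

```python
import re
from itertools import count


def number_lines(text):
    """Mark every newline and insert a running iteration number before it."""
    counter = count(1)   # plays the role of current_iteration
    return re.sub(r"\n", lambda m: f" [{next(counter)}]\n", text)


document = "alpha\nbeta\ngamma\n"
print(number_lines(document), end="")
```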

Basic Properties

replace_with
Type the replacement string. Alternatively, you may click the browse button and select a data holder that contains the text.

Advanced Properties

on_partial_match
If the FindReplaceAnchor does not find all its nested, non-optional anchors, and on_partial_match has a value of fail, the FindReplaceAnchor fails.

If on_partial_match has a value of skip, the FindReplaceAnchor removes the area spanned by the successful nested anchors from its search scope and tries to find all the nested anchors again. It may succeed on the second try. It iterates this procedure, as long as there is a partial match.

The following properties are useful in situations where the anchor must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

marking

phase

no_initial_phase

Online Sample

For an online sample of this anchor, see TransformByParser in Chapter 8, Transformers.

Group

The Group anchor binds a sequence of anchors and actions together. It lets you apply properties to all the nested components, together.

For example, a Group lets you define operations that ContentMaster should perform on a set of anchors, or it lets you control the phase of the nested anchors.

How to Define

You can define a Group by editing the IntelliScript. Add nested anchors (and optionally actions) that parse the content of the Group.

Optional Group

You can use the optional property of a Group to prevent ContentMaster from attempting to retrieve text from a missing section of a document.

For example, to parse the source

First name: Ron

you might define First name: as a Marker and Ron as Content. If some source documents do not contain the first-name data, you can put the Marker and Content in a Group and make it optional.

If First name: is not found, the Group immediately fails. The parser does not search for the Content anchor. This is the behavior we desire.

There is a difference between making the Group optional and making its nested anchors optional. If you make both the Marker and Content optional, instead of the Group, ContentMaster ignores the Marker failure and searches for the Content. This might result in retrieving irrelevant text.
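The difference can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not ContentMaster behavior in detail: both function names are hypothetical, and the Content anchor is modeled as "the next whitespace-delimited word".

```python
MARKER = "First name:"

def parse_optional_group(source):
    """Marker and Content nested in one optional Group."""
    pos = source.find(MARKER)
    if pos == -1:
        return None  # the Group fails as a unit; Content is never searched
    words = source[pos + len(MARKER):].split()
    return words[0] if words else None

def parse_optional_anchors(source):
    """Marker and Content each marked optional, with no optional Group."""
    pos = source.find(MARKER)
    # The Marker failure is ignored, so the Content search starts anyway.
    start = pos + len(MARKER) if pos != -1 else 0
    words = source[start:].split()
    return words[0] if words else None

print(parse_optional_group("First name: Ron"))     # Ron
print(parse_optional_group("Last name: Smith"))    # None
print(parse_optional_anchors("Last name: Smith"))  # Last
```

In the last call, the model retrieves the irrelevant word Last, which is the kind of result that the optional Group avoids.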

Advanced Properties

absent

If selected, the Group succeeds only if one of its nested, non-optional anchors or actions fails. You can use this feature to test for the absence of nested anchors.

on_partial_match

If the Group does not find all its nested, non-optional anchors, and on_partial_match has a value of fail, the Group fails.

If on_partial_match has a value of skip, the Group removes the area spanned by the successful nested anchors from its search scope and tries to find all the nested anchors again. It may succeed on the second try. It iterates this procedure, as long as there is a partial match.

search_order

The order in which to process the nested anchors. The options are:

search_order  Explanation
top-down      Processes the nested anchors in the order that is defined in the IntelliScript.
bottom-up     Processes the nested anchors in reverse order. This is useful if you need to retrieve data from a later anchor, which affects how you process an earlier anchor.

The following properties are useful in situations where the anchor must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

marking

phase

no_initial_phase

Online Sample

In the ContentMaster Samples folder, open Projects\persistent_search\persistent_search.cmw.

The sample illustrates a Group that is configured with the on_partial_match = skip property. The Group contains two Marker anchors:

The first Marker searches for the text A.

The second Marker searches for a pattern (a string containing any number of * characters), and has the adjacent property, which means that it must be adjacent to the first Marker.

On the first pass, the Group finds an A character at the beginning of the source document. It does not find the second Marker adjacent to the A character, however.

The Group therefore reduces its search scope by eliminating the first A character, and searches again for the two adjacent markers. It continues this procedure until it successfully finds a string A*, which contains the adjacent markers.

You can observe the behavior in the event log. The log records that the Group fails on the first two trials, and succeeds on the third.

Try experimenting with the on_partial_match and adjacent settings. You can see the effect in the color coding of the example source.

You can also try running the sample, although the result file is empty because the parser does not contain Content anchors. If you set on_partial_match = fail, you can observe in the event log that the parser fails, because the Group cannot find the adjacent anchors.
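The removal-and-retry procedure can be modeled in a few lines of Python. The sketch below is a hedged approximation, not the product's algorithm: the function name and the sample string "A A A**" are hypothetical (the actual sample source is not reproduced here), and the second Marker is modeled as one or more * characters immediately after the A.

```python
import re

STARS = re.compile(r"\*+")  # stand-in for the second, adjacent Marker

def find_adjacent_markers(source, on_partial_match="skip"):
    """Return (matched_text_or_None, number_of_trials)."""
    scope_start = 0
    trials = 0
    while True:
        trials += 1
        a_pos = source.find("A", scope_start)      # first Marker
        if a_pos == -1:
            return None, trials                    # no match at all
        stars = STARS.match(source, a_pos + 1)     # must be adjacent to the A
        if stars:
            return source[a_pos:stars.end()], trials
        if on_partial_match == "fail":
            return None, trials                    # partial match: give up
        scope_start = a_pos + 1                    # drop the matched A, retry

print(find_adjacent_markers("A A A**"))          # ('A**', 3)
print(find_adjacent_markers("A A A**", "fail"))  # (None, 1)
```

With this input, the model fails on the first two trials and succeeds on the third, which mirrors the sequence that the event log records for the sample.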

HtmlForm

HtmlForm is an anchor that marks a <form> element in an HTML source document. It submits the form to a web server, which is specified in the action attribute of the form. It then activates a secondary parser, which parses the server response.

Within the field_filters property of HtmlForm, you can modify the fields and values of the form. The HtmlForm anchor collects the possible values of the form fields, combines them, and submits all the combinations. You can distribute the submissions over multiple computers.

HtmlForm appends the parsed output of the web-server responses to the main parser output.
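The phrase "submits all the combinations" can be pictured as a Cartesian product of the possible field values. The following Python sketch illustrates that idea only; the function name and the sample field names are hypothetical, and the order in which the product generates the submissions is not specified here.

```python
from itertools import product

def field_combinations(fields):
    """Build one form submission (a dict) per combination of field values."""
    names = list(fields)
    value_lists = [fields[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]

# Hypothetical form with two multi-valued fields: 2 x 3 = 6 submissions.
combos = field_combinations({"color": ["red", "blue"], "size": ["S", "M", "L"]})
print(len(combos))  # 6
print(combos[0])    # {'color': 'red', 'size': 'S'}
```

Because the number of submissions is the product of the value counts, adding values to a field can multiply the load on the web server.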

Configuration Tips

We suggest the following approach for configuring an HtmlForm anchor:

1. Prepare and run the anchor without any filters.

2. In the example pane, confirm that the HtmlForm anchor highlights the correct form.

3. Examine the Results\_HtmlForm.xml file, which contains the form data that was submitted. Confirm that the anchor included all the fields.

4. Now you can add filters or adjust the fields as desired.

How to Define

You can define an HtmlForm by editing the IntelliScript.

Basic Properties

next_parser

The name of a secondary parser, which parses the server response.

field_filters

Adds fields and their values to the form data, which the anchor submits. Specify a sequence of AddField, ModifyField, and RemoveField subcomponents, which generate the desired fields (see the Anchor Subcomponent Reference).

Be sure to use the same field names as in the original HTML form.

click

Specifies the HTML element that a simulated user clicks to submit the form. The options are ImageClick and SubmitClick (see the Anchor Subcomponent Reference).

Advanced Properties

part_to_submit

Select the portion of the possible field-value combinations to submit. The options are SubmitAll, SegmentIndex, and SegmentSize (see the Anchor Subcomponent Reference).

retries

The number of retries, if the anchor cannot connect to the web server on the first attempt.

seconds_to_wait

The interval in seconds between retries.

js_function

The name of a JavaScript function that exists in the source document. The anchor calls the function before it submits the form.

js_params

A list of data holders containing parameters of js_function (select from the Schema view). The parameters must be in the same order as in the function declaration.

For explanations of the following properties, see Standard Anchor Properties.

name

remark

Marker

A Marker defines a location in a source document. It is used as a reference point, from which ContentMaster searches for the succeeding anchors.

Example

For many examples and exercises using this anchor, see Getting Started with ContentMaster.

How to Define

You can define a Marker by the select-and-click method or by editing the IntelliScript. For instructions, see Defining Anchors above.

Basic Properties

search

Defines the search criteria for the Marker. The search criteria determine where the marker is located within the search scope (see How a Parser Searches for Anchors). For example, a NewlineSearch locates the Marker at a newline character. A TextSearch locates the Marker at a specified string.

The value of this property is one of the following searcher components (see the Searcher Component Reference).

search         Explanation
NewlineSearch  Searches for a newline character.
TextSearch     Searches for a predefined text string, or a text string that is dynamically defined (contained in a data holder).
PatternSearch  Searches for a string that matches a specified regular expression.
OffsetSearch   Skips a predefined number of characters following the preceding reference point, or a number of characters that is dynamically defined (contained in a data holder). The Marker is the point following the skipped characters.
TypeSearch     Searches for a string that conforms to a specified XSD data type.

Advanced Properties

adjacent

If selected, the Marker must be adjacent to the anchor at the beginning of its search scope (if direction = backward, adjacent to the anchor at the end of its search scope). If not selected (which is the default), ContentMaster can skip over text until it finds the Marker.

absent

If selected, the Marker is a test that the specified text or pattern is absent from the document. If ContentMaster finds the Marker, the parser or group containing the Marker fails.

count

The occurrence number to find. The default is 1. To set the Marker at the second newline following the preceding anchor, for example, set search = NewlineSearch and count = 2.
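As a rough illustration of the count property, the Python sketch below locates the n-th occurrence of a pattern after a starting position. The function name is hypothetical, and the newline pattern is an assumption based on the NewlineSearch description (a linefeed, a carriage return, or both).

```python
import re

NEWLINE = re.compile(r"\r\n|\r|\n")

def find_nth(pattern, text, count=1, start=0):
    """Return the (start, end) span of the count-th match at or after start, or None."""
    for n, match in enumerate(pattern.finditer(text, start), start=1):
        if n == count:
            return match.span()
    return None  # fewer than count occurrences in the search scope

print(find_nth(NEWLINE, "a\nb\nc", count=2))  # (3, 4)
```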

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

direction

marking

phase

By default, the phase property of a Marker is initial, which means that ContentMaster scans a document for Marker anchors before it searches for Content anchors.

Online Sample

In the ContentMaster Samples folder, open Projects\Markers\Markers.cmw. The sample demonstrates Marker anchors that search for:

A predefined text string

A newline character

An offset

A data type

A regular expression

If you run the parser, note that the result file is empty because the configuration does not have any Content anchors.

RepeatingGroup

The RepeatingGroup anchor parses a repetitive region of a source document.

The repeating units (called iterations) are typically delimited by a separator marker. The RepeatingGroup contains a sequence of nested anchors and actions, which parse each iteration.

The RepeatingGroup anchor treats all iterations in the same way. To parse a semi-repetitive region containing sections that require differing treatment, you can use a DelimitedSections anchor, instead.

Example

For detailed examples and exercises using this anchor, see Getting Started with ContentMaster.

How to Define

You can define a RepeatingGroup by editing the IntelliScript. Add the nested anchors (and optionally actions) that parse each iteration of the RepeatingGroup.

Search for Iterations

By default, a RepeatingGroup searches for iterations from the beginning to the end of its search scope (see How a Parser Searches for Anchors). Optionally, you can set the iteration_order property for a reverse search.

In each iteration:

If the RepeatingGroup is configured with a separator, it searches for the next separator. Then, it searches for the anchors, which lie between a pair of separators.

If the RepeatingGroup is not configured with a separator, it searches only for the anchors.

End of a RepeatingGroup

You can signal the end of a RepeatingGroup in ways such as the following:

The RepeatingGroup can continue until the end of the document.

You can insert a Marker after the RepeatingGroup. You should configure the Marker in an earlier search phase than the RepeatingGroup (this is the default). This causes the parser to search for the Marker first, and use it to limit the search scope of the RepeatingGroup (see Adjusting the Search Phase).

You can set the count property, which limits the search to a maximum number of iterations.

If the RepeatingGroup does not have a separator, it ends when the parser cannot find any more iterations.

Success or Failure of a RepeatingGroup

If a RepeatingGroup cannot find the non-optional anchors in an iteration, the iteration fails.

When an iteration fails, the RepeatingGroup can either end, fail, or skip the failed iteration. The behavior is as follows:

If the RepeatingGroup does not have a separator, the RepeatingGroup ends. Provided that there was at least one successful iteration prior to the failed iteration, the RepeatingGroup succeeds.

If the RepeatingGroup has a separator, and the skip_failed_iterations property is not selected, the RepeatingGroup fails.

If the RepeatingGroup has a separator, and the skip_failed_iterations property is selected, ContentMaster skips over the failed iteration and proceeds with the next iteration. Provided that at least one iteration succeeds, the RepeatingGroup succeeds.
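The three rules above can be restated as a small decision function. This Python sketch only encodes the rules as written; the function name and the representation of iterations as a list of booleans are assumptions made for illustration, and the result for a group with no iterations at all is a guess.

```python
def repeating_group_succeeds(iteration_results, has_separator,
                             skip_failed_iterations=True):
    """iteration_results: True/False per iteration, in processing order."""
    successes = 0
    for succeeded in iteration_results:
        if succeeded:
            successes += 1
        elif not has_separator:
            break           # no separator: the group ends at the failed iteration
        elif not skip_failed_iterations:
            return False    # separator, no skipping: the whole group fails
        # separator with skipping: ignore the failure and continue
    # Assumption: a group with zero successful iterations does not succeed.
    return successes >= 1

print(repeating_group_succeeds([True, True, False], has_separator=False))  # True
print(repeating_group_succeeds([True, False, True], has_separator=True,
                               skip_failed_iterations=False))              # False
print(repeating_group_succeeds([True, False, True], has_separator=True))   # True
```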

Event Log of a RepeatingGroup

The ContentMaster event log records events for every iteration of a RepeatingGroup.

If the skip_failed_iterations property is selected, the RepeatingGroup may generate an optional failure event (symbolized by an icon in the log), which follows the successful iterations. A failure event (also symbolized by an icon) may be nested within the optional failure. These events occur because the RepeatingGroup cannot find additional iterations to parse. The events are normal and not a cause for concern.

For example, if the source document contains two iterations, the log may display an optional failure and a failure because the RepeatingGroup cannot find a third instance of its separator.

Event log of a RepeatingGroup that has two iterations

Basic Properties

separator_position

Position of the separator relative to the iterations of the RepeatingGroup.

The following table explains the options. The examples assume that the separator is a vertical-line character ( | ).

separator_position  Explanation                                                                                                          Example
before              There is a separator before each iteration (including the first iteration).                                         |1|2|3
after               There is a separator after each iteration (including the last iteration).                                           1|2|3|
between             There is a separator between the successive iterations (not before the first iteration and not after the last iteration).  1|2|3
around              There are separators before and after each iteration (including the first and last iterations).                     |1|2|3|
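The four options can be compared with a Python sketch that splits the example strings from the table into iterations. The function is hypothetical and handles only a literal separator string; in the product, the separator is an anchor.

```python
def split_iterations(text, separator="|", position="between"):
    """Split text into iterations according to a separator_position option."""
    if position not in ("before", "after", "between", "around"):
        raise ValueError(f"unknown separator_position: {position}")
    if position in ("before", "around"):
        if not text.startswith(separator):
            raise ValueError("expected a separator before the first iteration")
        text = text[len(separator):]
    if position in ("after", "around"):
        if not text.endswith(separator):
            raise ValueError("expected a separator after the last iteration")
        text = text[:-len(separator)]
    return text.split(separator)

for example, position in [("|1|2|3", "before"), ("1|2|3|", "after"),
                          ("1|2|3", "between"), ("|1|2|3|", "around")]:
    print(position, split_iterations(example, position=position))
```

Each of the four calls yields the same three iterations, 1, 2, and 3.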

separator

An anchor (typically a Marker) that delimits the iterations.

If you leave this property empty, the RepeatingGroup does not look for a delimiter between the iterations. Instead, it assumes that an iteration is finished when it has found all the nested anchors. It then starts to parse the next iteration from the top of the nested anchor sequence.

You can build a complex separator by inserting a Group in the separator property, instead of a Marker.

Advanced Properties

skip_failed_iterations

This option has an effect only if the RepeatingGroup has a separator.

By default, this option is selected. This means that the RepeatingGroup skips over a failed iteration and proceeds with the next iteration. Provided that at least one iteration succeeds, the RepeatingGroup succeeds.

If you deselect the option, the RepeatingGroup fails if any iteration fails.

search_order

The order in which to process the nested anchors within each iteration. The options are:

search_order  Explanation
top-down      Processes the nested anchors in the order that is defined in the IntelliScript.
bottom-up     Processes the nested anchors in reverse order. This is useful if you need to retrieve data from a later anchor, which affects how you process an earlier anchor.

iteration_order

The order in which to process the iterations. The options are the same as for search_order, but apply to the iterations rather than to the anchors within an iteration.

count

The number of iterations to run. You may enter a number, or click the browse button and select a data holder that contains the number. If blank, the iterations continue until the search scope is exhausted.

If count = 0, the RepeatingGroup does not search for iterations. In this case, the RepeatingGroup succeeds, but it does not produce any output.

current_iteration

A data holder, where the RepeatingGroup should output the number of the current iteration (select from a Schema view).

on_partial_match

This option controls the behavior if, in a particular iteration, the RepeatingGroup finds some but not all of its nested, non-optional anchors.

In such a case, if on_partial_match has a value of fail, the iteration fails.

If on_partial_match has a value of skip, the RepeatingGroup removes the area spanned by the successful nested anchors from its search scope and tries to find all the nested anchors again. It may succeed on the second try.

The removal-retry procedure is repeated until the iteration succeeds, or until there is no longer a partial match. In the latter case, the iteration fails.

The following properties are useful in situations where the anchor must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Anchor Properties.

name

remark

disabled

optional

no_initial_phase

phase

marking

Online Samples

In the ContentMaster Samples folder, open Projects\Dynamic_And_RepeatingGroup\Dynamic_And_RepeatingGroup.cmw. The sample uses a RepeatingGroup to iterate over the lines of a document.

Some lines of the source document contain a parenthesized footnote reference, such as "(1)". The RepeatingGroup contains a Group, whose purpose is to parse the footnote and insert its content in the output.

The Group contains a Content anchor that retrieves the footnote reference and stores it in a variable. The Group then performs a RunParser action, which activates a secondary parser. The secondary parser finds the footnote referenced by the variable, parses it, and inserts the result in the output.

For additional samples, see the book Getting Started with ContentMaster. The book contains examples of:

Various configurations of the separator property

The use of a RepeatingGroup with an empty separator property

A nested iterative structure, which is parsed by a RepeatingGroup within a RepeatingGroup

Searcher Component Reference

This section documents the searcher components, which find text in a source document. The searcher components are used in many locations throughout ContentMaster, for example:

To define the location of anchors

To define delimiter characters or strings (see Chapter 5, Formats)

To define the find_what string of a Replace transformer (see Chapter 8, Transformers)

AttributeSearch

This component searches a source document for a specified attribute, which occurs in an expression of the type:

AttributeName = value

or

AttributeName = "value"

The component retrieves the value.

The component is a possible setting of the value property, which belongs to the Content anchor. For more information, see the Content anchor.

Example

An HTML document contains the element:

<img src='MyPicture.gif'>

You can use AttributeSearch to retrieve the value of the src attribute. It returns the text MyPicture.gif.

Supported Attribute Syntax

AttributeSearch supports attribute strings containing an equals sign. Optionally, the equals sign can be surrounded by spaces. The attribute value can be enclosed in double quotes, single quotes, or no quotes.

For example, suppose that AttributeSearch is configured to search for an attribute called time. In all of the following examples, it returns the same value, 12:55:33.

time = 12:55:33
time=12:55:33
time = '12:55:33'
time='12:55:33'
time = "12:55:33"
time="12:55:33"
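The supported syntax can be approximated by a regular expression. The Python sketch below is an illustration under assumptions, not the component's implementation: the function name is hypothetical, an unquoted value is taken to end at whitespace or at a > character, and the default of match_case (not stated here) is assumed to be unselected.

```python
import re

def attribute_search(source, att, match_case=False):
    """Return the value of the first att = value expression, or None."""
    flags = 0 if match_case else re.IGNORECASE
    pattern = re.compile(
        rf"""\b{re.escape(att)}\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
        flags,
    )
    match = pattern.search(source)
    if match is None:
        return None
    # Exactly one of the three groups (double, single, unquoted) took part.
    return next(group for group in match.groups() if group is not None)

for line in ["time = 12:55:33", "time=12:55:33", "time = '12:55:33'",
             "time='12:55:33'", 'time = "12:55:33"', 'time="12:55:33"']:
    print(attribute_search(line, "time"))              # 12:55:33 each time
print(attribute_search("<img src='MyPicture.gif'>", "src"))  # MyPicture.gif
```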

Basic Properties

att

The attribute name.

Advanced Properties

match_case

If selected, AttributeSearch considers the attribute name to be case sensitive.

Online Sample

In the ContentMaster Samples folder, open Projects\Content\Content.cmw. The sample illustrates the use of an AttributeSearch to parse a text document that has a variable = value structure.

LearnByExample

This component learns how to search for text by examining the text location in the example source document. It uses the parser format to interpret the source document.

For example, if the parser has a tab-delimited format, LearnByExample counts the number of tabs from the search start to the example text. It searches for text in the source document that lies at the same number of tabs from the start of the search scope.
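For the tab-delimited case, the idea can be sketched in Python as two steps: count the tabs that precede the example text, then read the field at the same tab count in another line. Both function names and the sample lines are hypothetical, and the sketch covers only a forward search within a single line.

```python
def learn_tab_count(example_line, example_text):
    """Count the tabs between the start of the line and the example text."""
    position = example_line.index(example_text)
    return example_line.count("\t", 0, position)

def search_by_tab_count(source_line, tab_count):
    """Return the field that lies tab_count tabs from the start, or None."""
    fields = source_line.split("\t")
    return fields[tab_count] if tab_count < len(fields) else None

tabs = learn_tab_count("id\tname\tcity", "city")
print(tabs)                                        # 2
print(search_by_tab_count("7\tRon\tParis", tabs))  # Paris
```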

The component is a possible setting of the value property, which belongs to the Content anchor. For more information, see the Content anchor.

Note that if the Content anchor is configured with direction = backward, ContentMaster counts the delimiters from the end of the search scope.

Basic Properties

example

The text in the example source document at the anchor location.

NewlineSearch

This component searches for a newline (a linefeed character, a carriage return character, or both).

Anchors can use NewlineSearch to find newline markers. A Delimiter component can use NewlineSearch to find newline delimiters.

OffsetSearch

This component defines the number of characters between a reference point (for example, the end of a Marker anchor) and an anchor (for example, the beginning of a Content anchor). The number of characters can be either predefined or (in some components where OffsetSearch is used) retrieved dynamically from a data holder.

For more information, see the Marker and Content anchors.

Basic Properties

offset

The number of characters between the reference point and the anchor.

In some of the locations where OffsetSearch is used (such as in a Marker anchor), ContentMaster Studio displays a browse button next to the offset text box. Clicking the button displays a Schema view. In the Schema view, you can select a data holder that contains the number of characters.

Advanced Properties

allow_smaller_offset

If the offset is beyond the search scope (for example, an offset defining the length of a Content anchor, at the end of a document), allow a smaller offset.
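A minimal Python sketch of this option follows. It assumes that "a smaller offset" means the offset is reduced to the end of the search scope, which is an interpretation rather than documented behavior; the function name is hypothetical.

```python
def offset_search(scope, offset, allow_smaller_offset=False):
    """Return the position reached after skipping offset characters, or None."""
    if offset <= len(scope):
        return offset
    # The offset runs past the search scope.
    return len(scope) if allow_smaller_offset else None

print(offset_search("abcdef", 4))                         # 4
print(offset_search("abc", 5))                            # None
print(offset_search("abc", 5, allow_smaller_offset=True)) # 3
```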

PatternSearch

This component searches for a string that conforms to a regular expression.

Regular expressions are a way to define a text search criterion, something like a wildcard search, but with greatly enhanced syntax. For more information about the use of regular expressions in ContentMaster, and for detailed references, see the RegularExpression transformer in Chapter 8, Transformers.

Anchors can use PatternSearch to find markers or content. The Delimiter component can use PatternSearch to find delimiters. The Replace transformer can use PatternSearch to find the text to be replaced.

Example

Suppose you want to define a string of one or more % symbols (such as %%%) as a delimiter. Within the Delimiter component, you can use PatternSearch with the following regular expression:

%+

In another example, suppose you want to define a comma and a semicolon as alternative delimiters, at the same level of the delimiter hierarchy. You can use the regular expression:

[,;]
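Both expressions can be tried in Python, whose re module accepts them unchanged. The snippet only demonstrates what the two patterns match; the helper name and the sample strings are made up, and ContentMaster's own regular-expression syntax may differ in other respects.

```python
import re

def split_on_pattern(pattern, text):
    """Split text wherever the delimiter pattern matches."""
    return re.split(pattern, text)

print(split_on_pattern(r"%+", "a%%%b%c"))   # ['a', 'b', 'c']
print(split_on_pattern(r"[,;]", "x,y;z"))   # ['x', 'y', 'z']
```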

Basic Properties

pattern

The regular expression.

Advanced Properties

escape_sequence

A prefix in the source document, such as a backslash character \, which causes the parser to ignore an instance of the pattern.
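One way to picture an escape sequence is a match that is rejected when the prefix stands immediately before it. The Python sketch below uses a negative lookbehind for this. It is a simplification with a hypothetical function name: it checks only the character run directly before each candidate match, so it is best suited to single-character delimiters.

```python
import re

def find_unescaped(pattern, text, escape_sequence="\\"):
    """Return the spans of pattern matches that are not preceded by the escape."""
    regex = re.compile(rf"(?<!{re.escape(escape_sequence)})(?:{pattern})")
    return [match.span() for match in regex.finditer(text)]

# The semicolon at index 4 is escaped with a backslash and is ignored.
print(find_unescaped(";", r"a;b\;c;d"))  # [(1, 2), (6, 7)]
```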

SegmentSearch

This component searches for opening and closing markers in a text string. It returns the segment from the opening marker to the closing marker (including the markers themselves).

The subcomponent is used in the Replace transformer to find text that is to be replaced (see Chapter 8, Transformers).

Basic Properties

Opening

The search criterion for the opening marker. The options are searcher components (TextSearch, PatternSearch, NewlineSearch, or OffsetSearch).

Closing

The search criterion for the closing marker.
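The behavior described for SegmentSearch can be sketched in Python with two regular expressions standing in for the Opening and Closing searchers. The function name and the sample text are hypothetical; in the product, each criterion is a searcher component rather than a regex string.

```python
import re

def segment_search(text, opening, closing):
    """Return the segment from the opening marker through the closing marker."""
    start = re.search(opening, text)
    if start is None:
        return None
    # Look for the closing marker only after the opening marker ends.
    end = re.compile(closing).search(text, start.end())
    if end is None:
        return None
    return text[start.start():end.end()]  # the markers themselves are included

print(segment_search("keep <!-- drop --> keep",
                     re.escape("<!--"), re.escape("-->")))  # <!-- drop -->
```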

TextSearch

This component searches for an explicit string.

In some locations where TextSearch is used, it can also search for a string that is defined dynamically, such as a string that the parser retrieves from the source document.

Anchors can use TextSearch to find markers. The Delimiter component can use TextSearch to find delimiters. The Replace transformer can use TextSearch to find text that is to be replaced.

Example

To define the string percent-percent-tab as a delimiter, create a Delimiter component and set its search property to TextSearch. In the TextSearch/text property, type:

%%

Then press Ctrl+a, and type the ASCII code of a tab character:

009

Specifying a Search String Dynamically

In some of the locations where TextSearch is used (such as in a Delimiter component or a Marker anchor), ContentMaster Studio displays a browse button to the right of the text box. Clicking the button displays a Schema view. In the Schema view, you can select a data holder that contains the search text.

For example, suppose that you want to find repeated instances of the first word in a document. You can define a Content anchor that retrieves the first word and stores it in a variable. You can then define Marker anchors that use TextSearch to find other instances of the variable's value.
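The same two-step idea, retrieve a value and then search for it, looks like this in Python. The sketch is illustrative only: the function name and the sample sentence are made up, and a word is assumed to be a run of non-whitespace characters.

```python
import re

def find_repeats_of_first_word(document):
    """Return the first word and the start positions of its later occurrences."""
    first = re.search(r"\S+", document)      # plays the role of the Content anchor
    if first is None:
        return None, []
    word = first.group()                     # the value stored in the variable
    positions = []
    position = document.find(word, first.end())
    while position != -1:                    # plays the role of the Marker anchors
        positions.append(position)
        position = document.find(word, position + len(word))
    return word, positions

print(find_repeats_of_first_word("spam eggs spam ham spam"))  # ('spam', [10, 19])
```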

Basic Properties

text

The string to find. In locations where dynamic search is supported, you can specify a data holder that contains the string (click the Browse button, and select from a Schema view).

To type control characters, press Ctrl+a and enter their ASCII codes.

The IntelliScript displays a tab as «. It displays other special characters as an ASCII code prefixed with a dot, for example, 010 for a newline.

Advanced Properties

match_case

If selected, the text found in the source is required to match the text property exactly, with the same upper-case and lower-case letters.

escape_sequence

A prefix in the source document, such as a backslash character \, which causes the parser to ignore an instance of the string. In locations where dynamic search is supported, you can specify a data holder that contains the escape sequence (click the Browse button, and select from a Schema view).

Online Sample

In the ContentMaster Samples folder, open Projects\Dynamic_And_RepeatingGroup\Dynamic_And_RepeatingGroup.cmw.

A Marker anchor, in the GetRemarkParser component of this sample, uses a dynamically defined TextSearch to find a footnote at the end of the source document. For a full description of the sample, see the RepeatingGroup anchor.

TypeSearch

This component searches for an anchor of a specified XSD data type.

The component is a possible setting of the value property, which belongs to the Content anchor. For more information, see the Content anchor.

Basic Properties

val_type

The XSD data type.

Anchor Subcomponent Reference

This section describes subcomponents that you can assign as the values of certain anchor properties.

AddField

This is an option of the HtmlForm property field_filters. It adds a field to be submitted with an HTML form.

Optionally, you can define multiple values. The HtmlForm anchor submits all possible combinations of the values, with the values of other fields, to the web server.

Basic Properties

field_name

Name of the field.

filter

The way to generate the field values. The options are:

filter         Explanation
ExcludeValues  (Not applicable in the AddField component).
UseDataHolder  Assigns a value that is contained in a data holder (select from a Schema view). To assign multiple values, use a multiple-occurrence data holder.
UseValues      Assigns one or more explicit values.

Connect

This component specifies a correspondence between two data holders. The two data holders must have the same XSD data type.

Connect is used in the EmbeddedParser anchor to specify where a secondary parser should store its output in the output of the main parser. It is used in EmbeddedSerializer to specify how the input data holders of a secondary serializer are related to the input data holders of the main serializer. It is used in EmbeddedMapper for a similar purpose, on both the input and output data holders.

Example

A secondary parser outputs an XML element called ID. You want the main parser to store this result in a variable called VarID. You can connect ID to VarID.

For an additional example, see the EmbeddedSerializer component (Chapter 10, Serializers).

Basic Properties

data_holder
A data holder that is referenced in the main parser or serializer (select from a Schema view).

embedded_data_holder
A data holder that is referenced in the secondary parser or serializer (select from a Schema view).

ImageClick

This subcomponent submits a form by simulating a user who clicks an image in the HTML form.

The subcomponent is a possible value of the click property in the HtmlForm anchor.

The pixel_x and pixel_y properties are useful if the image has an area map (hotspot links). The properties indicate the location in the image where the user clicked.

image_name
The name attribute of the image, which is specified in the HTML code.

pixel_x
The x-coordinate where the user clicked (in pixels from the left edge).

pixel_y
The y-coordinate where the user clicked (in pixels from the top edge).

ModifyField

This is an option of the HtmlForm property field_filters. It modifies the value of a field that is defined in the HTML code of a form.

Optionally, you can define multiple values. The HtmlForm anchor submits all possible combinations of the values, with the values of other fields, to the web server.

Basic Properties

field_name
Name of the field.

filter
The way to generate the field values. The options are:

filter          Explanation
ExcludeValues   Removes previously defined values.
UseDataHolder   Assigns a value that is contained in a data holder (select from a Schema view). To assign multiple values, use a multiple-occurrence data holder.
UseValues       Assigns one or more explicit values.

RemoveField

This is an option of the HtmlForm property field_filters. It removes a field that is defined in the HTML code of a form.

Basic Properties

field_name
Name of the field.

SegmentIndex

This is an option of the HtmlForm property part_to_submit, which is used to distribute the form submissions between several computers.

SegmentIndex divides the set of field-value combinations into a specified number of portions, and specifies which portion to submit. On another computer, you can configure a SegmentIndex that submits a different portion.

Basic Properties

parts
The number of portions into which the combinations should be divided.

selected_part
The portion to submit (1 means the first portion, etc.).

SegmentSize

This is an option of the HtmlForm property part_to_submit, which is used to distribute the form submissions between several computers.

SegmentSize divides the set of field-value combinations into portions of a specified size, and specifies which portion to submit. On another computer, you can configure a SegmentSize that submits a different portion.

Basic Properties

part_size
The number of combinations in each portion (by default, 2).

selected_part
The portion to submit (1 means the first portion, etc.).
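
As a rough picture of how part_size and selected_part interact, the following Java sketch splits a list of combinations into consecutive portions and returns one of them. It assumes that the portions are taken in order from the start of the list, which this guide does not state. The class SegmentSizeSketch and its method are hypothetical and are not part of ContentMaster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; the real ordering of portions is an assumption.
final class SegmentSizeSketch {

    // Returns portion number selectedPart (1 means the first portion) when the
    // combinations are split into consecutive portions of partSize items each.
    static <T> List<T> selectPart(List<T> combinations, int partSize, int selectedPart) {
        if (partSize < 1 || selectedPart < 1) {
            throw new IllegalArgumentException("partSize and selectedPart must be 1 or greater");
        }
        long from = (long) (selectedPart - 1) * partSize;   // index of the first item in the portion
        if (from >= combinations.size()) {
            return new ArrayList<>();                        // no such portion: nothing to submit
        }
        int to = (int) Math.min(from + partSize, combinations.size());
        return new ArrayList<>(combinations.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            all.add("combination " + i);
        }
        // With part_size = 2, one computer might submit portion 1 and another portion 2.
        System.out.println(selectPart(all, 2, 1));
        System.out.println(selectPart(all, 2, 2));
        System.out.println(selectPart(all, 2, 3));   // the last portion may be smaller
    }
}
```

Under this assumption, five combinations with a part_size of 2 give three portions, and each computer is configured with a different selected_part.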

SubmitAll

This subcomponent is an option of the HtmlForm property part_to_submit, which is used to distribute the form submissions between several computers.

This subcomponent submits all combinations of the field values from the same computer.

SubmitClick

This subcomponent submits a form by simulating a user who clicks a submit button.

The subcomponent is a possible value of the click property in the HtmlForm anchor.

Basic Properties

submit_name
The name of the submit button.

ContentMaster Studio User's Guide 8. Transformers

Transformers

Transformers are components that modify data.

You can use transformers within components such as anchors, serialization anchors, and actions. The transformers modify the output of the components. For example, if you use a transformer within a Content anchor, it modifies the data that the anchor extracts from the source document.

You can also use transformers as document processors or as stand-alone, runnable components. In those cases, the transformers modify the complete content of a document.

You can use the out-of-the-box transformers supplied with ContentMaster, or you can define custom transformers.

This chapter explains how to use transformers and provides detailed information on the transformers available in ContentMaster.

Defining Transformers

You can define transformers in the following locations of the IntelliScript:

In the transformers property of an anchor or a serialization anchor

In the default_transformers property of a format or of a serializer

In the ProcessByTransformers document processor

In the transformers property of certain actions

At the global level, as a stand-alone, runnable component that modifies a source document.

The following sections explain the use of transformers in each of these locations.

Using Transformers in Anchors

You can use transformers in an anchor that creates XML output, such as Content. In the IntelliScript, you should nest the transformer components within the transformers property of the anchor.

The input of a transformer is the raw output of the anchor, before the anchor inserts the output in a data holder.

For example, suppose you are parsing the following source document:

First name: Ron
Last name: Lehrer

You want to create XML output in ALL CAPS, like this:

<Person>
  <FirstName>RON</FirstName>
  <LastName>LEHRER</LastName>
</Person>

To do this, you can configure the Content anchors, which retrieve the strings Ron and Lehrer, with the ChangeCase transformer.

Sequences of Transformers

You can configure an anchor with a sequence of transformers. Each transformer modifies the output of the preceding transformer.

In the Ron Lehrer example, suppose you want the following output:

<Person>
  <FirstName>- RON -</FirstName>
  <LastName>- LEHRER -</LastName>
</Person>

To do this, you might configure the Content anchors with the ChangeCase and AddString transformers, which change the case and add the hyphens, respectively.
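
The chaining behavior can be sketched in ordinary Java as a list of functions applied in order, each receiving the output of the one before it. This is only an analogy. The class TransformerSequenceSketch and the two lambda expressions are hypothetical stand-ins for ChangeCase and AddString, not the ContentMaster implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical analogy for a sequence of transformers; not a ContentMaster API.
final class TransformerSequenceSketch {

    // Applies each transformer to the output of the preceding one.
    static String applyAll(String input, List<UnaryOperator<String>> transformers) {
        String current = input;
        for (UnaryOperator<String> transformer : transformers) {
            current = transformer.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        UnaryOperator<String> changeCase = s -> s.toUpperCase(Locale.ROOT);   // stand-in for ChangeCase
        UnaryOperator<String> addString = s -> "- " + s + " -";               // stand-in for AddString
        List<UnaryOperator<String>> sequence = Arrays.asList(changeCase, addString);
        System.out.println(applyAll("Ron", sequence));      // prints "- RON -"
        System.out.println(applyAll("Lehrer", sequence));   // prints "- LEHRER -"
    }
}
```

In this particular pair the order does not change the result, because the added hyphens and spaces have no case. With other transformers, such as a ChangeCase followed by a Replace that looks for lower-case text, the order can matter.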

Default Transformers

Very often, you want the same transformers to run on all the Content anchors in a parser. You can configure the format component of the parser with default transformers. This saves you the trouble of adding the same transformers to each anchor in the parser.

To do this, you should nest the transformers in the default_transformers property of the format (see Chapter 5, Formats).

Many of the predefined format components include default transformers. For example, the HtmlFormat component has default transformers that remove HTML tags from the output and convert HTML entities to plain text. If you wish to change the default transformers, you can edit the default_transformers property of the format.

If an anchor has its own transformers, they run after the default transformers.

You can cancel the default transformers for particular anchors. To do this, set the ignore_default_transformers property of the anchor.

Using Transformers as Document Processors

You can run a transformer or a sequence of transformers as a document processor.

For example, you might run the RemoveTags transformer as a processor on an HTML document. The transformer removes the HTML tags from the document, before a parser starts to search for anchors in the document.

To do this, configure the parser format component with the ProcessByTransformers document processor, and nest the transformers within the component.

Using Transformers in Serialization Anchors

You can use transformers in serialization anchors that write to the output document, such as ContentSerializer. The transformers modify the data before the serialization anchor writes it to the document.

For example, suppose a ContentSerializer writes the content of a data holder called DoctorName to an output document. You might configure the ContentSerializer with an AddString transformer, which adds the prefix "Dr. " to the content. Suppose the DoctorName data holder contains the following content:

Albert Schweitzer

The transformer modifies the content, resulting in the following output:

Dr. Albert Schweitzer

You can add transformers to the default_transformers property of a serializer. The transformers that you add here run in all the ContentSerializer serialization anchors before they write to the output document.

Using Transformers in Actions

Certain actions, such as SetValue and Map, let you apply transformers to their output. For details, see Chapter 9, Actions.

Using Transformers as Runnable Components

You can define a transformer at the global (top) level of the IntelliScript. You can then run the transformer as a stand-alone component, which modifies a source document. The transformer runs as the startup component, not within a parser or a serializer.

For instructions on defining global components, see the chapter on Using the IntelliScript Editor in the book ContentMaster Studio in Eclipse.

To run a globally-defined transformer in ContentMaster Studio:

1. Set the transformer as the startup component.

2. Choose the ContentMaster > Run command.

3. You are prompted to select the source document, which the transformer should process.

4. ContentMaster Studio displays the Events view, where you can review the events.

5. The output file is stored in the Results folder of the project. It has a filename such as Transformation of <filename>.txt, where <filename> is the source file. You can open the file in any suitable application.

Standard Transformer Properties

In this section, we review certain properties that are found in many transformers. For additional properties that are specific to particular transformers, see the Transformer Component Reference.

name
A name that you assign to the transformer. ContentMaster includes the name in the event log. This can help you find an event that was caused by the particular transformer.

remark
A comment describing the transformer.

disabled
If selected, ContentMaster ignores the transformer. This is useful for testing and debugging, or for making minor modifications in a project without deleting the existing transformers.

optional
By default, this property of a transformer is selected. The property means that if the transformer fails, its parent component (such as an anchor in which the transformer is nested) does not fail.

If you deselect the optional property, and the transformer fails, it causes the parent component to fail.

Transformer Quick Reference

AbsURL
Converts a relative path (such as a relative URL) to an absolute path.

AddEmptyTagsTransformer
An XML-to-XML transformer, which adds empty elements if elements are missing from the XML.

AddString
Adds strings before and/or after the input text.

BidiConvert
Reverses strings in languages that are written from right to left.

BigEndianUniToUni
Converts big-endian Unicode to little-endian.

CDATADecode
Decodes a CDATA section of an XML document.

CDATAEncode
Encodes a CDATA section of an XML document.

ChangeCase
Changes the text to upper case, lower case, or first-letter-capitalized.

CreateGuid
Generates a GUID identifier.

DateFormat
Formats a date.

Dos96HebToAscii
Converts Hebrew text from the MS DOS code page to the Windows code page.

EbcdicToAscii
Converts EBCDIC to ASCII text.

EncodeAsUrl
Encodes spaces and special characters, as required in a URL.

Encoder
Converts text from one code page to another.

ExternalTransformer
Runs a custom transformer that is implemented as a DLL.

FormatNumber
Formats a number by adding a sign, decimal point, leading and trailing zeros, and a unit.

FromBase64Transformer
Converts the base64 MIME encoding to a binary string.

FromFloat
Converts a floating point number from binary to an ASCII string representation.

FromInteger
Converts an integer from binary to an ASCII string representation.

FromPackDecimal
Converts a number from packed decimals to an ASCII string representation.

FromSignedDecimal
Converts a number from signed decimals to an ASCII string representation.

hebrewBidi
Reverses Hebrew text from RTL to LTR.

HebrewDosToWindowsTransformer
Converts from the Hebrew MS-DOS to Windows code page.

HebrewEBCDICOldCodeToWindows
Converts Hebrew text from EBCDIC to the Windows-1255 code page.

hebUniToAscii
Converts Hebrew text from Unicode UTF-16 to the Windows-1255 code page.

hebUtf8ToAscii
Converts Hebrew text from UTF-8 to the Windows-1255 code page.

HtmlEntitiesToASCII
Converts HTML entities to plain text.

HtmlProcessor
Normalizes whitespace in an HTML document.

InjectFP
Inserts a decimal point in a number.

InjectString
Inserts a string into text.

JavaTransformer
Runs a custom transformer that is implemented in Java.

LookupTransformer
Looks up a value in a table.

NormalizeClosingTags
Changes <tag /> to <tag></tag> in XML input.

ODBCLookup
Replaces the text with data retrieved from a database.

RegularExpression
Modifies the text by using a regular expression.

RemoveMarginSpace
Trims leading and trailing space characters.

RemoveRtfFormatting
Removes all RTF formatting characters within the text.

RemoveTags
Removes HTML tags.

Replace
Replaces or deletes specified text.

Resize
Pads text to a specified size.

ReverseTransformer
Reverses a string.

RtfProcessor
Normalizes RTF code.

RtfToASCII
Converts RTF input to plain text.

SubString
Returns a substring of the input.

ToBase64Transformer
Converts a binary string to the base64 MIME encoding.

ToFloat
Converts a floating point number from an ASCII string representation to binary.

ToInteger
Converts an integer from an ASCII string representation to binary.

ToPackDecimal
Converts a number from an ASCII string representation to packed decimals.

ToSignedDecimal
Converts a number from an ASCII string representation to signed decimals.

TransformByParser
Runs a parser on the input text, replacing segments of the text.

TransformerPipeline
Applies a sequence of transformers to the text.

WestEuroUniToAscii
Converts text in western-European languages from Unicode UTF-16 to the Windows-1252 code page.

XSLTTransformer
Applies an XSLT transformation to XML input text.

Transformer Component Reference

This section documents the transformers that are available in ContentMaster.

AbsURL

This transformer converts a relative file path or URL to an absolute path.

For example, if the input is test.html and the base URL is http://www.example.com, the output is http://www.example.com/test.html.

If the input is an absolute path, the transformer does not alter it.

Basic Properties

BaseUrl
The base path or URL.
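
For comparison, the same kind of resolution can be reproduced with the standard java.net.URI class. The sketch below is not the AbsURL implementation. The class AbsUrlSketch is hypothetical, and the code assumes that the base names a site root or a folder, so it appends a trailing slash before resolving.

```java
import java.net.URI;

// Hypothetical comparison using the standard library; not the AbsURL implementation.
final class AbsUrlSketch {

    // Resolves input against baseUrl. An input that is already absolute is returned unchanged.
    static String toAbsolute(String baseUrl, String input) {
        // Assumption: the base is a site root or folder, so make sure it ends with "/".
        String normalizedBase = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return URI.create(normalizedBase).resolve(input).toString();
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute("http://www.example.com", "test.html"));
        System.out.println(toAbsolute("http://www.example.com", "http://www.example.org/other.html"));
    }
}
```

The first call yields http://www.example.com/test.html, and the second returns its absolute input unaltered, which mirrors the behavior described for the transformer.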

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

AddEmptyTagsTransformer

This is an XML-to-XML transformer. The transformer checks if all the elements defined in the XSD schema exist in the XML input. If not, it adds empty elements to the XML.

Basic Properties

root_element
The root element of the XML (select from the Schema view).

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

AddString

This transformer adds strings before and/or after the input text.

Basic Properties

pre
The string to add before the text.

post
The string to add after the text.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

Online Sample

In the ContentMaster Samples folder, open Projects\Transformers_Example\Transformers_Example.cmw. The first Content anchor in the parser is configured with an AddString transformer.

BidiConvert

This transformer reverses strings that are written in right-to-left (RTL) languages, such as Hebrew and Arabic.

The input must be in the RTL format. The output is LTR.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

disabled

BigEndianUniToUni

This transformer converts big-endian Unicode to little-endian.

CDATADecode

This transformer decodes a CDATA section of an XML document. For example, it converts

<![CDATA[100 < 200]]>

to

100 < 200

Of course, if you write the result to XML, ContentMaster re-encodes it using the standard XML encoding:

100 &lt; 200
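
The two steps in this example, stripping the CDATA wrapper and then escaping the text again when it is written to XML, can be sketched in Java as follows. The class CdataDecodeSketch and both methods are hypothetical illustrations, and the sketch handles only input that consists of a single CDATA section.

```java
// Hypothetical illustration of the decode-then-escape behavior; not a ContentMaster API.
final class CdataDecodeSketch {

    private static final String OPEN = "<![CDATA[";
    private static final String CLOSE = "]]>";

    // Removes the CDATA wrapper if the whole input is one CDATA section.
    static String decode(String input) {
        boolean wrapped = input.length() >= OPEN.length() + CLOSE.length()
                && input.startsWith(OPEN) && input.endsWith(CLOSE);
        return wrapped ? input.substring(OPEN.length(), input.length() - CLOSE.length()) : input;
    }

    // Escapes the characters that are special in XML text content.
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String decoded = decode("<![CDATA[100 < 200]]>");
        System.out.println(decoded);              // prints "100 < 200"
        System.out.println(escapeXml(decoded));   // prints "100 &lt; 200"
    }
}
```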

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

disabled

optional

CDATAEncode

This transformer encodes a CDATA section of an XML document. For example, it converts

100 < 200

to

<![CDATA[100 < 200]]>
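
A minimal Java sketch of this kind of wrapping is shown below. The class CdataEncodeSketch is hypothetical. The handling of a "]]>" sequence inside the text (splitting it across two CDATA sections) is a common XML technique that we assume here for illustration; this guide does not say how CDATAEncode treats that case.

```java
// Hypothetical illustration of CDATA wrapping; not the CDATAEncode implementation.
final class CdataEncodeSketch {

    // Wraps the text in a CDATA section. The sequence "]]>" cannot appear inside
    // a CDATA section, so it is split across two adjacent sections (an assumption).
    static String encode(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(encode("100 < 200"));   // prints "<![CDATA[100 < 200]]>"
    }
}
```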

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

disabled

optional

ChangeCase

The ChangeCase transformer changes text to upper case, lower case, or first-letter-capitalized.

Basic Properties

case_type
The output case (all_caps, all_lower, or first_cap).

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

Online Sample

In the ContentMaster Samples folder, open Projects\Transformers_Example\Transformers_Example.cmw. The third Content anchor in the parser is configured with a ChangeCase transformer.

CreateGuid

This transformer generates a GUID identifier. The GUID is guaranteed to be unique on every generation.

The transformer ignores its input. The GUID is not related to the input in any way.

DateFormat

This transformer formats a date.

Example

Suppose you set:

input_format = "d/m/yy"
output_format = "mm/dd/yyyy"

If the input is

13/3/05

the output is

03/13/2005
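
To make the conversion concrete, the following Java sketch performs the same d/m/yy to mm/dd/yyyy conversion with the standard java.time classes. It is a comparison, not the DateFormat implementation. Java uses different pattern letters (M for month, d for day), and the class DateFormatSketch is hypothetical. The two-digit year is read with a base year of 1930, so that 00-29 is interpreted as 20xx and 30-99 as 19xx, which is the rule this guide gives for the yy symbol.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical comparison using java.time; not the DateFormat implementation.
final class DateFormatSketch {

    // Input: day/month/two-digit year. Two-digit years map to the range 1930-2029.
    private static final DateTimeFormatter INPUT = new DateTimeFormatterBuilder()
            .appendPattern("d/M/")
            .appendValueReduced(ChronoField.YEAR, 2, 2, 1930)
            .toFormatter();

    // Output: two-digit month, two-digit day, four-digit year.
    private static final DateTimeFormatter OUTPUT = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    static String reformat(String input) {
        return LocalDate.parse(input, INPUT).format(OUTPUT);
    }

    public static void main(String[] args) {
        System.out.println(reformat("13/3/05"));   // prints "03/13/2005"
        System.out.println(reformat("1/12/98"));   // prints "12/01/1998"
    }
}
```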

Supported Formats

The following table lists the symbols that you can use in the date format:

Format symbol          Description                                                                     Examples
d                      One- or two-digit day of month.                                                 4, 14
dd                     Two-digit day of month.                                                         04, 14
ddd                    Three-letter abbreviation of day of week.                                       Wed
dddd                   Full name of day of week.                                                       Wednesday
m                      One- or two-digit month.                                                        2, 12
mm                     Two-digit month.                                                                02, 12
mmm                    Three-letter abbreviation of month.                                             Feb, Dec
mmmm                   Full name of month.                                                             February, December
yy                     Two-digit year. 00-29 is interpreted as 20xx. 30-99 is interpreted as 19xx.     05, 98
yyy                    Four-digit year. BC dates are preceded by a minus sign.                         2005, 1998
yyyy                   Four-digit year, followed by AD or BC.                                          2005 AD, 1998 AD
\                      Escape character. The following character is copied to the output.              To\da\y is dd/mm/yyyy generates output such as Today is 15/03/2005
All other characters   Copied to the output.                                                           Now is dd/mm/yy generates output such as Now is 15/03/05

Basic Properties

input_format
The format of the input date, for example, d/m/yy. You can type the format, or browse to a data holder that contains the format. If you omit the format, the system default is assumed.

output_format
The format of the output date, for example, mm/dd/yyyy. You can type the format, or browse to a data holder that contains the format.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

disabled

optional

Dos96HebToAscii

This transformer converts Hebrew text from the MS DOS code page to the Windows-1255 code page.

EbcdicToAscii

This transformer converts EBCDIC to ASCII text.

EncodeAsUrl

This transformer encodes spaces and special characters, as required in a URL. The characters are encoded as hexadecimal, preceded by a % symbol.

For example, the transformer converts

http://www.example.com?name=Ron Lehrer

to

http://www.example.com?name=Ron%20Lehrer
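
The percent-encoding shown above can be sketched in Java as follows. The class EncodeAsUrlSketch is hypothetical. The set of characters that are left unencoded (letters, digits, the URL delimiters, and an existing % sign) is an assumption based on the example, in which the characters : / ? and = stay as they are; this guide does not list the exact set that EncodeAsUrl uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of percent-encoding; the safe-character set is an assumption.
final class EncodeAsUrlSketch {

    private static final String SAFE =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
            + "-._~"            // unreserved marks
            + ":/?#[]@"         // general URL delimiters
            + "!$&'()*+,;="     // sub-delimiters
            + "%";              // leave existing escapes alone

    // Replaces every byte outside the safe set with % followed by two hexadecimal digits.
    static String encode(String url) {
        StringBuilder out = new StringBuilder();
        for (byte b : url.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF;
            if (value < 0x80 && SAFE.indexOf(value) >= 0) {
                out.append((char) value);
            } else {
                out.append('%').append(String.format("%02X", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("http://www.example.com?name=Ron Lehrer"));
    }
}
```

The sketch deliberately avoids java.net.URLEncoder, which is designed for form data and encodes a space as a plus sign rather than %20.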

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

Online Sample

In the ContentMaster Samples folder, open Projects\Transformers_Example\Transformers_Example.cmw. The fourth Content anchor in the parser is configured with an EncodeAsUrl transformer.

Encoder

This transformer converts text from one code page to another.

Basic Properties

input_code_page
The input code page (select from the list).

output_code_page
The output code page (select from the list).

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

ExternalTransformer

This component lets you run a custom transformer, which is implemented as a C++ DLL.

To implement a custom transformer in Java, see the JavaTransformer component.

There is another way to implement a C++ transformer, which lets the IntelliScript pass custom properties to the transformer. For information, see the chapter on External Components in the ContentMaster Engine Developer's Guide.

Creating a Custom Transformer

To create a DLL that is suitable to run with the ExternalTransformer component, follow the steps below.

The instructions are for the Microsoft Visual C++ compiler, running on a Microsoft Windows platform. For compilation instructions on non-Windows platforms, please contact SAP support.

1. Copy the online-sample file ExternalTransformerExample.c. You can find the file under the ContentMaster installation folder, at the location:

Samples\SDK\ExternalTransformer\ExternalTransformerExample.c

2. Using the Visual C++ compiler, create a Win32 dynamic-link library project, and insert the C file into the project.

3. The C file contains the following function, which implements the transformer processing:

__declspec(dllexport) int transform(const char* in, char** out)

In the sample implementation, the function reverses the text. Replace the sample code with your implementation.

4. Implement the buffer-release function, which is also in the sample code:

__declspec(dllexport) void release_buf(char* buf)

5. Compile the DLL.

6. Store the DLL in the externLibs\user subfolder of your ContentMaster installation folder.

You may then use the DLL in the ExternalTransformer component.

Optionally, you can add the transformer to the drop-down component list that ContentMaster displays (for instructions, see the chapter about Using the IntelliScript Editor in the book ContentMaster Studio in Eclipse).

Basic Properties

import_dll
Browse to the custom transformer DLL in the externLibs\user folder.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

FormatNumber

This transformer formats a number by adding a sign, decimal point, leading or trailing zeros, and unit.

Basic Properties

sign
Adds a plus or minus sign at the beginning or end of the number. The options are un_signed (deletes a sign if present), leading_sign, trailing_sign, negative sign only, and as in source (does not change the input sign).

insert_decimal_point
Sets the decimal point symbol. The options are none, point, and comma.

unit_type
Adds a unit after the number. Select a unit such as meter, cm, mm, or inch. If you do not want to add a unit, select undefined.

size_of_integer_part
Pads the integer part with leading zeros to the indicated size.

number_of_decimals
Pads the decimal part with trailing zeros to the indicated size.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

FromBase64Transformer

This transformer converts the base64 MIME encoding to a binary string.

The transformer is supported only on the Microsoft Windows platform.
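
For readers who want to see what the conversion does to the data, the following Java sketch decodes base64 MIME text with the standard java.util.Base64 class. It is a comparison only and does not use or describe the ContentMaster component. The class FromBase64Sketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical comparison using the standard library; not the FromBase64Transformer itself.
final class FromBase64Sketch {

    // Decodes base64 MIME text to the original bytes. The MIME decoder
    // ignores line breaks that MIME inserts into long encoded text.
    static byte[] decode(String base64Text) {
        return Base64.getMimeDecoder().decode(base64Text);
    }

    public static void main(String[] args) {
        byte[] raw = decode("SGVsbG8=");
        System.out.println(new String(raw, StandardCharsets.US_ASCII));   // prints "Hello"
    }
}
```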

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

FromFloat

This transformer converts a floating point number from binary to an ASCII string representation.

Advanced Properties

size
Size of the number: single_precision_32_bit or double_precision_64_bit.

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

FromInteger

This transformer converts an integer from binary to an ASCII string representation, in decimal, octal, or hexadecimal.

Basic Properties

size
Size in bytes of the binary representation. The supported values are 1 to 8.

Advanced Properties

signed
If selected, the transformer adds a sign to the number.

to_base
The base of the output: decimal, octal, hexadecimal, or lowercase hexadecimal.
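
The general idea of a binary-to-text integer conversion can be sketched in Java as shown below. Several details are assumptions, because this guide does not specify them: the sketch reads the bytes in big-endian order, and it treats the signed option as meaning that the bytes hold a two's-complement value. The class FromIntegerSketch is hypothetical.

```java
import java.math.BigInteger;

// Hypothetical illustration; byte order and the meaning of "signed" are assumptions.
final class FromIntegerSketch {

    // Converts 1 to 8 bytes (assumed big-endian) to text in the given radix (10, 8, or 16).
    static String toText(byte[] bytes, boolean signed, int radix) {
        if (bytes.length < 1 || bytes.length > 8) {
            throw new IllegalArgumentException("size must be 1 to 8 bytes");
        }
        BigInteger value = signed
                ? new BigInteger(bytes)        // two's-complement interpretation
                : new BigInteger(1, bytes);    // unsigned magnitude
        return value.toString(radix);          // hexadecimal digits are lower case
    }

    public static void main(String[] args) {
        System.out.println(toText(new byte[] {0x01, 0x00}, false, 10));   // prints "256"
        System.out.println(toText(new byte[] {(byte) 0xFF}, true, 10));   // prints "-1"
        System.out.println(toText(new byte[] {(byte) 0xFF}, false, 16));  // prints "ff"
    }
}
```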

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

FromPackDecimal

This transformer converts a number from packed decimals to an ASCII string representation.
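
As background, the Java sketch below decodes the packed-decimal layout that is common on mainframe systems: each byte holds two decimal digits, one per half-byte, except the last byte, whose low half holds the sign (hexadecimal D or B for negative, other values for positive). We assume this layout for illustration; this guide does not describe the exact format that FromPackDecimal expects. The class PackedDecimalSketch is hypothetical.

```java
// Hypothetical illustration of a common packed-decimal layout; the layout is an assumption.
final class PackedDecimalSketch {

    // Converts packed-decimal bytes to a decimal string, keeping any leading zeros.
    static String unpack(byte[] packed) {
        StringBuilder digits = new StringBuilder();
        boolean negative = false;
        for (int i = 0; i < packed.length; i++) {
            int high = (packed[i] >> 4) & 0x0F;
            int low = packed[i] & 0x0F;
            appendDigit(digits, high);
            if (i < packed.length - 1) {
                appendDigit(digits, low);                 // ordinary digit
            } else {
                negative = (low == 0x0D || low == 0x0B);  // last half-byte is the sign
            }
        }
        return negative ? "-" + digits : digits.toString();
    }

    private static void appendDigit(StringBuilder digits, int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("not a decimal digit: " + nibble);
        }
        digits.append(nibble);
    }

    public static void main(String[] args) {
        System.out.println(unpack(new byte[] {0x12, 0x34, 0x5C}));   // prints "12345"
        System.out.println(unpack(new byte[] {0x12, 0x3D}));         // prints "-123"
    }
}
```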

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

FromSignedDecimal

This transformer converts a number from signed decimals to an ASCII string representation.

Advanced Properties

insert_sign_symbol
Adds a sign symbol (plus or minus) before or after the number. The options are no, before, and after.

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

hebrewBidi

This transformer reverses Hebrew text from RTL to LTR. It is similar to BidiConvert, but it uses a different algorithm for the reversal, which may produce slightly different results.

HebrewDosToWindowsTransformer

This transformer converts Hebrew documents from the MS DOS Hebrew code page to the Windows Hebrew code page.

HebrewEBCDICOldCodeToWindows

This transformer converts Hebrew text from EBCDIC to the Windows-1255 code page.

hebUniToAscii

This transformer converts Hebrew text from Unicode UTF-16 to the Windows-1255 code page.

hebUtf8ToAscii

This transformer converts Hebrew text from Unicode UTF-8 to the Windows-1255 code page.

HtmlEntitiesToASCII

This transformer converts HTML entities to plain text. For example, it converts &copy; or &#169; to a copyright symbol (©).

Supported Entities

The transformer supports the ISO 8859-1 (Latin-1) entities that are defined in the HTML 4.0 reference (http://www.w3.org/TR/1998/REC-html40-19980424/sgml/entities.html). The supported entities include:

&amp;, &lt;, &gt;, and &quot; (& < > ", respectively)

Numeric character codes &#0; to &#255;

Entities for Latin-1 characters (&nbsp; = non-breaking space, &copy; = copyright, etc.)

The transformer does not support extended characters (codes > 255 or non-Latin-1 characters).
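
The following Java sketch illustrates the kind of replacement described above for a handful of entities. It is not the HtmlEntitiesToASCII implementation. The class HtmlEntitiesSketch is hypothetical, its table of named entities is deliberately incomplete, and leaving unsupported entities unchanged is an assumption, because this guide does not say what the transformer does with them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration with a partial entity table; not the real transformer.
final class HtmlEntitiesSketch {

    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("nbsp", "\u00A0");   // non-breaking space
        NAMED.put("copy", "\u00A9");   // copyright sign
    }

    // Matches numeric codes of up to three digits, or a named entity.
    private static final Pattern ENTITY = Pattern.compile("&(#(\\d{1,3})|([A-Za-z]+));");

    static String decode(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group();                  // default: leave unchanged (assumption)
            if (m.group(2) != null) {
                int code = Integer.parseInt(m.group(2));
                if (code <= 255) {
                    replacement = String.valueOf((char) code);
                }
            } else if (NAMED.containsKey(m.group(3))) {
                replacement = NAMED.get(m.group(3));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&copy; 2005 Example &amp; Co. &#169;"));
    }
}
```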

Output Encoding for Upper-ASCII Characters

If the transformer output contains upper-ASCII characters, be sure to select an output encoding that supports the characters, such as Windows-1252 or UTF-8.

If the output is XML, we recommend that you include an encoding attribute in the XML declaration. Otherwise, ContentMaster Studio may be unable to display the characters.

To set the encoding, see Encoding Page in Chapter 13, Project Properties.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

HtmlProcessor

This transformer (which is also available as a format preprocessor, see Chapter 5, Formats) normalizes whitespace according to HTML conventions. It converts any sequence of tabs, line breaks, and space characters to a single space character.

You can use this transformer to normalize whitespace in any type of text. It is not restricted to HTML text.
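
The whitespace rule described above can be expressed as a single regular-expression replacement. The Java sketch below shows the idea. The class WhitespaceSketch is hypothetical and is not the HtmlProcessor implementation.

```java
// Hypothetical illustration of HTML-style whitespace normalization.
final class WhitespaceSketch {

    // Collapses every run of spaces, tabs, and line breaks into one space character.
    static String normalize(String text) {
        return text.replaceAll("[ \\t\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        System.out.println(normalize("First name:\t Ron\r\nLast name:   Lehrer"));
    }
}
```

The sketch does not trim leading or trailing whitespace. A run at the start or end of the text is reduced to a single space, in the same way as a run in the middle.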

InjectFP

This transformer inserts a decimal point at a specified location in a number. For example, the transformer can convert 12345 to 123.45.

Basic Properties

digits_after_decimal_point
The number of digits after the decimal point.
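
The 12345 to 123.45 example corresponds to a digits_after_decimal_point value of 2. The Java sketch below reproduces that case. The class InjectFpSketch is hypothetical, and its treatment of inputs that are shorter than the requested number of decimals (padding with leading zeros) is an assumption, because this guide does not describe that case.

```java
// Hypothetical illustration; the handling of short inputs is an assumption.
final class InjectFpSketch {

    // Inserts a decimal point so that digitsAfter digits follow it.
    static String injectDecimalPoint(String digits, int digitsAfter) {
        if (digitsAfter <= 0) {
            return digits;                       // nothing to insert
        }
        StringBuilder sb = new StringBuilder(digits);
        while (sb.length() <= digitsAfter) {
            sb.insert(0, '0');                   // assumption: pad so an integer digit remains
        }
        sb.insert(sb.length() - digitsAfter, '.');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(injectDecimalPoint("12345", 2));   // prints "123.45"
    }
}
```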

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

InjectString

The InjectString transformer inserts a string into text.

Basic Properties

injection_place
The location in the text at which to insert the string (0 to insert the string before the text).

string_to_inject
The string to insert.
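The effect of the two properties can be sketched in Java as follows. The names are hypothetical, and the sketch assumes that injection_place is a zero-based character offset counted from the start of the text, which is consistent with 0 meaning "before the text".

```java
final class InjectStringSketch {
    // Inserts stringToInject at the given offset of text (0 = before the text).
    static String inject(String text, int injectionPlace, String stringToInject) {
        if (injectionPlace < 0 || injectionPlace > text.length()) {
            throw new IllegalArgumentException("injection place is outside the text");
        }
        return text.substring(0, injectionPlace)
                + stringToInject
                + text.substring(injectionPlace);
    }
}
```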

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

JavaTransformer

This component lets you run a custom transformer, which is implemented in Java.

To implement a custom transformer in C++, see the ExternalTransformer component.

There is another way to implement a Java transformer, which lets the IntelliScript pass custom properties to the transformer. For information, see the chapter on External Components in the ContentMaster Engine Developer's Guide.

Creating a Custom Transformer

To implement a custom transformer in Java, follow these steps:

1. Create a new Java project and package, for example, named MyTransformer.

2. Create a class that implements a static method with the following syntax. The method can have any name.

public static byte[] Transform(byte[] in)

3. Create a jar file containing the class.

4. Store the jar file in the externLibs\user subfolder of your ContentMaster installation folder.

You may then use the jar file in the JavaTransformer component.

Optionally, you can add the transformer to the drop-down component list that ContentMaster displays (for instructions, see the chapter about Using the IntelliScript Editor in the book ContentMaster Studio in Eclipse).

Example

The following example is a transformer that changes text to upper case.

    package MyTransformer;

    public class TransformerTest {
        // Receives the input text as bytes and returns the transformed bytes.
        public static byte[] Transform(byte[] in) {
            // Decode the input using the platform default character set.
            String str = new String(in);
            String ret = str.toUpperCase();
            // Encode the result with the same default character set.
            return ret.getBytes();
        }
    }

Basic Properties

java_class
The path of the Java class, for example, MyTransformer/TransformerTest.

method
The method to run, for example, Transform.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

disabled

optional

LookupTransformer

This transformer looks up a value in a table.

Example

You can configure a LookupTransformer to look up values in a table such as the following:

key value

1 George Washington

2 John Adams

3 Thomas Jefferson

4 James Madison

If the input to the transformer is 3, the transformer outputs the value Thomas Jefferson.

Basic Properties

look_at
Under this property, you can define the lookup table. You can select one of the following values:

look_at           Explanation
InlineTable       Lets you define the table in the IntelliScript.
XMLLookupTable    Lets you specify an XML file, which contains the table data.

If you use the same lookup table repeatedly, you should consider defining the InlineTable or the XMLLookupTable component at the global level of the IntelliScript (see Defining a Global Component in the book ContentMaster Studio in Eclipse). You can then reference the table by name in the look_at property.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

optional

NormalizeClosingTags

For XML input, this transformer removes shorthand closing tags from empty elements. It changes <tag /> to <tag></tag>.

The transformer does not correct incorrect XML. It converts well-formed XML from one style of closing tag to another.
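The conversion from <tag /> to <tag></tag> can be approximated with a regular expression, as in the following Java sketch. This is a simplified illustration under stated assumptions, not the NormalizeClosingTags implementation: the names are hypothetical, the input is assumed to be well-formed, and text that merely looks like an empty element inside comments or CDATA sections would also be rewritten.

```java
import java.util.regex.Pattern;

final class ClosingTagSketch {
    // Group 1: element name. Group 2: zero or more quoted attributes.
    private static final Pattern EMPTY_ELEMENT = Pattern.compile(
            "<([A-Za-z_:][\\w:.-]*)"
            + "((?:\\s+[\\w:.-]+\\s*=\\s*(?:\"[^\"]*\"|'[^']*'))*)"
            + "\\s*/>");

    // Rewrites each shorthand empty element as an explicit open/close pair.
    static String expandEmptyElements(String xml) {
        return EMPTY_ELEMENT.matcher(xml).replaceAll("<$1$2></$1>");
    }
}
```

Attributes are matched as quoted values so that a slash or angle bracket inside an attribute value does not end the match early.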

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

ODBCLookup

The ODBCLookup transformer uses the input text to query a database. It replaces the text with the query result.

Basic Properties

db_connection
The database connection. The value is an ODBC_Text_Connection subcomponent, which specifies a DSN, user name, etc.

Advanced Properties

query
A SQL SELECT or EXEC query that retrieves the data from the database. Use ? to represent the input text, for example:

SELECT Name FROM Employees WHERE Id = ?

The query must retrieve a single field, which is the transformer output.

For explanations of the following properties, see Standard Transformer Properties:

disabled

optional

RegularExpression

The RegularExpression transformer does a pattern-match search on the input text. It replaces the found text with a specified string.

Regular Expressions

The transformer uses a regular expression to define the search pattern. Regular expressions are a way to define a search criterion, something like a wildcard search but with a much richer syntax.

For detailed information about regular expressions, see the book Mastering Regular Expressions by Jeffrey E. F. Friedl and Andy Oram (O'Reilly, 1997). You can also get copious information by searching for regular expressions on the Internet.

ContentMaster uses the Regex++ implementation of regular expressions, copyright (c) 1998-2003 by Dr. John Maddock (Version 3.12, 18 April 2000). For information about this implementation, see http://www.boost.org/libs/regex/doc/index.html. For a summary of the regular expression syntax supported by Regex++, see http://www.boost.org/libs/regex/doc/syntax.html.

For another use of regular expressions in ContentMaster, see the PatternSearch component in Chapter 7, Anchors.

Example

Suppose you want to replace any run of lower-case letters that is enclosed between two AA markers, together with the markers, with the single character Z. To do this, use a RegularExpression transformer configured with:

exp = AA[a-z]+AA
replacement = Z

This replaces AAexampleAA with Z.

Basic Properties

exp
A regular expression for the search criterion.

replacement
The replacement text.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

Using Parentheses and Parameters to Preserve Portions of the Found Text

In the exp property, you can enclose portions of the regular expression in parentheses. In the replacement property, you can use:

$0 to identify the entire found text that matches the regular expression

$1 to identify the substring that matches the first parenthesized portion of the regular expression

$2 to identify the substring that matches the second parenthesized portion of the regular expression

And so forth

For example, suppose you set:

exp = abc([0-9]+)(def)
replacement = $1

This replaces abc5624def with 5624.

Alternatively, suppose you set:

exp = abc([0-9]+)(def)
replacement = $2ZYX$1

This replaces abc5624def with defZYX5624.
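The examples in this section can be tried outside ContentMaster with the standard Java regular expression classes, which use the same $0, $1, $2 notation for parenthesized groups in the replacement string. The sketch below uses the Java engine, not Regex++, so syntax details beyond these simple patterns may differ. The class and method names are hypothetical.

```java
final class RegexReplaceSketch {
    // Replaces every match of exp in input with replacement.
    // In the replacement, $0 is the whole match and $1, $2, ... are the groups.
    static String replace(String input, String exp, String replacement) {
        return input.replaceAll(exp, replacement);
    }
}
```

With exp set to abc([0-9]+)(def), the replacement $1 turns abc5624def into 5624, and the replacement $2ZYX$1 turns it into defZYX5624.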

RemoveMarginSpace

This transformer deletes leading and trailing space characters from the text.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

RemoveRtfFormatting

The transformer removes RTF formatting instructions from the text.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

RemoveTags

This transformer removes HTML tags from the input text.

It replaces the tags at internal locations of the text with a separator string, such as a space character. It does not insert the separator string at the beginning or end of the text. Adjacent multiple tags are transformed into a single separator.
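The three rules above (no separator at the start, no separator at the end, one separator per run of adjacent tags) can be sketched in Java as follows. This is an illustration only: the names are hypothetical, a tag is assumed to be anything between < and >, and tags separated by whitespace are treated as separate runs, which may differ from the actual RemoveTags behavior.

```java
import java.util.regex.Matcher;

final class RemoveTagsSketch {
    // One or more adjacent tags, with nothing between them.
    private static final String TAG_RUN = "(?:<[^>]*>)+";

    static String removeTags(String html, String separator) {
        // Drop tag runs at the very beginning and very end without a separator.
        String inner = html.replaceFirst("^" + TAG_RUN, "")
                           .replaceFirst(TAG_RUN + "\\z", "");
        // Replace each remaining run of tags with a single separator.
        return inner.replaceAll(TAG_RUN, Matcher.quoteReplacement(separator));
    }
}
```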

Basic Properties

replace_withThe separator string.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

Replace

This transformer finds and replaces strings in the input text. Leaving the replace_with property empty deletes the found text.

Basic Properties

find_what
The text to find. The value is one of the following searcher components (see the Searcher Component Reference in Chapter 7, Anchors):

NewlineSearch: Finds a newline character.
PatternSearch: Finds text that matches a regular expression.
SegmentSearch: Finds a segment from a specified opening marker to a closing marker.
TextSearch: Finds a specified string.

replace_with
The replacement string.

Advanced Properties

occurrence
Specifies which occurrences to replace: all, first, or last.

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

Online Sample

In the ContentMaster Samples folder, open Projects\Transformers_Example\Transformers_Example.cmw. The second and fifth Content anchors in the parser are configured with Replace transformers.

Resize

This transformer fits the input text to a specified size. It pads or truncates the text as required.

Basic Properties

size
The desired size. Type an integer, or click the browse button and select a data holder that contains an integer.

padding_character
The padding character, for example " " (a space character). Type the character, or click the browse button and select a data holder that contains a character.

align
The text alignment within the resized string. The options are left (padding or trimming is on the right) and right (padding or trimming is on the left).
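The interaction of size, padding_character, and align can be sketched in Java as follows. The names are hypothetical, and the size is assumed to be counted in characters.

```java
final class ResizeSketch {
    // alignLeft = true: pad or trim on the right. alignLeft = false: pad or trim on the left.
    static String resize(String text, int size, char paddingCharacter, boolean alignLeft) {
        int length = text.length();
        if (length >= size) {
            // Truncate: keep the left part for left alignment, the right part otherwise.
            return alignLeft ? text.substring(0, size) : text.substring(length - size);
        }
        StringBuilder padding = new StringBuilder();
        for (int i = length; i < size; i++) {
            padding.append(paddingCharacter);
        }
        return alignLeft ? text + padding : padding + text;
    }
}
```

For example, resizing abc to size 5 with the padding character * gives abc** for left alignment and **abc for right alignment.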

ReverseTransformer

This transformer reverses a string. For example, it transforms 1234 to 4321.

RtfProcessor

This transformer normalizes RTF code. It is also available as a format preprocessor (see Chapter 5, Formats).

RtfToASCII

This transformer converts RTF input to plain text. It removes RTF control words from the text.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

SubString

This transformer returns a substring of the input, starting and ending at specified locations.

Basic Properties

begin
The start location (the beginning of the input is 0).

end
The end location.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

ToBase64Transformer

This transformer converts a binary string to the base64 MIME encoding. This is useful, for example, when you want to save binary data in XML.

The transformer is supported only on the Microsoft Windows platform.
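To see what base64 MIME output looks like, you can use the standard java.util.Base64 class, as in the sketch below. The class and method names are hypothetical, and the sketch says nothing about how ToBase64Transformer itself is implemented. The MIME encoder of the Java library breaks long output into lines of 76 characters; whether the transformer does the same is not stated in this guide.

```java
import java.util.Base64;

final class Base64Sketch {
    // Encodes arbitrary bytes as base64 text using the MIME variant.
    static String toBase64(byte[] binary) {
        return Base64.getMimeEncoder().encodeToString(binary);
    }
}
```

The resulting text contains only letters, digits, and the characters +, /, and =, plus line breaks in long output, so it can be stored safely as the content of an XML element.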

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

ToFloat

This transformer converts a floating point number from an ASCII string representation to binary.

Advanced Properties

size
Size of the number: single_precision_32_bit or double_precision_64_bit.

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

ToInteger

This transformer converts an integer from an ASCII string representation (decimal, octal, or hexadecimal) to binary.

Basic Properties

size
Size in bytes of the binary representation. The supported values are 1 to 8.

Advanced Properties

signed
If selected, the input has a plus or minus sign.

from_base
The base of the input: decimal, octal, hexadecimal, lowercase hexadecimal.

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

ToPackDecimal

This transformer converts a number from an ASCII string representation to packed decimals.

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

ToSignedDecimal

This transformer converts a number from an ASCII string representation to signed decimals.

Advanced Properties

insert_sign_symbol
Adds a sign symbol (plus or minus) before or after the number. The options are no, before, and after.

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

TransformByParser

This transformer runs a parser on its input text. The parser should contain FindReplaceAnchor components, which mark segments of the text for replacement. When the parser completes execution, the transformer performs the replacements.

The transformer output is the modified text. ContentMaster ignores any XML output that the parser generates.

Example

For an example of this transformer, see the FindReplaceAnchor in Chapter 7, Anchors.

Advanced Properties

parser
The name of the parser.

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

Online Sample

In the ContentMaster Samples folder, open Projects\TransformByParser\TransformByParser.cmw.

The sample uses TransformByParser to replace every instance of the string ~NL~ with a carriage return - linefeed sequence.

To execute the sample:

1. Set MyTransformByParser as the startup component.

2. Run the transformer.

3. At the prompt, select the source file Report.edi.

4. The transformer stores its output in Results\Transformation of Report.edi. You can compare the output with the source in Notepad.

TransformerPipeline

This transformer applies a sequence of nested transformers to its input.
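Conceptually, a pipeline passes the output of each nested transformer to the next one. The following Java sketch models that idea with plain string functions. It is an analogy only: the names are hypothetical, and the two demo steps simply imitate what RemoveMarginSpace and ReverseTransformer are described as doing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class PipelineSketch {
    // Applies the steps in order; each step receives the previous step's output.
    static String apply(List<UnaryOperator<String>> steps, String input) {
        String text = input;
        for (UnaryOperator<String> step : steps) {
            text = step.apply(text);
        }
        return text;
    }

    // Demo pipeline: trim surrounding whitespace, then reverse the string.
    static String demo(String input) {
        List<UnaryOperator<String>> steps = new ArrayList<>();
        steps.add(String::trim);
        steps.add(s -> new StringBuilder(s).reverse().toString());
        return apply(steps, input);
    }
}
```

The order of the steps matters, because each step sees only the text produced by the steps before it.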

Advanced Properties

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

WestEuroUniToAscii

This transformer converts text in western-European languages from Unicode UTF-16 to the Windows-1252 code page.

XSLTTransformer

This transformer applies an XSLT transformation to XML input text.

Example

You use a parser to extract data from an XML document. A Content anchor retrieves a complete, well-formed branch of the XML tree. You can configure the Content anchor with an XSLTTransformer, which runs an XSLT transformation on the branch.

Advanced Properties

xslt_file
The path and filename of the XSLT file.

For explanations of the following properties, see Standard Transformer Properties:

name

remark

disabled

optional

Transformer Subcomponent Reference

This section describes subcomponents that you can assign as the values of certain transformer properties.

InlineTable

This component lets you define a lookup table in the IntelliScript. The table is used by the LookupTransformer.

Basic Properties

table
Under this property, enter a sequence of entry components. In each entry, specify key and value strings. For example, you might specify:

key value

1 George Washington

2 John Adams

3 Thomas Jefferson

4 James Madison

Advanced Properties

match_case
If selected, the key string is considered to be case-sensitive.

ODBC_Text_Connection

This subcomponent defines a database connection. It is used, for example, in the ODBCLookup transformer.

Before using this component, use the operating system tools to define a DSN for the database connection.

Advanced Properties

DSN
The data source name of the connection.

username
User name for the database connection.

password
Password of the user.

timeout
Time in seconds to wait for the database response.

XMLLookupTable

This component lets you specify an XML file, which contains a lookup table. The table is used by the LookupTransformer.

The XML file must have the following syntax:

<LookupTable>
    <Entry key="..." value="..." />
    ...
</LookupTable>

Example

The following is an XML lookup table:

<LookupTable>
    <Entry key="1" value="George Washington" />
    <Entry key="2" value="John Adams" />
    <Entry key="3" value="Thomas Jefferson" />
    <Entry key="4" value="James Madison" />
</LookupTable>
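If you want to check a lookup file before you reference it, or simply see how the key and value attributes relate to a lookup, the following Java sketch reads the format shown above into a map. It uses only the standard Java XML parser. The class and method names are hypothetical, the XML is passed in as a string for brevity, and keys are compared case-sensitively, as if match_case were selected.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlLookupSketch {
    // Parses <LookupTable><Entry key="..." value="..." />...</LookupTable> into a map.
    static Map<String, String> load(String xml) {
        try {
            NodeList entries = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getElementsByTagName("Entry");
            Map<String, String> table = new HashMap<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                table.put(entry.getAttribute("key"), entry.getAttribute("value"));
            }
            return table;
        } catch (Exception e) {
            // Malformed XML or a parser configuration problem.
            throw new IllegalArgumentException("Cannot read lookup table", e);
        }
    }
}
```

Looking up the key 3 in the map loaded from the example file returns Thomas Jefferson, which corresponds to the LookupTransformer example earlier in this chapter.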

Basic Properties

xml_file_name
Browse to the XML file.

Advanced Properties

match_case
If selected, the key string is considered to be case-sensitive.

Actions

Actions are components that perform operations on data that ContentMaster has extracted from a source document. Some examples of the supported actions are:

Arithmetic computations

String concatenations

Submitting forms to a web server

Activating a secondary parser

Querying a database

You can use the out-of-the-box actions supplied with ContentMaster, or you can define custom actions.

This chapter explains how to use actions and documents the actions that are available in ContentMaster.

How Actions Work

An action takes its input from the data holders that are currently available. A single action can have multiple inputs.

If the action is embedded in a parser, the available data holders are the ones that the parser has generated. In a serializer, the data holders are the ones that exist in the input XML, plus any additional data holders that the serializer has generated. For a mapper, the data holders can be in either the input or the output.

The action performs operations on the input and generates output. You can configure many actions to store their output in data holders.

In most actions, the input and output data holders must contain the data directly; that is, they must not contain nested elements. A few actions work with data holders that contain nested elements, with multiple-occurrence data holders, or with other special types. These details are explained in the Action Component Reference.

An action can have additional effects, such as writing to a file, updating a database, or submitting data to an external application.

Comparison between Actions and Transformers

Some actions perform operations that are similar to transformers, for example, modifying a string or querying a database. However, actions differ from transformers in some fundamental ways. The following table summarizes the differences.

Input
    Transformers: A transformer has a single input, which is a string.
    Actions: The input is implemented by the action. For example, an action can have multiple inputs, which are data holders.

Output
    Transformers: The output of a transformer is a string.
    Actions: The output is implemented by the action. For example, an action can create output data holders.

Side effects
    Transformers: A transformer has no side effects, other than modifying the input string.
    Actions: An action can have side effects, such as updating a database.

Defining Actions

You can define actions by editing the IntelliScript. You can insert the actions under the contains line of components such as a Parser, Serializer, Mapper, Group, RepeatingGroup, etc. Essentially, you can insert the actions in any location where you can insert anchors, serialization anchors, or mapping anchors.

The actions run in sequence with the anchors that you specify in the same location. In a parser, you can set the phase property of an action, which controls whether it runs in the initial, main, or final stage of the parsing procedure. For an explanation of the phase, see Search Phases in Chapter 7, Anchors.

Standard Action Properties

In this section, we review certain properties that are found in many actions. For additional properties that are specific to a particular action, see the Action Component Reference.

name
A name that you assign to the action. ContentMaster includes the name in the event log. This can help you find an event that was caused by the particular action.

remark
A comment describing the action.

disabled
If selected, ContentMaster ignores the action. This is useful for testing and debugging, or for making minor modifications in a project without deleting the existing actions.

optional
By default, if an action fails, the parent component (such as a Parser in which the action is nested) fails. If you select the optional property, the parent component does not fail.

phase
The processing phase during which ContentMaster should execute the action (initial, main, or final). This property has an effect only if the action is used in a parser.

Action Quick Reference

AddEventAction
Writes a message in the event log.

AppendListItems
Concatenates a list of strings that are stored in a multiple-occurrence data holder.

AppendValues
Concatenates strings.

CalculateValue
Performs a computation defined in a JavaScript expression.

CombineValues
Generates all possible concatenations from multiple-occurrence data holders.

CreateList
Fills a multiple-occurrence data holder with specified data.

DateAdd
Increments a date.

DateDiff
Computes the difference between two dates.

DownloadFile
Downloads a file.

DownloadFileToDataHolder
Downloads the content of a file into a data holder.

DumpValues
A debugging tool for dumping extracted data.

EnsureCondition
Evaluates a JavaScript expression. If the expression is false, the action fails.

ExcludeItems
Deletes values from a multiple-occurrence data holder.

ExternalCOMAction
Runs a custom action that is implemented as an ActiveX DLL.

JavaScriptFunction
Runs a JavaScript function.

Map
Copies a data holder, optionally running transformers on the value.

ODBCAction
Runs a database query.

ResetVisitedPages
Resets the list of visited pages, permitting repeat visits to a page.

RunMapper
Runs a mapper.

RunParser
Runs a parser.

RunSerializer
Runs a serializer.

SetValue
Fills a data holder with predefined content.

SubmitForm
Submits an HTML form using the Post method and parses the response.

SubmitFormGet
Submits an HTML form using the Get method and parses the response.

WriteValue
Writes a value to a location such as a file, MSMQ queue, or database.

XSLTMap
Runs an XSLT transformation.

Action Component Reference

This section documents the actions that are available in ContentMaster.

AddEventAction

This action adds a message to the event log.

Basic Properties

severity
The severity level of the message. The options are notification, warning, failure, or fatal error.

message
The message string.

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

phase

AppendListItems

The AppendListItems action concatenates the strings in a multiple-occurrence data holder.

To prepare the input for this action, see Mapping to Multiple-Occurrence Data Holders in Chapter 7, Anchors.

Example

A source document contains the following space-separated text:

H E L L O

When you parse the document, you wish to remove the spaces and store the result in an XML element called Greeting.

One way to do this is to create a multiple-occurrence variable called VarLetter. Create several Content anchors, which retrieve the individual letters and store them in occurrences of VarLetter.

Then, use the AppendListItems action to concatenate the occurrences of VarLetter and store the result in the Greeting element. The result is:

<Greeting>HELLO</Greeting>

Basic Properties

input
The multiple-occurrence data holder (select from a Schema view).

output
A data holder to store the output (select from a Schema view).

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Sample

In the ContentMaster Samples folder, open Projects\AppendListItems\AppendListItems.cmw. The sample uses a RepeatingGroup to store values in a multiple-occurrence variable. It then uses an AppendListItems action to concatenate the values.

AppendValues

The AppendValues action concatenates strings.

Example

A parser has generated the following XML:

<Name>
    <First>Ron</First>
    <Last>Lehrer</Last>
</Name>

You can configure an AppendValues action that outputs:

<FullName>Ron Lehrer</FullName>

Basic Properties

input
A list of data holders containing the values to be appended (select from a Schema view).

output
A data holder to store the output (select from a Schema view).

Advanced Properties

skip_unfound_values
If selected, and one of the input data holders is missing, the action continues. If not selected, the action fails.

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

CalculateValue

The CalculateValue action performs a computation that is defined by a JavaScript expression.

For example, you can use the action to compute a sum of numerical values, or to concatenate string values.

JavaScript Syntax

For information about the JavaScript syntax that CalculateValue supports, see EnsureCondition.

Example

A parser has generated the following XML:

<ItemOrdered>
    <Name>Gizmo</Name>
    <Quantity>100</Quantity>
    <Price>25</Price>
</ItemOrdered>

You can use this action to generate the output:

<ItemOrdered>
    <Name>Gizmo</Name>
    <Quantity>100</Quantity>
    <Price>25</Price>
    <Total>2500</Total>
</ItemOrdered>

To do this, define the Quantity and Price elements as input parameters. Specify the JavaScript expression $1 * $2, and store the result in the Total element.

Basic Properties

params
Data holders that contain the input parameters (select from a Schema view).

expression
The JavaScript expression. Use $1, $2, etc. (up to $9), to represent the input parameters.

For information about the supported JavaScript syntax, see the EnsureCondition action.

result
A data holder to store the output (select from a Schema view).

Advanced Properties

failure_action
The behavior in the event of a failure. The options are Ignore (continue the parser) and HaltExecution (stop the parser).

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Sample

In the ContentMaster Samples folder, open Projects\CalculateValue\CalculateValue.cmw. The sample retrieves three numbers from a source document and stores them in variables. It uses a CalculateValue action to compute a mathematical function of the numbers.

CombineValues

The CombineValues action generates all possible combinations from lists of strings, which are stored in multiple-occurrence data holders (see Multiple-Occurrence Data Holders in Chapter 6, Data Holders). It concatenates the strings in each combination, generating an output list.

The input of this action must include one or more multiple-occurrence data holders. Optionally, it may also include single-occurrence data holders.

The output is a multiple-occurrence data holder. Each occurrence of the data holder stores a combination.

This action is useful, for example, to prepare the VarFormData variable, which is required by the SubmitForm action.

Example

In a multiple-occurrence variable called VarDay, you have stored the list Monday, Tuesday. In a multiple-occurrence variable called VarTime, you have stored morning, afternoon. In a single-occurrence variable called VarSpace, you have stored a space character.

Suppose you run CombineValues on VarDay, VarSpace, and VarTime, with an output data holder called DayTime. The output is:

<DayTime>Monday morning</DayTime>
<DayTime>Monday afternoon</DayTime>
<DayTime>Tuesday morning</DayTime>
<DayTime>Tuesday afternoon</DayTime>

Basic Properties

input

From a Schema view, select the data holders containing the input. Typically, at least one of the inputs should be a multiple-occurrence data holder.

output

A multiple-occurrence data holder, where the action stores its output (select from a Schema view).
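The combination logic in the DayTime example can be sketched in Python. This is only an illustration of the behavior described above, not ContentMaster code; the function name combine_values is hypothetical, and a single-occurrence input is assumed to behave like a one-item list.

```python
from itertools import product

def combine_values(*inputs):
    """Concatenate one value from each input, for every combination.

    Each input is either a list (multiple-occurrence data holder) or a
    plain string (single-occurrence, treated here as a one-item list).
    """
    lists = [[value] if isinstance(value, str) else list(value) for value in inputs]
    # product() varies the last input fastest, which matches the order
    # of the DayTime occurrences in the example.
    return ["".join(combo) for combo in product(*lists)]

var_day = ["Monday", "Tuesday"]
var_time = ["morning", "afternoon"]
var_space = " "

day_time = combine_values(var_day, var_space, var_time)
print(day_time)
```

Each string in the resulting list corresponds to one occurrence of the output data holder.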

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Sample

In the ContentMaster Samples folder, open Projects\CombineValues\CombineValues.cmw. The sample retrieves lists of days, months, and years from a source document. It uses a CombineValues action to generate all possible dates from the lists.

For an additional online sample, see the SubmitForm action.

CreateList

This action inserts data in a list. The output is a multiple-occurrence data holder containing the list (see Multiple-Occurrence Data Holders in Chapter 6, Data Holders).

Nested in this component, enter the data values.


Example

If the input data values are

Jack
Jennie
Larissa

the action can create the output

<Name>
    <First>Jack</First>
    <First>Jennie</First>
    <First>Larissa</First>
</Name>

Basic Properties

data_holder

The multiple-occurrence data holder, where the action should store the list (select from a Schema view).

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

phase

DateAdd

This action increments a date.

Basic Properties

input_format

The date format, for example, dd/mm/yy (see DateFormat in Chapter 8, Transformers). You can type the format, or browse to a data holder that contains the format. If you omit the format, the system default is assumed.


input_date

The date to be incremented. You can type the date, or select a data holder containing the date from a Schema view.

num_of_days

The number of days to add. You can type a positive or negative integer, or select a data holder containing the number from a Schema view.

output

The data holder to store the output date (select from a Schema view).
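As a rough illustration of how these properties interact, the following Python sketch adds a number of days to a date string. It is not ContentMaster code. The function name date_add is hypothetical, the Python pattern %d/%m/%y is assumed to correspond to dd/mm/yy, and the output is assumed to use the same format as the input.

```python
from datetime import datetime, timedelta

def date_add(input_date, num_of_days, input_format="%d/%m/%y"):
    """Return input_date shifted by num_of_days, in the same format."""
    parsed = datetime.strptime(input_date, input_format)
    shifted = parsed + timedelta(days=num_of_days)  # negative values go back in time
    return shifted.strftime(input_format)

print(date_add("28/02/05", 2))   # crosses a month boundary
print(date_add("01/01/05", -1))  # crosses a year boundary
```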

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

DateDiff

This action computes the difference between two dates.

Basic Properties

date_format1, date_format2

The formats of the two dates, for example, dd/mm/yy (see DateFormat in Chapter 8, Transformers). You can type the format, or browse to a data holder that contains the format. If you omit the format, the system default is assumed.

date1, date2

The two dates. You can type the date, or select a data holder containing the date from a Schema view.

output

The data holder to store the difference, in days (select from a Schema view).
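The following Python sketch illustrates a day difference between two dates that have different formats. It is an illustration only. The function name date_diff is hypothetical, and the sign convention (date2 minus date1) is an assumption, because this guide does not state which date is subtracted from which.

```python
from datetime import datetime

def date_diff(date1, date2, date_format1="%d/%m/%y", date_format2="%d/%m/%y"):
    """Return the number of days from date1 to date2 (assumed sign convention)."""
    first = datetime.strptime(date1, date_format1)
    second = datetime.strptime(date2, date_format2)
    return (second - first).days

# The two dates may use different formats, one per date_format property.
print(date_diff("01/03/05", "2005-03-15", "%d/%m/%y", "%Y-%m-%d"))
```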


Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

DownloadFile

This action downloads a file to the local computer. The file can be specified dynamically, in the source document.

Basic Properties

file_url

A data holder that stores the file path or URL (select from a Schema view).

target_path

The folder path to store the downloaded file. If you leave the property blank, the file is stored in the Results folder of the project.

Advanced Properties

transformers

A sequence of transformers that the action applies to the path or URL, before downloading.

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Sample

In the ContentMaster Samples folder, open Projects\DownloadFile\DownloadFile.cmw.

To run the sample, you must have an Internet connection. The sample retrieves the URL of a file. It then uses the DownloadFile action to download the file to the Results folder of the project.


DownloadFileToDataHolder

This action downloads a file and stores its content in a data holder.

If the file contains symbols such as < and >, the action converts them to entities such as &lt; and &gt;, in order to generate valid XML.

The file must be located on a web server.
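The entity conversion described above can be illustrated with the Python standard library. This sketch is not the ContentMaster implementation. The function name to_xml_text is hypothetical, and the handling of the & symbol is an assumption, because the guide names only < and > as examples.

```python
from xml.sax.saxutils import escape

def to_xml_text(content):
    """Escape &, < and > so the content can be stored as XML text."""
    return escape(content)

downloaded = "<b>5 > 3</b>"
print(to_xml_text(downloaded))
```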

Basic Properties

file_url

A data holder that stores the URL of the file (select from a Schema view).

output

The data holder to store the downloaded content (select from a Schema view).

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

DumpValues

This action is a debugging tool, which writes data to a <DumpValues>...</DumpValues> element.

Nested in the action, insert the data holders that should be dumped.

Advanced Properties

output

The file in which to write the output. The options are ResultFile (the default output file of the project) or OutputFile (specify a file path).


For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

phase

EnsureCondition

This action evaluates a JavaScript expression. If the expression is false, the action fails.

JavaScript Syntax

Information about JavaScript syntax is available in many books about web development. On the Internet, you can find a tutorial introduction on the W3Schools site:

http://www.w3schools.com

A definitive syntax reference for JavaScript is on the ECMA site (ECMAScript is an international JavaScript standard):

http://www.ecma-international.org/publications/standards/Ecma-262.htm

The internal ContentMaster JavaScript processor supports standard JavaScript expressions containing the following features:

The unary and binary operators:

() + - * / % == != < <= > >= && ||

The ternary ?: operator.

The following methods:

charAt
indexOf
lastIndexOf
length
substring
toString

If you apply these methods to a literal having a simple data type, you must enclose the literal in parentheses, for example:

123.toString(); //Wrong
(123).toString(); //Right

"Hello, World".substring(3,7); //Wrong
("Hello, World").substring(3,7); //Right


The following functions:

Math.ceil
Math.floor
Math.max
Math.min
Math.pow
Math.sqrt
parseFloat
parseInt

The internal JavaScript processor does not support features such as the following:

The unary and binary operators:

++ -- typeof void >> >>> << === !== ~ & | ^

Assignment operators:

= += -= *= /= >>= >>>= <<= &= |= ^=

The comma operator ( , ).

The values NaN, null, infinity, or -0 (negative 0).

Data types other than string, number, and boolean.

The Date object.

The equalsIgnoreCase function.

Expressions that use these features can nonetheless be used in some cases. If the internal JavaScript processor cannot process an expression, the system automatically uses an external JavaScript processor, which is supplied with ContentMaster, to interpret the expression.

Basic Properties

condition

A JavaScript expression to be evaluated. In the expression, use $1, $2, etc. (up to $9), to refer to the params. For example, the following expression checks whether the first parameter has the value Ron Lehrer:

$1 == "Ron Lehrer"

params

A list of data holders, containing parameters that you can use in the condition (select from a Schema view).
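To make the relationship between $1 to $9 and the params list concrete, the following Python sketch substitutes parameter values into a condition string. It only illustrates the positional mapping; how ContentMaster binds the values internally is not documented here, and the function name bind_params is hypothetical.

```python
import json
import re

def bind_params(condition, params):
    """Replace $1..$9 with JavaScript literals for the matching params."""
    def replace(match):
        index = int(match.group(1)) - 1  # $1 refers to the first entry
        if index >= len(params):
            raise ValueError(f"${index + 1} has no matching entry in params")
        # json.dumps quotes strings and leaves numbers bare.
        return json.dumps(params[index])
    return re.sub(r"\$([1-9])", replace, condition)

print(bind_params('$1 == "Ron Lehrer"', ["Ron Lehrer"]))
print(bind_params("$1 + $2 > 10", [4, 7]))
```

With these inputs, the first condition compares two identical strings and is therefore true, so an action configured this way would not fail.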


Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

phase

ExcludeItems

This action deletes specified values from a multiple-occurrence data holder (see Multiple-Occurrence Data Holders in Chapter 6, Data Holders).

Nested in the action, specify the values to exclude.

Basic Properties

data_holder

The multiple-occurrence data holder (select from a Schema view).
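The effect of the action can be pictured as a list filter. In this Python sketch, the function name exclude_items is hypothetical, and two behaviors are assumptions: values are compared by exact match, and every matching occurrence is removed.

```python
def exclude_items(occurrences, excluded_values):
    """Return the occurrences that are not among the excluded values."""
    excluded = set(excluded_values)
    return [value for value in occurrences if value not in excluded]

names = ["Jack", "Jennie", "Larissa", "Jack"]
print(exclude_items(names, ["Jack"]))
```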

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

ExternalCOMAction

The ExternalCOMAction component runs a custom action. The custom action is implemented as a COM (ActiveX) DLL, or as a .NET DLL with the COM interoperability (interop) feature.

Because this component uses the Microsoft COM (ActiveX) architecture to activate the custom action, it runs only on Microsoft Windows platforms.

Creating the Custom DLL

You should program the custom action as a COM component containing the following function:

function Run(ByVal inp as String, ByVal design_mode as Boolean) as String

The inp parameter is the input string, which the action should process. The ExternalCOMAction component passes the input string to the function and receives the return value as output. The function can have any desired side effects, such as interacting with a third-party system.

The design_mode parameter is True if the action is activated within ContentMaster Studio. If the custom action requires a long processing time or has side effects that interfere while you are designing a parser, the function can perform different operations based on the design_mode value.
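The branching on design_mode can be sketched as follows. The sketch is written in Python purely to show the control flow; a real custom action must be a COM component as described in this section. The names run, notify_external_system, and sent_messages are hypothetical.

```python
sent_messages = []  # stands in for a third-party system

def notify_external_system(text):
    """Hypothetical side effect, for example a call to another system."""
    sent_messages.append(text)

def run(inp, design_mode):
    """Return the length of the input as a string.

    The side effect is skipped when the action runs inside the Studio.
    """
    if not design_mode:
        notify_external_system(inp)
    return str(len(inp))

print(run("abc", True))    # design time: no side effect
print(run("abcd", False))  # production: side effect is performed
```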

Register the COM component on the ContentMaster computer. You may then use the DLL in the ExternalCOMAction component.

Optionally, you can add the custom action to the drop-down component list that ContentMaster displays (for instructions, see the chapter about Using the IntelliScript Editor in the book ContentMaster Studio in Eclipse).

Implementation in Visual Studio 6

In Microsoft Visual Studio 6, you should implement the custom action as a COM DLL component that exposes the IDispatch interface (in Visual Basic: as an ActiveX DLL project).

Use the regsvr32 utility to register the DLL on the ContentMaster computer.

Implementation in Visual Studio .NET

Optionally, you can implement the custom action in Microsoft Visual Studio .NET. ContentMaster must be installed on the development computer.

Create a class library project, which references the file ICMAction.dll in the ContentMaster installation folder. Create a class that implements the ICMAction interface.

Configure the project with the COM interop feature, which lets ContentMaster access the DLL as a COM component. For a description of this feature, see the topic Introduction to COM Interop and other topics under the interop index entry, in the Microsoft MSDN Library.

Compile the .NET assembly, and copy the DLL to the ContentMaster computer. Use the regasm utility to register the DLL file.

For the convenience of ContentMaster users who may not be familiar with Visual Studio .NET, the following is a step-by-step procedure for implementing a custom action in the C# language. The instructions for other .NET languages, such as Visual Basic .NET, are similar.

1. Open Visual Studio .NET and create a new Class Library project.

2. Add a reference to the file ICMAction.dll.

In the Add Reference window, you can find the reference on the .NET tab (component name ICMAction). Alternatively, you can browse to ICMAction.dll in the ContentMaster installation folder.

3. Add a class that implements the ICMAction interface.


You can copy the following sample code. You should change the namespace and class names (CMActionExample and CCMActionExample, respectively) to meaningful names for your projects.

using System;
using System.Runtime.InteropServices; //Enables COM interop
using Itemfield.ContentMaster; //For ICMAction interface

namespace CMActionExample
{
    //Prevents automatic creation of class interface.
    //Causes class to be exported to COM only as an implementor
    //of the ICMAction interface
    [ClassInterface(ClassInterfaceType.None)]
    public class CCMActionExample : ICMAction
    {
        public CCMActionExample()
        {
        }

        public string Run(string inp, bool design_mode)
        {
            //ToDo: Insert code here
        }
    }
}

4. Implement the Run function, inserting code that performs the desired action.

For example, suppose you want the custom action to count the characters inthe input, and return the result. The following Run function performs thisaction:

public string Run(string inp, bool design_mode)
{
    Int32 res = inp.Length;
    return res.ToString();
}

5. In the Solution Explorer, right-click the project and edit its properties.

a. In the left pane of the properties window, expand the tree and select Configuration Properties / Build.

b. In the right pane, in the Outputs section, set the Register for COM Interop property to true.


6. Right-click the project and choose the Build option.

This generates the DLL file that you need for use in the ExternalCOMAction component.

On the computer where you developed the .NET project, Visual Studio .NET registered the DLL when you built the project. There are no additional installation steps.

To run the custom action in ContentMaster on another computer, perform the following steps to install the custom DLL:

1. The Microsoft .NET Framework, version 1.1 or higher, must be installed on the computer.

2. Copy your custom DLL to any convenient location on the computer. The recommended location is the ContentMaster program folder.

3. Open a command prompt, and use the regasm utility to register your DLL. The utility is located in the Windows folder, in the subfolder Microsoft.NET\Framework\<version>.

For example, enter the following command:

regasm <path>\YourCustomDLL.dll /codebase

The regasm utility displays a message, indicating that the DLL was successfully registered.

Basic Properties

COM

A COMClass component, which defines the COM object implementing the custom action (see the Action Subcomponent Reference below).

input

A data holder storing the input of the action (select from a Schema view).

output

A data holder where the action should store its output (select from a Schema view).


Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Sample

For an online sample of a Visual Studio .NET project, which implements a custom action in the C# language, see the following location in the ContentMaster installation folder:

Samples\SDK\CMACTION

JavaScriptFunction

This action executes a JavaScript function, for example, a function located in an HTML source document. You can pass parameters to the function, and you can store the return value of the function.

Basic Properties

function_to_execute

The name of the function.

result

A data holder, in which to store the return value of the function (select from a Schema view).

params

A list of data holders containing the input parameters of the function. The parameters must be in the same order as in the function declaration.

Advanced Properties

refresh

If selected, ContentMaster recompiles the function for each page that a parser processes. If not selected, ContentMaster assumes that the function is the same on all the pages, and it compiles the function only on the first page.


For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Map

This action copies a value from one data holder to another.

When copying a simple data holder (one that has no nested elements), the source and destination must have compatible data types. The action can apply transformers to the copied value.

Optionally, you can use the action to copy data holders that contain nested elements. In that case, the source and destination must have identical internal structures and identical XSD types.

Basic Properties

source

The source data holder (select from a Schema view).

target

The destination data holder (select from a Schema view).

transformers

A sequence of transformers that modify the value. Do not assign this property if the source and destination are complex XML elements.

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Sample

In the ContentMaster Samples folder, open Projects\CopyValue\CopyValue.cmw. The sample uses a Map action to copy a complex element, which contains an attribute and nested elements.


ODBCAction

This action runs a SQL query on a database. For example, it can perform a SELECT query that retrieves data, or it can perform an INSERT or UPDATE query that adds data to the database.

Example

A source document contains an employee ID number, which you have stored in a variable called EmpID. You want to retrieve the employee's name from a database and store the result in the following XML structure, which is defined in your XSD schema:

<Person>
    <Name>
        <First>...</First>
        <Last>...</Last>
    </Name>
</Person>

To do this, you can run an ODBCAction. You can configure the action as follows:

In this example:

The db_connection property defines the database connection.

The output_record defines the data holder where the action should store the retrieved data.

The sql_statement is the SQL query that retrieves the data.

The input_parameters property contains the EmpID variable, which is the input of the action.

Basic Properties

db_connection

An ODBC_XML_Connection subcomponent, which defines the ODBC provider (typically a database).

sql_statement

The SQL query, for example:

SELECT Name FROM Employees WHERE Id = ?


Use the ? symbol to represent an input parameter. If there is more than one input parameter, each ? symbol represents the next parameter in sequence, for example:

SELECT Name FROM Employees WHERE Id = ? AND Gender = ?

In this case, the two ? symbols represent the first and second input parameters, respectively.

The SQL syntax must be valid for the ODBC provider. Please see the provider or database documentation for details.
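The positional binding of ? symbols can be tried out with the sqlite3 module of the Python standard library, which uses the same placeholder style. This is only a sketch: sqlite3 stands in for an ODBC provider, and the Employees table contents and the function name find_names are hypothetical.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Employees (Id INTEGER, Gender TEXT, Name TEXT)")
connection.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [(1, "F", "Jennie"), (2, "M", "Jack")],
)

def find_names(emp_id, gender):
    """Bind the first ? to emp_id and the second ? to gender."""
    rows = connection.execute(
        "SELECT Name FROM Employees WHERE Id = ? AND Gender = ?",
        (emp_id, gender),
    ).fetchall()
    return [row[0] for row in rows]

print(find_names(2, "M"))  # one matching record
print(find_names(2, "F"))  # no data retrieved
```

The second call returns no rows, which is the situation that the on_sql_no_data property addresses.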

Advanced Properties

on_sql_no_data

The result if the SQL query does not retrieve any data. The value can be success (the action does not fail) or fail (the action fails).

output_record

An XML element, defined in the XSD schema, where the action should store any data that the SQL query retrieves. The element must contain nested elements (at the top level of nesting), whose names are identical to the output fields of the query. Select the element from a Schema view.

If the SQL query retrieves multiple records, the schema should permit multiple occurrences of the XML element (see Multiple-Occurrence Data Holders in Chapter 6, Data Holders).

retry

The number of retries if the first connection attempt fails.

input_parameters

A list of data holders that contain the input parameters (select from a Schema view).

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

ResetVisitedPages

This action clears the list of visited pages of specified secondary parsers.

This action is used with the reject_recurring_pages property of a Parser component (see Chapter 3, Parsers). ResetVisitedPages allows multiple visits to the same page, even if reject_recurring_pages is selected. You might do this, for example, if you want to post different input data to the same web page.


Basic Properties

parsers

Specify the parsers to be reset.

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

phase

RunMapper

This action runs a mapper.

For example, you can use this action in a parser to run a mapper, which modifies the parsed data. The mapper runs on the XML output of the parser.

Basic Properties

mapper

The mapper. You may select the name of an existing Mapper component, or you may create a Mapper component at this location of the IntelliScript (see Chapter 10, Serializers).

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

RunParser

This action activates a parser, which parses a dynamically specified source.

In a parser, for example, you can use this action to follow the links in an HTML file and run a secondary parser on the link destinations.

In a serializer, you can use this action to parse bits of unstructured data that exist in the input.

The output of RunParser is appended to the output of the main component that activated it (such as a parser or serializer).


The RunParser action differs from the EmbeddedParser anchor, in that RunParser parses a new source, whereas EmbeddedParser parses a section of an existing source.

See also the SubmitForm and SubmitFormGet actions, which submit HTML form data to a URL.

Example

An HTML file has a link to a second file. A Content anchor stores the file path or URL of the link destination in the VarLinkURL system variable. The RunParser action accesses the destination file and runs a secondary parser on it.

For another example, where the main parser selects which parser to run according to text in the source document, see the Alternatives anchor in Chapter 7, Anchors.

Posting Data to the URL

Optionally, the action can post data to the URL. This simulates the submission of an HTML form to a web server, and it parses the result.

To do this, you must store the data in the VarPostData system variable. To correctly prepare the data in VarPostData, we recommend the following procedure:

1. Save a copy of the HTML page containing the form on your local computer.

2. Edit the copy, changing the form action attribute to your email address. For example, if the form element reads <form method="POST" action="http://example.com/MyServer.exe">, change it to <form method="POST" action="mailto:[email protected]">.

3. Open the copy in your browser, fill in the form, and click the submit button. This sends an email containing the form data to your address.

4. The body of the email is a string containing the form data. Assign this string to the VarPostData variable.
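For orientation, form data submitted by a browser is commonly a URL-encoded string of name=value pairs. The Python sketch below shows what such a string looks like. It assumes the form uses the default URL encoding; the field names and the function name build_post_data are hypothetical, and the email procedure above remains the way to confirm the exact string that a particular form produces.

```python
from urllib.parse import urlencode

def build_post_data(fields):
    """Encode form fields as a URL-encoded string of name=value pairs."""
    return urlencode(fields)

fields = {"name": "Ron Lehrer", "city": "New York"}
post_data = build_post_data(fields)
print(post_data)
```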

Basic Properties

next_parser

The name of the parser to run (recursive calls to the same parser are permitted).

Advanced Properties

input_source_as_text

This property specifies the type of data that the input_source data holder contains.

If input_source_as_text is selected, input_source contains a text string that should be parsed. If not selected, input_source contains a file path or a URL.

input_source

If input_source_as_text is selected, input_source is a data holder that contains a string to be parsed.


If input_source_as_text is not selected, input_source is a data holder containing the path or URL of the document to be parsed. The default value is the VarLinkURL system variable.

If the VarPostData system variable contains a value, the value is posted to the URL. If VarPostData is empty, the action accesses the URL without posting any data.

pre_processor

A document processor that the parser should apply to the source.

retries

The number of times to retry if the request fails.

seconds_to_wait

The interval in seconds between retries.

include_strings

Strings that must be present in the input_source value. If a specified string is not present, the action does not access the source or activate the secondary parser. For example, if you want to follow links only within the same web site, you might add the domain name to include_strings.

exclude_strings

Strings that must not be present in the input_source. If a string is present, the action does not access the source or activate the secondary parser.
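The combined effect of include_strings and exclude_strings can be sketched as a simple filter. In this Python illustration, the function name should_access is hypothetical, and plain case-sensitive substring matching is an assumption.

```python
def should_access(input_source, include_strings=(), exclude_strings=()):
    """Decide whether the source would be accessed and parsed.

    Every include string must be present, and no exclude string may be.
    """
    has_all_required = all(text in input_source for text in include_strings)
    has_forbidden = any(text in input_source for text in exclude_strings)
    return has_all_required and not has_forbidden

# Follow links only within one site, and skip logout links.
print(should_access("http://example.com/page2.html", ["example.com"], ["logout"]))
print(should_access("http://other.org/page.html", ["example.com"], ["logout"]))
```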

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

RunSerializer

This action runs a serializer. The output of the serializer is stored in a data holder.

For example, you can use this action in a parser to run a serializer, which modifies the parsed data.

Basic Properties

serializer

The serializer. You may select the name of an existing Serializer component, or you may create a Serializer component at this location of the IntelliScript (see Chapter 10, Serializers).

output

A data holder to store the serializer output (select from a Schema view).


Advanced Properties

input

A data holder storing XML text on which to run the serializer (select from a Schema view).

If you omit this property, the serializer uses the data holders available in the scope of the action. For example, if the action is nested in a parser, the serializer runs on the output of the parser. If the action is within a Group, it runs on the output of the Group.

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Sample

In the ContentMaster Samples folder, open Projects\RunSerializer\RunSerializer.cmw.

To observe how the sample works, set MainParser as the startup component and run it. MainParser contains a RepeatingGroup, which parses pairs of names and stores them in variables. After each iteration, the RepeatingGroup executes a RunSerializer action, which concatenates the variables with some predefined text. The action stores its output in an XML element, which is added to the parser output.

SetValue

This action fills a data holder with predefined content.

The assignment overwrites any existing content (except for a multiple-occurrence data holder, see Multiple-Occurrence Data Holders in Chapter 6, Data Holders).

Basic Properties

quote

The content to assign.

data_holder

The data holder (select from a Schema view).

Advanced Properties

transformers

A list of transformers that are applied to the content.


For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

phase

SubmitForm

This action submits HTML form data to a URL and parses the response.

SubmitForm uses the HTTP Post method to submit the form. To use the HTTP Get method, use the SubmitFormGet action instead. See also the RunParser action, which can submit an HTML form.

The output of SubmitForm is appended to the output of the main component that activated it (such as a parser or serializer).

SubmitForm is an alternative to using the HtmlForm anchor. HtmlForm is easier to use because it performs some of the data-preparation steps automatically. SubmitForm gives you greater control because it lets you configure these steps yourself.

Using SubmitForm

To use the SubmitForm action, you should:

1. Store the URL to which you want to submit the form (the action attribute of an HTML <form> element) in the VarFormAction system variable.

2. Store the form data in the VarFormData system variable. You can determine the correct format of the data in the following way:

a. Save a copy of the HTML page containing the form on your local computer.

b. Edit the copy, changing the form action attribute to your email address. For example, if the form element reads <form method="POST" action="http://example.com/MyServer.exe">, change it to <form method="POST" action="mailto:[email protected]">.

c. Open the copy in your browser, fill in the form, and click the submit button. This sends an email containing the form data to your address.

d. The body of the email is a string containing the form data. Assign this string to the VarFormData variable.

3. Run the SubmitForm action. The action submits the data that you stored in VarFormData to the location that you stored in VarFormAction.

Submitting Multiple Copies of a Form

VarFormData is a multiple-occurrence variable (see Multiple-Occurrence Data Holders in Chapter 6, Data Holders). This means that you can create multiple occurrences of VarFormData, each storing a different set of post data.


If you do this, SubmitForm posts each occurrence of VarFormData independently, and it parses each of the web-server responses.

You can use the CombineValues action to prepare the VarFormData occurrences. For example, if you know the possible values of each form field, CombineValues can prepare all possible combinations of the values.
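The idea of one VarFormData occurrence per combination of field values can be sketched in Python. The field names and values are hypothetical, the function name form_data_occurrences is hypothetical, and URL-encoded name=value pairs are assumed as the data format.

```python
from itertools import product
from urllib.parse import urlencode

def form_data_occurrences(field_options):
    """Build one encoded form-data string per combination of field values."""
    names = list(field_options)
    value_lists = [field_options[name] for name in names]
    return [urlencode(dict(zip(names, values))) for values in product(*value_lists)]

options = {"flower": ["rose", "tulip"], "price": ["low", "high"]}
occurrences = form_data_occurrences(options)
for occurrence in occurrences:
    print(occurrence)
```

Two fields with two values each produce four strings, each of which would be posted independently.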

Basic Properties

action

Select OpenURL, which specifies how to parse the web-server response (see the OpenURL subcomponent in the Action Subcomponent Reference).

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Sample

In the ContentMaster Samples folder, open Projects\SubmitForm\SubmitForm.cmw. The sample works in the following way:

1. The main parser, which is called Flower_form_parser, retrieves options from the HTML order form of an online florist. The options include several flower types and price ranges.

2. The parser runs a CombineValues action, which prepares all possible combinations of the flower-type and price-range options.

3. The parser runs a SubmitForm action, which posts the combinations to a web application.

4. The SubmitForm action activates a secondary parser, which parses the responses from the web application. The parsing output is added to the output of the main parser.

You cannot run this sample because the web application does not exist.

SubmitFormGet

This action submits HTML form data to a URL and parses the response.

SubmitFormGet is identical to SubmitForm, except that it uses the HTTP GET method instead of POST. For details, see SubmitForm.

WriteValue

This action writes the value of a data holder to a location such as a file, a database, or an MSMQ queue.

If the data holder is an XML element, the action writes both the element and any nested elements.

Basic Properties

input
The data holder to write (select from a Schema view).

output
The output location. The options are listed in the following table (for details, see the Action Subcomponent Reference).

output                  Explanation

MSMQOutput              Writes to an MSMQ queue.

OutputDataHolder        Writes to a data holder.

OutputFile              Writes to a file.

ResultFile              Writes to the default results file of the data transformation.

OutputCOM               Lets you use a custom COM component to output the data. Do not select this option directly. Instead, select the display name of the custom COM component. For instructions, see OutputCOM.

ExternalOutputWriter    Lets you use a custom component to output the data. For details, please contact SAP support.

Advanced Properties

no_tags
By default, the action surrounds the value that it writes with XML tags.

If you select no_tags, the XML tags are omitted. This is appropriate only if input is a simple data holder, containing no nested elements or attributes.

transformers
A list of transformers that modify the value before writing. The input to the transformers is the complete input data holder, including XML tags.

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Online Samples

In the ContentMaster Samples folder, open Projects\Splitter\Splitter.cmw.

The sample demonstrates how to split a file into two files. A parser uses a RepeatingGroup to retrieve the records of an HL7 file. It uses a Map action to create a unique filename for each record, and a WriteValue action to write the records to the files. You can find the output files (MyOutput1.txt and MyOutput2.txt) in the Results folder of the project.

A practical use of the splitting technique is in sending large messages to an MSMQ queue for subsequent processing by Microsoft BizTalk Server. To avoid exceeding the MSMQ and BizTalk size limits, you can split the messages using a RepeatingGroup and WriteValue. For a sample, see Projects\BizTalkSplitter\BizTalkSplitter.cmw.

Before you run the BizTalkSplitter example, use the MSMQ administration tools to create a queue where the parser will store its output. Edit the output property of the WriteValue action, inserting your queue path.
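The file-splitting technique of the Splitter sample can be pictured with a short Python sketch. It is not the sample's IntelliScript; it only mirrors the three roles described above: a loop that finds the records (RepeatingGroup), a numbered filename for each record (Map), and a write of each record to its own file (WriteValue). The sketch assumes that every record starts with a line beginning with MSH, the message-header segment of an HL7 version 2 message, and the sample data is made up.

```python
import os
import tempfile

def split_records(text, output_folder, prefix="MyOutput"):
    """Write each record to its own numbered file and return the file paths.

    Assumes that every record starts with a line beginning with "MSH",
    the message-header segment of an HL7 version 2 message.
    """
    records = []
    for line in text.splitlines():
        if line.startswith("MSH") or not records:
            records.append([])
        records[-1].append(line)
    paths = []
    for number, lines in enumerate(records, start=1):
        path = os.path.join(output_folder, f"{prefix}{number}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(lines) + "\n")
        paths.append(path)
    return paths

# Two simplified, made-up records for illustration.
sample = "MSH|^~\\&|LAB|1\nPID|1||123\nMSH|^~\\&|LAB|2\nPID|1||456\n"
folder = tempfile.mkdtemp()
paths = split_records(sample, folder)
print([os.path.basename(p) for p in paths])
```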

XSLTMap

This action runs an XSLT transformation.

The input and output are branches of an XML document (either the output document of a parser or the input document of a serializer).

Example

Suppose that the following XML is the result of a parser:

<Person>
    <First>Ron</First>
    <Last>Lehrer</Last>
</Person>

You can use the XSLTMap action, with an appropriate XSLT file, to convert this to:

<Person Name="Lehrer, Ron" />
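The mapping in this example can be stated precisely in a few lines of Python. The sketch below is not XSLT and does not use ContentMaster; it only reproduces the input-to-output relationship that an appropriate XSLT file would implement, using the standard xml.etree.ElementTree module. The function name merge_name is hypothetical.

```python
import xml.etree.ElementTree as ET

def merge_name(person_xml):
    """Convert <Person><First/><Last/></Person> to <Person Name="Last, First" />."""
    source = ET.fromstring(person_xml)
    first = source.findtext("First", default="")
    last = source.findtext("Last", default="")
    target = ET.Element("Person", {"Name": f"{last}, {first}"})
    return ET.tostring(target, encoding="unicode")

result = merge_name("<Person><First>Ron</First><Last>Lehrer</Last></Person>")
print(result)
```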

Basic Properties

input
The XML element at the root of the branch to be transformed (select from a Schema view).

output
The XML element at the root of the branch that should store the output (select from a Schema view).

xslt_file
Browse to the XSLT file.

Advanced Properties

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional

phase

Action Subcomponent Reference

This section describes subcomponents that you can assign as the values of certain action properties.

COMClass

This subcomponent is used, for example in an ExternalCOMAction, to define a custom COM component.

Basic Properties

ProgID
The ProgID of the COM component.

If you developed the custom action in Visual Basic 6, the ProgID typically has the form dll_name.class. If you used Visual Studio .NET with the COM interoperability option, the ProgID has the form namespace.class.

For example, if you developed a .NET namespace with the name CMActionExample, and the class name is CCMActionExample, the ProgID is CMActionExample.CCMActionExample.

Advanced Properties

thread_safe
Deselect this option if the COM component is incompatible with multithreading. This causes ContentMaster to synchronize calls to the component (at the cost of slower performance).

MSMQOutput

This subcomponent specifies that a stream should be written to an MSMQ message and sent to a queue.

The subcomponent is used in the WriteValue action to specify the output location.

Basic Properties

output_id
The identifier (such as a path) of the MSMQ queue. You may type the identifier, or click the Browse button and select a data holder that contains the identifier.

Advanced Properties

append
This property is not in use.

For explanations of the following properties, see Standard Action Properties:

name

remark

ODBC_XML_Connection

This subcomponent defines a database connection. It is used, for example, in an ODBCAction.

Before using this component, use the operating system tools to define a DSN for the database connection.

Advanced Properties

DSN
The data source name of the connection.

username
User name for the database connection.

password
Password of the user.

timeout
Time in seconds to wait for the database response.

OpenURL

This subcomponent is used within the SubmitForm and SubmitFormGet actions, to specify how to parse a web-server response.

Basic Properties

next_parser
The name of a parser to run on the web-server response.

Advanced Properties

retries
The number of retries if the first request fails.

seconds_to_wait
The interval in seconds between retries.

For explanations of the following properties, see Standard Action Properties:

name

remark

disabled

optional
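The retries and seconds_to_wait properties of OpenURL describe a conventional retry loop. The following Python sketch shows one plausible reading, in which the request is attempted once and then repeated up to retries times, with a pause between attempts. The function names, the use of OSError to signal a failed request, and the injectable sleep parameter are assumptions made for illustration; the sketch does not show how ContentMaster itself is implemented.

```python
import time

def request_with_retries(send_request, retries, seconds_to_wait, sleep=time.sleep):
    """Call send_request once, then up to `retries` more times if it fails."""
    attempts = 1 + retries
    for attempt in range(attempts):
        try:
            return send_request()
        except OSError:
            if attempt == attempts - 1:
                raise  # no retries left
            sleep(seconds_to_wait)

# Hypothetical request that fails twice before succeeding.
calls = []
def flaky_request():
    calls.append(1)
    if len(calls) < 3:
        raise OSError("temporary failure")
    return "<html>response</html>"

waits = []  # records the pauses instead of actually sleeping
response = request_with_retries(flaky_request, retries=3, seconds_to_wait=5,
                                sleep=waits.append)
print(response, len(calls), waits)
```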

OutputCOM

The OutputCOM option of the WriteValue action lets you use a custom COM component to create output from ContentMaster. The component can perform any desired operations: modifying the data, writing the data to multiple locations, interacting with an information system, etc.

Because OutputCOM uses the Microsoft COM technology, it operates only on Microsoft Windows systems.

Note that OutputCOM works differently from other ContentMaster components. It is a template for a custom component, and not a component that you can use directly. You cannot configure a WriteValue action with the OutputCOM option, and nest a custom component within OutputCOM. Instead, you must program a custom COM component, add it to the drop-down list, and select its name in the output property of WriteValue. The following paragraphs explain the procedure.

Programming the Custom COM Component

To create the custom COM component, program an ActiveX DLL containing the following function:

Public Function process_output( _
    ByVal output_id As String, _
    ByVal outContent As String, _
    ByVal mode As String) _
    As String

The WriteValue action passes the following parameters to the function:

output_id
An identifier for the desired output location (the value of the output_id property of the OutputCOM component).

outContent
The content that the WriteValue action is outputting.

mode
If the append property of the component is not selected in the IntelliScript, mode = "CREATE". If the append property is selected, mode = "APPEND".

The function can perform any desired operations. ContentMaster ignores the return value of the function.

Install and register the DLL on the ContentMaster computer.
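The contract of process_output can be illustrated outside COM. The following Python sketch is not a COM component and cannot be registered with ContentMaster; it only mimics the three parameters and the CREATE/APPEND convention described above. Treating output_id as a file path is an assumption made for the example, since the component is free to interpret the identifier in any way.

```python
import os
import tempfile

def process_output(output_id, out_content, mode):
    """Write out_content to the file named by output_id.

    mode is "CREATE" (overwrite) or "APPEND", as described above.
    """
    if mode not in ("CREATE", "APPEND"):
        raise ValueError(f"unexpected mode: {mode}")
    file_mode = "a" if mode == "APPEND" else "w"
    with open(output_id, file_mode, encoding="utf-8") as handle:
        handle.write(out_content)
    return ""  # the return value is ignored by the caller

# Example: one CREATE call followed by one APPEND call.
path = os.path.join(tempfile.mkdtemp(), "demo_output.txt")
process_output(path, "first;", "CREATE")
process_output(path, "second;", "APPEND")
with open(path, encoding="utf-8") as handle:
    written = handle.read()
print(written)
```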

Adding the Custom COM Component to the Drop-Down List

You must add the custom component to the drop-down list that ContentMaster displays under the WriteValue action. To do this:

1. In Notepad, create a text file.

2. Type a line such as the following in the file:

profile DisplayName ofPT OutputCOMT("MyProject.MyClass")

Here, DisplayName is the name that you would like to display in the drop-down list, and MyProject.MyClass is the ProgID of the component.

3. Save the file with an extension of *.tgp (for example, MyClass.tgp), in the folder ContentMaster4/AutoInclude/User.

Configuring WriteValue to Use the Custom COM Component

To use the custom COM component:

1. Configure a WriteValue action.

2. In the output property of the action, select the DisplayName that you configured above.

3. Assign the output_id and append properties of the DisplayName component.

You can then run the ContentMaster data transformation. The WriteValue action activates the custom component.

Basic Properties

output_id
An identifier for the location where the custom COM component should write its output. You can use this parameter, for example, to pass the name of an output file to the custom component.

You can type an identifier, or you can select a data holder that contains the identifier.

Advanced Properties

append
If selected, the custom COM component should append its output to the existing content of the output location (instead of overwriting).

For explanations of the following properties, see Standard Action Properties:

name

remark

OutputDataHolder

This subcomponent specifies how to write a stream to a data holder. The subcomponent is used in the WriteValue action to specify the output location.

Basic Properties

data_holder
The data holder (select from a Schema view).

Advanced Properties

transformers
A sequence of transformers that modify the stream before writing.

For explanations of the following properties, see Standard Action Properties:

name

remark

OutputFile

This subcomponent specifies that a stream should be written to a file. The file can be specified dynamically, for example by extracting the path from the source document.

The subcomponent is used in the DumpValues and WriteValue actions to specify the output location.

Basic Properties

file
The filename, optionally including a path. You may type the name, or click the Browse button and select a data holder that contains the name.

The path can be absolute or relative. In the latter case, ContentMaster resolves the path relative to the output folder of the request (if you run the project within ContentMaster Studio, with respect to the project Results folder).

Advanced Properties

append
If selected, the data is appended to the existing content of the file (instead of overwriting).

For explanations of the following properties, see Standard Action Properties:

name

remark
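The way the file property of OutputFile treats absolute and relative paths can be sketched as follows. The Python code uses PureWindowsPath so that it behaves the same on any operating system; the folder and file names are hypothetical, and the function is an illustration of the rule stated above rather than ContentMaster code.

```python
from pathlib import PureWindowsPath

def resolve_output_path(output_folder, file):
    """Return `file` unchanged if absolute, else resolve it against output_folder."""
    path = PureWindowsPath(file)
    if path.is_absolute():
        return path
    return PureWindowsPath(output_folder) / path

# Hypothetical folder and file names.
results = r"C:\Projects\Splitter\Results"
print(resolve_output_path(results, "MyOutput1.txt"))
print(resolve_output_path(results, r"D:\Exports\MyOutput1.txt"))
```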

ResultFile

This subcomponent specifies that a stream should be written to the normal output file of a project.

The subcomponent is used in the DumpValues and WriteValue actions to specify the output location.

ContentMaster Studio User's Guide 10. Serializers

Serializers

Serialization is the opposite of parsing. A parser converts a source document, which can be in any format, to an XML file. A serializer converts an XML file to an output document that can be in any format. The output of a serializer, for example, can be a text, HTML, or HL7 document, or even another XML document.

You can create a serializer from an existing parser, by instructing ContentMaster to invert the parser configuration. Alternatively, you can create a serializer by using the New Serializer wizard and by editing the IntelliScript.

You can also use a combination of these methods: you can create a serializer from a parser, and then edit the serializer configuration in the IntelliScript.

By any method, it is usually easier to create a serializer than a parser. This is because the XML input is completely structured. The structure makes it easy to identify the required data and write it, in a sequential procedure, to the output. A parser, in contrast, may need to process unstructured or semi-structured input, a task that can be much more complex than serialization.

The main components that are nested in a serializer are called serialization anchors. The function of the serialization anchors is to identify the XML data and write it to the output. Serialization anchors are analogous to the anchors that are used in a parser (which are called simply anchors), except that they work in the opposite direction.

This chapter explains the procedures for creating serializers, and it describes the serialization anchors.

Creating a Serializer from a Parser

To create a serializer automatically from a parser:

1. In ContentMaster Studio, open a project that contains an existing parser.

2. Right-click the parser name in the IntelliScript, and choose Create Serializer from the pop-up menu.

ContentMaster notifies you that the serializer has been successfully created, and it displays the serializer in the IntelliScript.

The name of the serializer is derived from that of the parser, with the suffix _serializer. For example, if you create a serializer from Parser1, the serializer is called Parser1_serializer.

3. ContentMaster stores the serializer in a new TGP script file, which has a name such as Parser1_auto_generated_serializer.tgp. Use the ContentMaster Explorer of the Component view to open the new serializer file in an IntelliScript editor.

4. Test the serializer (see Running a Serializer below), and edit the IntelliScript if required (see Troubleshooting an Auto-Generated Serializer).

Online Samples

In the ContentMaster Samples folder, open Projects\Serialization\TabDelimited\TabDelimited.cmw.

The sample demonstrates a full parser/serializer cycle, using an auto-generated serializer. You can run the sample in the following way:

1. Set MyHL7Parser as the startup component, and run it. This generates an output file Results\output.xml.

2. Now set MyHL7Parser_serializer as the startup component, and run it. At the prompt, browse to Results\output.xml as the input. The original input file is regenerated.

A variant of this project is in Projects\Serialization\HL7\HL7.cmw. You can generate the serializer yourself and try the above experiment.

Controlling How the Create Serializer Command Works

When you run the Create Serializer command, ContentMaster converts the Content anchors of the parser to ContentSerializer serialization anchors.

By default, the command converts all other text in the example source to StringSerializer serialization anchors. Assuming that the other text contains boilerplate text, this means that the output of the serializer contains all the boilerplate that was in the original example source.

For example, suppose the parser runs on tab-delimited source documents having the following structure:

Name (first and last):<tab>Ron Lehrer

Assume that the anchors are defined in the following way:

Source text                 Anchor

Name                        Marker

(first and last):<tab>      (not marked as an anchor)

Ron Lehrer                  Content

The XML output of the parser is:

<FullName>Ron Lehrer</FullName>

Now, suppose that you generate a serializer from this parser, and you run the serializer on the following input:

<FullName>Larissa Chan</FullName>

The output of the serializer is:

Name (first and last):<tab>Larissa Chan

Serialization Mode

The example source may contain text that you don't want in the serializer output. In that case, you can modify the behavior of the Create Serializer command, in a way that does not generate the StringSerializer serialization anchors.

To do this, set the serialization_mode property of the Parser component (see Chapter 3, Parsers). The possible values of the serialization_mode are:

Full
The Create Serializer command copies the non-XML text to the serializer configuration.

Outline
The Create Serializer command copies only the delimiters of the non-XML text to the serializer configuration.

Under the Outline option, you can select the use_markers option. This causes the Create Serializer command to copy the content of the Marker anchors but only the delimiters of other non-XML text.

The following table illustrates the results of the serialization_mode settings.

serialization_mode property of the parser: outline, with use_markers not selected
The Create Serializer command converts: Content anchors to ContentSerializer serialization anchors; the delimiters of other text in the example source to StringSerializer serialization anchors
Sample serializer output: <tab>Larissa Chan

serialization_mode property of the parser: outline, with use_markers selected
The Create Serializer command converts: Content anchors to ContentSerializer serialization anchors; the complete text of Marker anchors to StringSerializer serialization anchors; the delimiters of other text in the example source to StringSerializer serialization anchors
Sample serializer output: Name<tab>Larissa Chan

serialization_mode property of the parser: full (the default)
The Create Serializer command converts: Content anchors to ContentSerializer serialization anchors; all other text in the example source to StringSerializer serialization anchors
Sample serializer output: Name (first and last):<tab>Larissa Chan

Of course, you can edit the auto-generated serializer configuration to further modify the output.
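The effect of the three settings can be reproduced with a small Python model. The sketch assumes that the example source has already been divided into segments labeled marker, text, delimiter, or content; that labeling, and in particular the choice of the tab as the only delimiter, is an assumption made here to match the sample outputs. The code does not describe how the Create Serializer command works internally.

```python
def serialize(segments, data, mode="full", use_markers=False):
    """Build serializer output from example-source segments.

    Each segment is (kind, value), where kind is "marker", "text",
    "delimiter", or "content". For "content", value names a key in data.
    """
    output = []
    for kind, value in segments:
        if kind == "content":
            output.append(data[value])
        elif kind == "delimiter":
            output.append(value)                 # kept in every mode
        elif kind == "marker":
            if mode == "full" or use_markers:
                output.append(value)
        elif mode == "full":                     # plain boilerplate text
            output.append(value)
    return "".join(output)

segments = [
    ("marker", "Name"),
    ("text", " (first and last):"),
    ("delimiter", "\t"),
    ("content", "FullName"),
]
data = {"FullName": "Larissa Chan"}
for mode, use_markers in (("outline", False), ("outline", True), ("full", False)):
    print(repr(serialize(segments, data, mode, use_markers)))
```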

Troubleshooting an Auto-Generated Serializer

Often, you can use an automatically generated serializer directly. In some cases, you may need to edit the serializer configuration for it to work correctly.

The following paragraphs list some typical circumstances under which you need to edit the serializer, and the suggested editing steps.

Root Tag

On the XML Generation tab of the project properties, there is an option to Add an XML Root Element. The effect of this option is to nest the parser output in a specified root element (see XML Generation Page in Chapter 13, Project Properties).

If this option is selected, and you try to run an auto-generated serializer on the parser output, it cannot find the input XML elements because of the nesting.

The solution is to set the root_tag property of the serializer to the same value as in the project properties. The serializer then finds its input nested under the root.

Parser is configured to add XML root element

Assign the root_tag property of the serializer
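The root-tag problem can be demonstrated with a few lines of Python. The sketch below uses the standard xml.etree.ElementTree module to look for a top-level Person element, first without and then with knowledge of the wrapping root element. The element names are illustrative, and the function find_input is hypothetical; it models the lookup only and is not ContentMaster code.

```python
import xml.etree.ElementTree as ET

def find_input(xml_text, top_element, root_tag=None):
    """Locate the schema's top-level element, optionally nested under root_tag."""
    root = ET.fromstring(xml_text)
    if root_tag is None:
        return root if root.tag == top_element else None
    if root.tag != root_tag:
        return None
    return root.find(top_element)

nested = "<ContentMaster><Person><First>Ron</First></Person></ContentMaster>"
print(find_input(nested, "Person"))                                  # None
print(find_input(nested, "Person", root_tag="ContentMaster").tag)    # Person
```

Without the root tag, the lookup fails because the document element is the wrapper rather than Person; with it, the nested element is found.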

Variables

If the parser uses a variable to store intermediate results, an auto-generated serializer may fail. To solve the problem, review the serializer logic, and remove the variable if necessary.

Additional Components

The Create Serializer command inverts the anchors of a parser. It does not invert components such as document processors, transformers, or actions.

For example, suppose that a parser uses a PdfToTxt_3_00 document processor to convert PDF source documents to text. The parser contains anchors that transform the text to XML.

The auto-generated serializer transforms the XML back to text. It does not convert the text to PDF. You can obtain PDF output by submitting the text to the Adobe Acrobat Distiller or to any other PDF generation utility.

In another example, suppose that a parser uses an AddString transformer to add a prefix to the output of a Content anchor. The auto-generated serializer does not remove the prefix. If you need to remove it, you can edit the serializer and insert a component such as a Replace transformer.

Creating a Serializer by Using the New Serializer Wizard

You can use the New Serializer wizard to create a serializer. The procedure is analogous to that for creating a parser (see Using the New Parser Wizard in Chapter 3, Parsers).

Opening the Wizard

To create a new project that contains a serializer, choose File > New > Project on the ContentMaster Studio menu. In the left pane of the New Project window, select ContentMaster. In the right pane, select Serializer Project.

To create a new serializer in an existing project, choose File > New > Serializer on the menu.

Wizard Options

The New Serializer wizard prompts you for options such as the following:

Serializer name
A name for the serializer.

Script name
A name for a TGP script file, where the wizard stores the serializer definition.

Schema file path
(Optional) The name of an XSD schema, which defines the data holders from which the serializer reads its input.

If you omit this step, you can add an XSD schema to the project afterwards, by right-clicking in the ContentMaster Explorer view.

Completing the Serializer Configuration

After you have entered the wizard options, click Finish to create the serializer. The serializer is displayed in the ContentMaster Explorer view and the Component view of ContentMaster Studio.

To complete the serializer configuration:

1. Display the serializer in the IntelliScript editor.

2. Under the contains line, add a sequence of serialization anchors and actions.

3. Run and test the serializer (see Running a Serializer below), and modify the IntelliScript as required.

Creating a Serializer by Editing the IntelliScript

As with all other ContentMaster components, you can create a serializer by editing the IntelliScript directly:

1. Add an XSD file to the project, which defines the schema of the XML files that you want to serialize (see Chapter 6, Data Holders).

2. At the global (top) level of the IntelliScript, add a Serializer component.

3. Edit the properties of the Serializer as required (see the Serializer Component Reference below).

4. Nested within the Serializer, add a sequence of serialization anchors (see Defining Serialization Anchors). Optionally, you may also add actions.

5. Test the serializer (see Running a Serializer below), and modify the IntelliScript if required.

Online Sample

In the ContentMaster Samples folder, open Projects\ManualSerializer\ManualSerializer.cmw. The sample illustrates a serializer that was created by editing the IntelliScript. You can run the serializer on the input file Example XML of Person.xml.

Creating a Serializer within a RunSerializer Action

In addition to defining a serializer at the global level, it is possible to define a serializer within a RunSerializer action. For details, see Chapter 9, Actions.

Running a Serializer

To run a serializer in ContentMaster Studio:

1. Set the serializer as the startup component.

2. On the menu, choose Run > Run.

3. You are prompted to open the input XML file.

If you created the serializer from a parser, a convenient test file is the parser output. Browse to the output file (by default, Results\output.xml in the project folder).

4. When the execution is complete, ContentMaster Studio displays the Events view. Examine the events for any failures or warnings.

5. To view the serialization results, open the results file, which is located in the Results folder of the project.

Serialization Anchors

The main components that you can use in a serializer are called serialization anchors. These are analogous to the anchors that are used in a parser (which are called simply anchors), except that they work in the opposite direction. Anchors read data from locations in the source document and write the data to XML. Serialization anchors read XML data and write the data to locations in the output document.

Please note that a serialization anchor is not an anchor, despite their similar names. You cannot use anchors in a serializer, and you cannot use serialization anchors in a parser.

The most important serialization anchors are called ContentSerializer and StringSerializer. A ContentSerializer writes the content of a specified data holder to the output document; it is the inverse of a Content anchor, which reads content from a source document. A StringSerializer writes a predefined string to the output; it is the inverse of a Marker anchor, which finds a predefined string in a source document.

Example of Serialization Anchors

The following example illustrates three serialization anchors of a serializer.

The first anchor is a StringSerializer, which instructs the serializer to write the following text in the output document:

First Name<tab>

The second anchor is a ContentSerializer. The anchor seeks the value of the Person/Name/First element in the XML, and writes the value in the output.

The third anchor is a StringSerializer, which writes the string:

<newline>Last Name<tab>

As always, the IntelliScript represents the newline and tab using ASCII codes and «, respectively. To edit the special characters, see Using the IntelliScript Editor in the book ContentMaster Studio in Eclipse.

Now, assume that you run the serializer on the following XML:

<Person gender="M">
    <Name>
        <First>Ron</First>
        <Last>Lehrer</Last>
    </Name>
    <Id>547329876</Id>
    <Age>27</Age>
</Person>

From the illustrated serialization anchors, the output is:

First Name<tab>Ron<newline>Last Name<tab>

The display of this text is:

Of course, the serializer contains additional serialization anchors, which are not shown in the above illustration. The complete output of the serializer is:

Defining Serialization Anchors

To define serialization anchors, edit the IntelliScript under a Serializer component.

Sequence of Serialization Anchors

A serializer executes the serialization anchors in the sequence of their definitions.

Serialization anchors write data sequentially, always appending it to the end of the output document. Of course, you can alter the order by changing the sequence in the serializer configuration.

You can intersperse actions with the serialization anchors. The actions are executed as part of the sequence.
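The sequential behavior can be modeled in Python with a list of small functions, each of which contributes one piece of the output. The sketch below recreates the three anchors of the earlier Person example: two fixed strings and one value read from the XML. The helper names string_serializer, content_serializer, and run_serializer are hypothetical, and the model is a simplification, not the ContentMaster engine.

```python
import xml.etree.ElementTree as ET

def string_serializer(text):
    """Return an anchor that writes a fixed string."""
    return lambda root: text

def content_serializer(path):
    """Return an anchor that writes the text of the element at `path`."""
    def anchor(root):
        value = root.findtext(path)
        if value is None:
            raise LookupError(f"no data found at {path}")
        return value
    return anchor

def run_serializer(anchors, xml_text):
    """Execute the anchors in order, appending each result to the output."""
    root = ET.fromstring(xml_text)
    return "".join(anchor(root) for anchor in anchors)

anchors = [
    string_serializer("First Name\t"),
    content_serializer("Name/First"),   # Person/Name/First, relative to the root
    string_serializer("\nLast Name\t"),
]
person = """<Person gender="M"><Name><First>Ron</First><Last>Lehrer</Last></Name>
<Id>547329876</Id><Age>27</Age></Person>"""
print(repr(run_serializer(anchors, person)))
```

Reordering the list changes the order of the output, which is the only way to alter where a piece of data appears.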

Standard Serializer Properties

In this section, we review certain properties that are found in the Serializer component and in many serialization anchors. For additional properties that are specific to particular components, see the Serializer Component Reference and Serialization Anchor Component Reference.

name
A name that you assign to the component. ContentMaster includes the name in the event log. This can help you find an event that was caused by the particular component.

remark
A comment describing the component.

disabled
If selected, ContentMaster ignores the component. This is useful for testing and debugging, or for making minor modifications in a project without deleting the existing components.

optional
By default, if a component fails, the parent component (such as a Serializer in which a serialization anchor is nested) fails. If you select the optional property, the parent component does not fail.
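The effect of the optional property on a parent component can be summarized in a short Python sketch. Each child is reduced to two flags, whether it succeeds and whether it is marked optional. Stopping at the first required failure is an assumption of the sketch; the text above states only that the parent fails.

```python
def run_parent(children):
    """Run child components in order; return True if the parent succeeds.

    Each child is (succeeds, optional). A failed child fails the parent
    unless the child is marked optional.
    """
    for succeeds, optional in children:
        if not succeeds and not optional:
            return False
    return True

print(run_parent([(True, False), (False, True)]))   # True: the failed child is optional
print(run_parent([(True, False), (False, False)]))  # False: a required child failed
```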

Serializer Quick Reference

The main serialization component is:

SerializerThe main component of a serializer, which converts XML to an outputdocument format.

Within a Serializer component, you can nest the following serialization anchors:

AlternativeSerializersSpecifies alternative serialization anchors that may be appropriate, dependingon the structure of the XML.

ContentSerializerSerializes XML data and writes it to the output document.

DelimitedSectionsSerializerSerializes sections of data, writing a separator string between them.

EmbeddedSerializerRuns a secondary serializer.

GroupSerializer
Binds a set of serialization anchors together for processing as a unit.

RepeatingGroupSerializer
Creates a repetitive structure in the output document.

StringSerializer
Writes a specified string to the output document.

Serializer Component Reference

This section documents the top-level Serializer component. For serialization anchors, see the Serialization Anchor Component Reference.

Serializer

A Serializer converts XML documents to output documents in any format.

Advanced Properties

output_file_extension
The file extension of the generated output file, including the leading period, for example:

.txt

root_tag
The name of a root XML element, which is not in the XSD schema of the project.

For example, if the top-level element of the schema is Person, but the XML input nests Person in an element called ContentMaster, enter root_tag = ContentMaster.

default_transformers
A list of transformers that the Serializer applies to all serialized data.

The following properties are useful in situations where the serializer must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Serializer Properties:

name

remark

Serialization Anchor Component Reference

This section describes the serialization anchors, which you can use in a Serializer.

AlternativeSerializers

This serialization anchor lets you define a set of alternative, nested serialization anchors. You can define a criterion for which alternative the serializer should accept. Only the accepted alternative affects the serializer output. The other serialization anchors (whether failed or successful) have no effect on the serializer output.

Example

The input XML may contain a Product element or a Service element, but not both. You wish to serialize whichever element is in the input.

Insert an AlternativeSerializers serialization anchor, and set its selector property to ScriptOrder.

Within the AlternativeSerializers, nest two ContentSerializer serialization anchors. Configure one of them to process the Product element, and the other to process Service.

Basic Properties

selector
The criterion for deciding which alternative to accept. The options are:

selector property    Explanation

ScriptOrder          ContentMaster tests the nested serialization anchors in the sequence that they are defined in the IntelliScript. It accepts the first one that succeeds. If all the nested serialization anchors fail, the AlternativeSerializers component fails.

NameSwitch           ContentMaster searches for the nested serialization anchor whose name property is specified in a data holder (select from a Schema view). It ignores the other nested serialization anchors. If the named serialization anchor fails, the AlternativeSerializers component fails.
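The two selector options can be modeled in a few lines of Python. This is a sketch of the selection logic only, not of ContentMaster itself: the select_alternative function, the dictionary that stands in for the XML data, and the switch_holder parameter are all hypothetical. An alternative "fails" here by returning None, and we assume that NameSwitch also fails when no alternative has the requested name.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each alternative is modeled as (name, attempt). The attempt function
# returns the text it would write, or None if the alternative fails.
Alternative = Tuple[str, Callable[[Dict[str, str]], Optional[str]]]


def select_alternative(
    alternatives: List[Alternative],
    data: Dict[str, str],
    selector: str = "ScriptOrder",
    switch_holder: Optional[str] = None,
) -> Optional[str]:
    if selector == "ScriptOrder":
        # Try the alternatives in their defined order; accept the first success.
        for _name, attempt in alternatives:
            result = attempt(data)
            if result is not None:
                return result
        return None  # all alternatives failed
    if selector == "NameSwitch":
        # Run only the alternative whose name is stored in the data holder.
        wanted = data.get(switch_holder) if switch_holder else None
        for name, attempt in alternatives:
            if name == wanted:
                return attempt(data)  # the other alternatives are ignored
        return None  # assumption: an unknown name counts as a failure
    raise ValueError(f"unknown selector: {selector}")


alternatives: List[Alternative] = [
    ("product", lambda d: f"Product: {d['Product']}" if "Product" in d else None),
    ("service", lambda d: f"Service: {d['Service']}" if "Service" in d else None),
]
```

With ScriptOrder, an input that contains only a Service value falls through the failed product alternative to the service alternative. With NameSwitch, only the named alternative is ever tried.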

Advanced Properties

For explanations of the following properties, see Standard Serializer Properties:

name

remark

disabled

optional

ContentSerializer

This serialization anchor writes the serialized data to the output document.

Basic Properties

opening_str
A string that the anchor should write before the data_holder.

closing_str
A string that the anchor should write after the data_holder.

data_holder
The data holder containing the data (select from a Schema view).

Advanced Properties

allow_empty_values
If selected, the data_holder can be empty. If not selected, and the data_holder is empty, the ContentSerializer fails.

ignore_default_transformers
If selected, the default transformers of the Serializer are not applied to the serialized data.

transformers
A list of transformers that are applied to the serialized data.

For explanations of the following properties, see Standard Serializer Properties:

name

remark

disabled

optional
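To see how the ContentSerializer properties combine, consider the following Python model. It is a sketch under stated assumptions, not the product's implementation: the content_serializer function is hypothetical, transformers are modeled as plain string functions, and we assume that the default transformers run before the anchor's own transformers (the manual does not state the order). A failure is represented by returning None.

```python
from typing import Callable, Optional, Sequence

Transformer = Callable[[str], str]


def content_serializer(
    value: Optional[str],
    opening_str: str = "",
    closing_str: str = "",
    transformers: Sequence[Transformer] = (),
    default_transformers: Sequence[Transformer] = (),
    ignore_default_transformers: bool = False,
    allow_empty_values: bool = False,
) -> Optional[str]:
    # An empty data holder is a failure unless allow_empty_values is selected.
    if not value and not allow_empty_values:
        return None
    text = value or ""
    chain = list(transformers)
    if not ignore_default_transformers:
        # Assumption: defaults of the Serializer are applied first.
        chain = list(default_transformers) + chain
    for transform in chain:
        text = transform(text)
    # opening_str and closing_str are written around the data, untransformed.
    return f"{opening_str}{text}{closing_str}"
```

For example, with a default transformer that trims whitespace and a local transformer that converts to uppercase, the value "  Jane " is written as "Name: JANE" followed by the closing string.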

DelimitedSectionsSerializer

This serialization anchor processes sections of data. Between each section of the output, the DelimitedSectionsSerializer writes a separator string.

Within the DelimitedSectionsSerializer, you should nest other serialization anchors. Each nested serialization anchor is responsible for outputting a single section.

Example

The XML input contains an employee resume. You wish to write the data to an output text document in the following format:

----------------------------
Jane Palmer
Employee ID 123456
----------------------------
Professional Experience...
----------------------------
Education...

You can define a DelimitedSectionsSerializer, with the line of hyphens as the separator. Because you want a line of hyphens before each section, set separator_position = before.

Within the DelimitedSectionsSerializer, nest three GroupSerializer components. The first GroupSerializer writes the Jane Palmer section, the second writes the Professional Experience section, and so forth.

Optional Sections

In the above example, suppose that the second section, Professional Experience, is missing from some input XML documents. Nevertheless, you want to write its separator (the line of hyphens).

----------------------------
Jane Palmer
Employee ID 123456
----------------------------
----------------------------
Education...

To handle this situation, you should configure the DelimitedSectionsSerializer in the following way:

In the second GroupSerializer, select the optional property. This means that if the GroupSerializer fails, it should not cause the DelimitedSectionsSerializer to fail.

In the DelimitedSectionsSerializer, set using_placeholders = always. This means to write the separator of an optional section, even if the section itself is missing.

Now suppose that if the Professional Experience section is missing, you do not want to write its separator:

----------------------------
Jane Palmer
Employee ID 123456
----------------------------
Education...

In this case, you should configure the DelimitedSectionsSerializer as follows:

In the second GroupSerializer, select the optional property.

In the DelimitedSectionsSerializer, set using_placeholders = never. This means not to write the separator of a missing section.

Basic Properties

separator_position
Position of the separator relative to the sections.

The following table explains the values. The examples assume that the separator is a vertical-line character ( | ).

separator_position    Explanation    Example

before     Write a separator before each section (including the first section).    |1|2|3|4

after      Write a separator after each section (including the last section).    1|2|3|4|

between    Write a separator between the successive sections (not before the first section and not after the last section).    1|2|3|4

around     Write separators before and after each section (including the first and last sections).    |1|2|3|4|

using_placeholders
This property specifies whether the DelimitedSectionsSerializer should write the separator of an optional section that is missing from the XML input.

The following table explains the values. The examples assume that the basic output structure is |1|2|3|4. The examples illustrate the output if sections 2 and 4 are missing.

using_placeholders    Explanation    Example

always            Always write the separator of a missing section.    |1||3|

never             Never write the separator of a missing section.    |1|3

when necessary    Always write the separator of a missing internal section. Never write the separator of a missing terminal section.    |1||3

separator
The separator string.
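The separator_position and using_placeholders values can be checked against the tables with a short Python model. The write_sections function below is hypothetical, and it represents a missing optional section as None. We also assume that a "terminal" section means a missing section with no present section after it; the manual does not define the term more precisely.

```python
from typing import List, Optional


def write_sections(
    sections: List[Optional[str]],
    separator: str = "|",
    separator_position: str = "before",
    using_placeholders: str = "never",
) -> str:
    # Step 1: decide which missing sections keep an (empty) placeholder slot.
    if using_placeholders == "always":
        slots = [s or "" for s in sections]
    elif using_placeholders == "never":
        slots = [s for s in sections if s is not None]
    elif using_placeholders == "when necessary":
        trimmed = list(sections)
        while trimmed and trimmed[-1] is None:
            trimmed.pop()  # drop missing sections at the end
        slots = [s or "" for s in trimmed]
    else:
        raise ValueError(f"unknown using_placeholders: {using_placeholders}")

    # Step 2: place the separator relative to the remaining slots.
    if separator_position == "before":
        return "".join(separator + s for s in slots)
    if separator_position == "after":
        return "".join(s + separator for s in slots)
    if separator_position == "between":
        return separator.join(slots)
    if separator_position == "around":
        return separator + separator.join(slots) + separator if slots else ""
    raise ValueError(f"unknown separator_position: {separator_position}")
```

Calling write_sections(["1", None, "3", None], using_placeholders="always") reproduces the first row of the using_placeholders table, and the other rows follow by changing the argument.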

Advanced Properties

For explanations of the following properties, see Standard Serializer Properties:

name

remark

disabled

optional

EmbeddedSerializer

This serialization anchor activates a secondary Serializer, which writes its output in the same output document.

Example

The XML input is a family tree. The input contains Person elements, which are recursively nested as shown:

<Person> <!-- Parent -->
    ...
    <Person> <!-- Child -->
        ...
        <Person> <!-- Grandchild -->
            ...
        </Person>
    </Person>
</Person>

A Serializer can use an EmbeddedSerializer component to call itself recursively, until all levels of nesting are exhausted.
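The recursion can be pictured as an ordinary recursive function. The following Python sketch uses the standard xml.etree.ElementTree module. The serialize_person function, the Name child element used in the sample input, and the indented output format are illustrative assumptions, not part of ContentMaster.

```python
import xml.etree.ElementTree as ET
from typing import List


def serialize_person(person: ET.Element, depth: int = 0) -> List[str]:
    """Write one Person, then call the same routine on each nested Person."""
    lines = ["  " * depth + person.findtext("Name", default="?")]
    # findall("Person") returns only the direct children, so each call
    # handles exactly one level of nesting.
    for child in person.findall("Person"):
        lines.extend(serialize_person(child, depth + 1))
    return lines


FAMILY = (
    "<Person><Name>Ann</Name>"
    "<Person><Name>Ben</Name>"
    "<Person><Name>Cy</Name></Person>"
    "</Person></Person>"
)
```

The nested call plays the role of the EmbeddedSerializer: it processes Person/Person with the same logic that the outer call applied to Person, and the recursion ends when a Person has no nested Person elements.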

Basic Properties

serializer
The name of the secondary serializer (choose from the list). The serializer must be defined at the global level of the IntelliScript.

schema_connections
Connects the data holders that are referenced in the secondary serializer to the data holders that are referenced in the main serializer. The property contains a list of Connect subcomponents, which specify the correspondence (see the Connect component in Chapter 7, Anchors).

If all the data holders in the main and secondary serializers are identical, you can omit this property. If there are any differences between the data holders, you must connect the data holders explicitly (even the ones that are identical).

In the recursive example described above, Person should be connected to Person/Person. This instructs the secondary instance of the serializer to process a nested level of the input. It is sufficient to connect just the parent element (Person), and not the nested elements (Person/*s/Name, Person/*s/BirthDate, etc.), provided that the two Person elements have the same XSD type.

Advanced Properties

For explanations of the following properties, see Standard Serializer Properties:

name

remark

disabled

optional

Online Sample

For a sample, where an EmbeddedSerializer calls a serializer recursively, see Defining a Serializer in the book Getting Started with ContentMaster.

GroupSerializer

The GroupSerializer serialization anchor binds its nested serialization anchors together. You can set properties of the GroupSerializer, which affect the members of the group.

Advanced Properties

The following properties are useful in situations where the serialization anchor must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Serializer Properties:

name

remark

disabled

optional

RepeatingGroupSerializer

This serialization anchor writes a repetitive structure to the output document.

The anchor is useful if the XML data contains a multiple-occurrence data holder (see Multiple-Occurrence Data Holders in Chapter 6, Data Holders). The RepeatingGroupSerializer iterates over the occurrences of the data holder and outputs the data.

Within the RepeatingGroupSerializer, you should nest serialization anchors that process and output each occurrence of the data holder. Optionally, you can define a separator, which the RepeatingGroupSerializer writes to the output between the iterations.

Example

The XML input contains the following structure:

<Persons>
    <Person>
        <Name>John</Name>
        <Age>35</Age>
    </Person>
    <Person>
        <Name>Larissa</Name>
        <Age>42</Age>
    </Person>
    ...
</Persons>

A RepeatingGroupSerializer, using a newline character as a separator, can output this data as:

John 35
Larissa 42

You can iterate over several multiple-occurrence data holders in parallel. For example, you can iterate over a list of men and a list of women, and output a list of married couples. To do this, within the repeating group, add a ContentSerializer for each data holder.
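The iteration in this example can be modeled in Python with the standard xml.etree.ElementTree module. This is an illustrative sketch only: the function names are hypothetical, the space between the name and the age is assumed to come from a nested anchor, and the parallel iteration is assumed to stop when the shorter list is exhausted.

```python
import xml.etree.ElementTree as ET
from typing import List

PERSONS_XML = """<Persons>
  <Person><Name>John</Name><Age>35</Age></Person>
  <Person><Name>Larissa</Name><Age>42</Age></Person>
</Persons>"""


def serialize_persons(xml_text: str, separator: str = "\n") -> str:
    root = ET.fromstring(xml_text)
    rows = []
    # One iteration per occurrence of the Person data holder.
    for person in root.findall("Person"):
        rows.append(f"{person.findtext('Name')} {person.findtext('Age')}")
    # The separator is written between the iterations.
    return separator.join(rows)


def serialize_couples(men: List[str], women: List[str], separator: str = "\n") -> str:
    # Parallel iteration: the n-th man is paired with the n-th woman.
    return separator.join(f"{m} and {w}" for m, w in zip(men, women))
```

Here serialize_persons(PERSONS_XML) produces the two output lines shown above, and serialize_couples pairs the occurrences of two lists by position.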

Basic Properties

separator_position
Position of the separator relative to the iterations.

The following table explains the values. The examples assume that the separator is a vertical-line character ( | ).

separator_position    Explanation    Example

before     Write a separator before each iteration (including the first iteration).    |1|2|3

after      Write a separator after each iteration (including the last iteration).    1|2|3|

between    Write a separator between the successive iterations (not before the first iteration and not after the last iteration).    1|2|3

around     Write separators before and after each iteration (including the first and last iterations).    |1|2|3|

separator
A serialization anchor that outputs the separator (typically a StringSerializer). Leave this property empty if you do not want to output a separator.

Advanced Properties

count
The number of iterations to run. You may enter a number, or click the browse button and select a data holder that contains the number. If blank, the iterations continue until the input is exhausted.

current_iteration
A data holder, where the RepeatingGroupSerializer should output the number of the current iteration (select from a Schema view). You can use a ContentSerializer to write the number to the output.

The following properties are useful in situations where the serialization anchor must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Serializer Properties:

name

remark

disabled

optional

StringSerializer

This serialization anchor writes a predefined string to the output document.

Basic Properties

str
The string to write.

Advanced Properties

For explanations of the following properties, see Standard Serializer Properties:

name

remark

disabled

Mappers

Mappers are components that convert an XML source document to another XML structure or schema.

A mapper processes the XML input like a serializer, and generates the XML output like a parser. Because both the input and the output are fully structured XML, the configuration is very easy.

The principles of mapper operation are similar to those of a serializer. Before you read this chapter, we recommend that you read Chapter 10, Serializers, where the principles are discussed in depth.

Within a mapper, you can nest mapping anchors and actions. Mapping anchors are analogous to anchors (which are used in parsers) and to serialization anchors (which are used in serializers).

This chapter explains how to configure the mapper and mapping anchor components.

Creating a Mapper

To create a mapper, you should edit the IntelliScript. The general procedure is as follows:

1. Add XSD schemas to the project. You need a schema for the input XML and for the output XML.

You can use either the same schema or different schemas for the input and the output.

2. At the global (top) level of the IntelliScript, add a Mapper component.

3. Assign the source and target properties of the Mapper, and edit the other properties as required (see the Mapper Component Reference below).

4. Nested within the Mapper, add a sequence of actions and mapping anchors (see Components Nested within a Mapper).

5. Test the mapper (see Running a Mapper below), and modify the IntelliScript if required.

Creating a Mapper within a RunMapper Action

In addition to defining a mapper at the global level, it is possible to define a mapper within a RunMapper action. For details, see Chapter 9, Actions.

Components Nested within a Mapper

Within a Mapper, you should nest the following components:

Any number of Map actions, which retrieve a data holder from the input and write the content to the output.

Optionally, any number of mapping anchors (see the Mapping Anchor Component Reference below).

The Map actions and the mapping anchors can be in any sequence. You can also insert any other desired actions in the sequence.

Notice that the work of the mapper, writing to the output XML, is done by Map actions rather than by mapping anchors. This may seem a little different from parsers and serializers, where the output is created by anchors and serialization anchors, respectively. Actually, this is just a terminology issue. The Map action could have been defined as a mapping anchor. It is defined as an action because it is useful in other circumstances, which are not related to mappers.

Mapper Example

To illustrate the mapper configuration, we present a simple example.

Source XML

The input of the mapper is an XML document, which contains a list of personal names and their associated ID numbers.

<Persons>
    <Person ID="10">Bob</Person>
    <Person ID="17">Larissa</Person>
    <Person ID="13">Marie</Person>
</Persons>

Output XML

The desired output of the mapper is an XML list of the names and ID numbers, with no association between them.

<SummaryData>
    <Names>
        <Name>Bob</Name>
        <Name>Larissa</Name>
        <Name>Marie</Name>
    </Names>
    <IDs>
        <ID>10</ID>
        <ID>17</ID>
        <ID>13</ID>
    </IDs>
</SummaryData>

Mapper Configuration

The following mapper configuration performs the desired data transformation:

The RepeatingGroupMapping iterates over the Person elements of the input. It uses Map actions to write the data to the Name and ID elements of the output.
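For readers who want to check the expected result outside ContentMaster Studio, the same transformation can be expressed in Python with the standard xml.etree.ElementTree module. This is only an equivalent sketch of the data flow, not the IntelliScript configuration; the map_persons function is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SOURCE_XML = """<Persons>
  <Person ID="10">Bob</Person>
  <Person ID="17">Larissa</Person>
  <Person ID="13">Marie</Person>
</Persons>"""


def map_persons(source_xml: str) -> str:
    source = ET.fromstring(source_xml)
    target = ET.Element("SummaryData")
    names = ET.SubElement(target, "Names")
    ids = ET.SubElement(target, "IDs")
    # Plays the role of the RepeatingGroupMapping: one pass per Person.
    for person in source.findall("Person"):
        # Each assignment plays the role of one Map action.
        ET.SubElement(names, "Name").text = person.text
        ET.SubElement(ids, "ID").text = person.get("ID")
    return ET.tostring(target, encoding="unicode")
```

Each iteration reads one Person occurrence from the input and writes to two different places in the output, which is why the names and the ID numbers end up in separate lists.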

Running a Mapper

To run a mapper in ContentMaster Studio:

1. Set the mapper as the startup component.

2. On the menu, choose Run > Run.

3. You are prompted to open the input XML file.

4. When the execution is complete, ContentMaster Studio displays the Events view. Examine the events for any failures or warnings.

5. To view the mapping results, open the output.xml file, which is in the Results folder of the project.

Standard Mapper Properties

In this section, we review certain properties that are found in the Mapper component and in many mapping anchors. For additional properties that are specific to particular components, see the Mapper Component Reference and the Mapping Anchor Component Reference.

name
A name that you assign to the component. ContentMaster includes the name in the event log. This can help you find an event that was caused by the particular component.

remark
A comment describing the component.

disabled
If selected, ContentMaster ignores the component. This is useful for testing and debugging, or for making minor modifications in a project without deleting the existing components.

optional
By default, if a component fails, the parent component (such as a Mapper in which a mapping anchor is nested) fails. If you select the optional property, the parent component does not fail.

Mapper Quick Reference

The main mapping component is:

Mapper
The main component of a mapper, which converts XML to XML.

Within a Mapper component, you can nest the following mapping anchors:

AlternativeMappings
Defines a set of nested mappings, one of which is valid for the current XML context.

EmbeddedMapper
Activates a secondary mapper.

GroupMapping
Binds its nested mapping anchors and actions together.

RepeatingGroupMapping
Maps repetitive XML structures.

Mapper Component Reference

The runnable Mapper component is documented in this section. For mapping anchors, see the Mapping Anchor Component Reference.

Mapper

A Mapper performs XML to XML transformations. It converts a source XML document to an output document, which has a different XML structure.

You must use the source and target properties to identify the root elements of the XML documents. For example, suppose that the document element of the source XML is Persons, and the document element of the output is SummaryData. You should set the source and target as follows:

Basic Properties

source
Under this property, insert a Locator component, and select the root of the source XML from a Schema view (for information about other options of this property, see Chapter 12, Locators, Keys, and Indexing).

target
Under this property, insert a Locator component, and select the root of the output XML from a Schema view (for information about other options of this property, see Chapter 12, Locators, Keys, and Indexing).

Advanced Properties

root_tag
The name of a root XML element, which is not in the XSD schema of the input.

For example, if the top-level element of the schema is Person, but the XML input nests Person in an element called ContentMaster, enter root_tag = ContentMaster.

For explanations of the following properties, see Standard Mapper Properties:

name

remark

Mapping Anchor Component Reference

This section describes the mapping anchors, which you can use in a Mapper.

AlternativeMappings

This mapping anchor lets you define a set of alternative, nested mapping anchors. You can define a criterion for which alternative the mapper should accept. Only the accepted alternative affects the mapper output. The other mapping anchors (whether failed or successful) have no effect on the mapper output.

Example

The input XML may contain a Product element or a Service element, but not both. You wish to process whichever element is in the input.

Insert an AlternativeMappings mapping anchor, and set its selector property to ScriptOrder.

Within the AlternativeMappings, nest two mapping anchors (for example, GroupMapping components). Configure one of them to process the Product element, and the other to process Service.

Basic Properties

selector
The criterion for deciding which alternative to accept. The options are:

selector property    Explanation

ScriptOrder          ContentMaster tests the nested mapping anchors in the sequence that they are defined in the IntelliScript. It accepts the first one that succeeds. If all the nested mapping anchors fail, the AlternativeMappings component fails.

NameSwitch           ContentMaster searches for the nested mapping anchor whose name property is specified in a data holder (select from a Schema view). It ignores the other nested mapping anchors. If the named mapping anchor fails, the AlternativeMappings component fails.

Advanced Properties

For explanations of the following properties, see Standard Mapper Properties:

name

remark

disabled

optional

EmbeddedMapper

This mapping anchor activates a secondary Mapper, which stores its output in the same output document.

Example

The XML input is a family tree. The input contains Person elements, which are recursively nested as shown:

<Person> <!-- Parent -->
    ...
    <Person> <!-- Child -->
        ...
        <Person> <!-- Grandchild -->
            ...
        </Person>
    </Person>
</Person>

A Mapper can use an EmbeddedMapper component to call itself recursively, until all levels of nesting are exhausted.

Basic Properties

mapper
The name of the secondary mapper (choose from the list). The mapper must be defined at the global level of the IntelliScript.

schema_connections
Connects the data holders that are referenced in the secondary mapper to the data holders that are referenced in the main mapper. The property contains a list of Connect subcomponents, which specify the correspondence (see the Connect component in Chapter 7, Anchors).

If all the data holders in the main and secondary mappers are identical, you can omit this property. If there are any differences between the data holders, you must connect the data holders explicitly (even the ones that are identical).

In the recursive example described above, Person should be connected to Person/Person. This instructs the secondary instance of the mapper to process a nested level of the input. It is sufficient to connect just the parent element (Person), and not the nested elements (Person/*s/Name, Person/*s/BirthDate, etc.), provided that the two Person elements have the same XSD type.

Advanced Properties

For explanations of the following properties, see Standard Mapper Properties:

name

remark

disabled

optional

GroupMapping

The GroupMapping mapping anchor binds its nested mapping anchors and actions together. You can set properties of the GroupMapping, which affect the members of the group.

Advanced Properties

The following properties are useful in situations where the mapping anchor must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Mapper Properties:

name

remark

disabled

optional

RepeatingGroupMapping

This mapping anchor processes a repetitive structure in the input or output.

The anchor is useful if the XML input and/or output contains a multiple-occurrence data holder (see Multiple-Occurrence Data Holders in Chapter 6, Data Holders). The RepeatingGroupMapping iterates over occurrences of the data holders.

Within the RepeatingGroupMapping, you should nest mapping anchors and actions that process each occurrence of the data holder.

Example

For an example of a RepeatingGroupMapping, see the Mapper Example above.

Advanced Properties

count
The number of iterations to run. You may enter a number, or click the browse button and select a data holder that contains the number. If blank, the iterations continue until the input is exhausted.

current_iteration
A data holder, where the RepeatingGroupMapping should output the number of the current iteration (select from a Schema view).

The following properties are useful in situations where the mapping anchor must select specific occurrences of data holders. For an explanation, see Chapter 12, Locators, Keys, and Indexing.

source

target

For explanations of the following properties, see Standard Mapper Properties:

name

remark

disabled

optional

Locators, Keys, and Indexing

In designing a data transformation, a frequent issue is how to locate the data holders that you wish to process. If the same data holders can occur multiple times in an XML structure, there can be ambiguities in identifying the occurrences. This chapter explains how to use the Locator and Key components to resolve the ambiguities.

The components described in this chapter let you identify the occurrences ofmultiple-occurrence data holders in three ways:

Sequentially. Each iteration of a component processes the next occurrence of the data holder.

By occurrence number. For example, a component can select the third occurrence of a data holder.

By a key, such as an attribute or a nested element, which uniquely identifies the occurrence of the data holder.
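The three approaches correspond to three familiar ways of picking items from a list. The following Python sketch illustrates the idea with the standard xml.etree.ElementTree module; it is not ContentMaster syntax. The sample document, the function names, and the choice to count occurrences from 1 are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    '<Persons>'
    '<Person ID="10">Bob</Person>'
    '<Person ID="17">Larissa</Person>'
    '<Person ID="13">Marie</Person>'
    '</Persons>'
)
PERSONS = DOC.findall("Person")


def sequentially():
    # Each iteration processes the next occurrence.
    return [p.text for p in PERSONS]


def by_occurrence(number: int) -> str:
    # Select the occurrence by its position (counted from 1 in this sketch).
    return PERSONS[number - 1].text


def by_key(person_id: str) -> str:
    # Select the occurrence whose ID attribute matches the key.
    return next(p.text for p in PERSONS if p.get("ID") == person_id)
```

For this document, by_occurrence(3) and by_key("13") both select Marie, but only the key lookup keeps selecting Marie if the order of the Person elements changes.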

The sequential approach is the default. However, it is subject to some complexities, which you can control by using the Locator component.

The occurrence-number and key approaches are collectively known as indexing. The term is analogous to the index of a book, where you use a page number or a subject key to identify the location of information. You can implement the indexing by using components called LocatorByOccurrence, LocatorByKey, and Key.

You can use the locator and key components in parsers, serializers, or mappers. You can use the components to identify the occurrences of data holders in the input, the output, or both.

The locator components are nested in the source and target properties of various other components, such as parsers, serializers, and mappers. We have referred to these properties in previous chapters, but without a proper explanation. The meaning and usage of the source and target properties is explained here.

Example of Locators

To understand the issues involved in identifying data holders, consider the following example. The example illustrates the use of:

The target property

The Locator component

We will explain the broad outline of the example here. In the following sections of the chapter, we will go back and explain how the target and the Locator work in detail.

Input and Output

Suppose that the output schema of a parser supports the following structure:

<Report>
    <Company>
        <Employee>John</Employee>
        <Employee>Leslie</Employee>
        <Employee>Pedro</Employee>
    </Company>
    <Company>
        <Employee>Marie</Employee>
        <Employee>Larry</Employee>
        <Employee>Frances</Employee>
    </Company>
</Report>

The source document, which the parser should process, is a list that contains a single employee per company (let's say, the CEO of each company):

John
Marie

The output of the parser should be:

<Report>
    <Company>
        <Employee>John</Employee>
    </Company>
    <Company>
        <Employee>Marie</Employee>
    </Company>
</Report>

Incorrect Solution

Suppose that you use the following RepeatingGroup to parse the source document:

The output is incorrect:

The problem is that both Company and Employee are multiple-occurrence elements. The RepeatingGroup creates multiple Employee elements correctly, but it doesn't know that each Employee element should be nested in a separate Company element.

Correct Solution

To resolve the ambiguity, you can assign the target property of the RepeatingGroup.

The target identifies the data holder that the RepeatingGroup should create. The target contains a Locator component, which points to the Company element. This means that each iteration of the RepeatingGroup should create a new occurrence of the Company element.
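The difference between the two configurations can be imitated outside ContentMaster. The following Python sketch is only an analogy, not IntelliScript: the function names are hypothetical, and the single-Company result is our reading of the incorrect output described above. It shows what changes when each iteration creates a new Company occurrence.

```python
import xml.etree.ElementTree as ET


def parse_into_single_company(names):
    """Imitates the incorrect result: all Employee elements share one Company."""
    report = ET.Element("Report")
    company = ET.SubElement(report, "Company")  # created once and reused
    for name in names:
        ET.SubElement(company, "Employee").text = name
    return report


def parse_with_company_target(names):
    """Imitates a target that points to Company: a new Company per iteration."""
    report = ET.Element("Report")
    for name in names:
        company = ET.SubElement(report, "Company")  # new occurrence each time
        ET.SubElement(company, "Employee").text = name
    return report


source_lines = ["John", "Marie"]
wrong = ET.tostring(parse_into_single_company(source_lines), encoding="unicode")
right = ET.tostring(parse_with_company_target(source_lines), encoding="unicode")
print(wrong)
print(right)
```

In the second function, the Company element is created inside the loop, which is the role that the Locator in the target property plays for the RepeatingGroup.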

If you configure the RepeatingGroup in this way, the output is correct:

Example of Indexing by Key

To further introduce the data-holder identification issues, we present an example of indexing by key.

The example is a mapper, which uses indexing to identify the occurrences of data holders in both its input and its output. On the input side, the indexing is used to match the corresponding data from different parts of an XML structure. On the output side, the indexing is used to find the correct location of an element in an XML structure.

The example illustrates the use of:

The source and target properties

The Locator, Key, and LocatorByKey components

In the following sections of the chapter, we will explain the detailed operation of these properties and components.

Source XML

The input XML is a report listing the names of parents and their children.

For each parent, the XML lists a first name, a last name, and an ID.

For each child, the XML lists a first name, a hobby, and the ID of the parent.

<Report>
    <Parents>
        <Parent id="1" firstName="John" lastName="Smith"/>
        <Parent id="2" firstName="Jane" lastName="Doe"/>
    </Parents>
    <Children>
        <Child name="Eric" hobby="Swimming" parentID="1"/>
        <Child name="Elizabeth" hobby="Biking" parentID="2"/>
        <Child name="Mary" hobby="Painting" parentID="1"/>
        <Child name="Edward" hobby="Swimming" parentID="2"/>
    </Children>
</Report>

Output XML

The desired output is a list of hobbies, and the children who engage in each hobby.

<Hobbies>
    <Hobby name="Swimming">
        <Person firstName="Eric" lastName="Smith"/>
        <Person firstName="Edward" lastName="Doe"/>
    </Hobby>
    <Hobby name="Biking">
        <Person firstName="Elizabeth" lastName="Doe"/>
    </Hobby>
    <Hobby name="Painting">
        <Person firstName="Mary" lastName="Smith"/>
    </Hobby>
</Hobbies>

Outline of the Data Transformation Approach

The data transformation is configured according to the following approach:

1. The input data is stored in the Child and Parent elements of the source XML. The corresponding Child and Parent elements are identified as follows:

id attribute of Parent = parentID attribute of Child

2. The data transformation creates Hobby and Person elements, where it stores its output. Each Person is nested in the corresponding Hobby, as follows:

name attribute of Hobby = hobby attribute of Child

3. Write the child's first name into the Person element.

4. Write the parent's last name into the Person element.
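The four steps above can be sketched in ordinary code. The following Python example is an illustration of the approach only, not of how ContentMaster implements it. The function name map_report and the two dictionaries are hypothetical; the dictionaries stand in for the two keys (Parent by id, Hobby by name).

```python
import xml.etree.ElementTree as ET

SOURCE = """<Report>
  <Parents>
    <Parent id="1" firstName="John" lastName="Smith"/>
    <Parent id="2" firstName="Jane" lastName="Doe"/>
  </Parents>
  <Children>
    <Child name="Eric" hobby="Swimming" parentID="1"/>
    <Child name="Elizabeth" hobby="Biking" parentID="2"/>
    <Child name="Mary" hobby="Painting" parentID="1"/>
    <Child name="Edward" hobby="Swimming" parentID="2"/>
  </Children>
</Report>"""


def map_report(source_xml):
    report = ET.fromstring(source_xml)
    # Step 1: the id attribute identifies each Parent.
    parents_by_id = {p.get("id"): p for p in report.iter("Parent")}
    hobbies = ET.Element("Hobbies")
    # Step 2: the name attribute identifies each Hobby.
    hobbies_by_name = {}
    for child in report.iter("Child"):  # one Child per iteration, in sequence
        parent = parents_by_id[child.get("parentID")]  # must already exist
        hobby_name = child.get("hobby")
        hobby = hobbies_by_name.get(hobby_name)
        if hobby is None:  # create the Hobby only if it does not exist yet
            hobby = ET.SubElement(hobbies, "Hobby", {"name": hobby_name})
            hobbies_by_name[hobby_name] = hobby
        # Steps 3 and 4: child's first name, parent's last name.
        ET.SubElement(hobby, "Person", {
            "firstName": child.get("name"),
            "lastName": parent.get("lastName"),
        })
    return hobbies


result = map_report(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

The lookup of the parent fails if the id is missing, whereas the lookup of the hobby creates the element on demand. This mirrors the difference between the source and target properties, which is explained later in the chapter.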

Mapper Configuration

The IntelliScript uses Key components to define identifiers for the data holders:

The first Key specifies that the id attribute is a unique identifier of a Parent element.

The second Key specifies that the name attribute is a unique identifier of a Hobby element.

The IntelliScript then defines a Mapper, which has the following configuration:

The components of the Mapper configuration are described as follows. We have numbered the components as in the Outline of the Data Transformation Approach above.

1. The source property of the RepeatingGroupMapping specifies that each iteration should obtain its input from two data holders:

- From an occurrence of the Child element
- From the corresponding occurrence of the Parent element

2. The target property of the RepeatingGroupMapping specifies that each iteration should store its output in two data holders:

- In an occurrence of the Person element
- In the corresponding occurrence of the Hobby element

3. The first Map action copies the name attribute of the Child to the firstName attribute of the Person.

4. The second Map action copies the lastName attribute of the Parent into the lastName attribute of the Person.

Use of Indexing

The example uses indexing by key to identify the occurrences of the Parent and Hobby data holders.

In the source property of the RepeatingGroupMapping, the indexing identifies the occurrence of Parent that corresponds to a Child.

In the target property, the indexing identifies the occurrence of Hobby where a Person should be nested.

Source and Target Properties

The source and target properties exist in components such as the following:

In parsers:

- Parser
- Group
- RepeatingGroup
- EnclosedGroup
- FindReplaceAnchor

In serializers:

- Serializer
- GroupSerializer
- RepeatingGroupSerializer

In mappers:

- Mapper
- GroupMapping
- RepeatingGroupMapping

In all these categories, the meaning and usage of the properties is identical:

The source property identifies existing data holders that a data transformation should use.

The target property identifies data holders that may or may not already exist. If they exist, the data transformation uses them. If they do not exist, the data transformation creates them.

After you define the source and/or the target, the subsequent components use the identified data holders. For example, if you define the target of a Group, the anchors nested within the Group use the data holders that the target identifies.

In the following sections, the meaning and usage are explained in detail.

There are properties called source and target in some other components, such as Map (see Chapter 9, Actions). These properties have a different meaning and usage from the above. For an explanation, please see the components where the properties are used.

Source Property

The source property identifies existing occurrences of data holders. The value of the source property is a list of the following components:

source                 Explanation

Locator                Identifies a single-occurrence or multiple-occurrence data holder. In the latter case, each iteration accesses the next occurrence, in sequence.

LocatorByKey           Identifies an occurrence of a multiple-occurrence data holder by a key.

LocatorByOccurrence    Identifies an occurrence of a multiple-occurrence data holder by number.

Default Behavior

If you do not assign the source property of a component, the component identifies data holders in the following way:

If there is only one occurrence of the data holder, ContentMaster uses the existing occurrence.

If there are multiple occurrences of the data holder, the behavior is as follows:

- In an iterative context (for example, within a RepeatingGroupSerializer), each iteration accesses the next occurrence of the data holder, in sequence.

- In a non-iterative context (for example, a GroupSerializer that is not nested within an iterative component), the component accesses the first occurrence of the data holder.

Ambiguities in the Default Behavior

There can be some ambiguities in the default behavior. Ambiguities can arise, for example, in the following circumstances:

In cases where a multiple-occurrence element is nested within another multiple-occurrence element. This is illustrated in Example 1 below.

In cases where the XSD schema permits alternative data holders (defined with xs:choice).

In cases where the XSD schema permits a data holder to be missing (defined with minOccurs = 0).

In such cases, it is prudent to assign the source property explicitly.

Data Holder Must Exist

The source property identifies a data holder that already exists in the scope of the data transformation. If the data holder does not exist, the component containing the source property fails.

For example, suppose that the source property of a Group contains a non-optional LocatorByOccurrence, which points to the third occurrence of a data holder. If only two occurrences exist, the Group fails.
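The failure rule can be pictured with a small Python sketch. It is an analogy under stated assumptions, not ContentMaster code: the names LocatorFailure and locate_by_occurrence are hypothetical, occurrence numbers are assumed to start at 1, and returning None is our way of modeling an optional locator whose failure does not propagate to the parent component.

```python
class LocatorFailure(Exception):
    """Signals that a source locator pointed to a missing occurrence."""


def locate_by_occurrence(occurrences, number, optional=False):
    # Assumption: occurrence numbers are 1-based ("the third occurrence").
    if 1 <= number <= len(occurrences):
        return occurrences[number - 1]
    if optional:
        return None  # the failure is not passed on to the parent component
    raise LocatorFailure(
        f"occurrence {number} requested, only {len(occurrences)} exist"
    )


employees = ["John", "Leslie"]
print(locate_by_occurrence(employees, 2))
print(locate_by_occurrence(employees, 3, optional=True))
try:
    locate_by_occurrence(employees, 3)
except LocatorFailure as failure:
    print("Group fails:", failure)
```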

Using the Source Property for Input or Output

Typically, a component uses the source property to identify where it should obtain input. For example, a GroupSerializer can use the property to identify an occurrence that it should serialize.

It is also possible to use the property to identify where the component should store output. For example, suppose that a parser has already created 10 occurrences of an XML element. After the occurrences have been created, a Group anchor assigns an attribute in one occurrence of the element. The Group can use the source property to identify the occurrence.

Example 1: Nested Multiple-Occurrence Data Holders

Suppose that the input schema of a serializer supports the following structure:

<Report>
    <Company>
        <Employee>John</Employee>
        <Employee>Leslie</Employee>
        <Employee>Pedro</Employee>
    </Company>
    <Company>
        <Employee>Marie</Employee>
        <Employee>Larry</Employee>
        <Employee>Frances</Employee>
    </Company>
</Report>

You want to iterate over all the Employee elements, and produce the following output:

John
Leslie
Pedro
Marie
Larry
Frances

At first thought, you might create a RepeatingGroupSerializer, and configure it to output the Employee data holder:

This does not work correctly! By default, each iteration selects a new instance of Employee within the same Company. The result is the output:

John
Leslie
Pedro

In other words, the RepeatingGroupSerializer accesses only the first Company.

You can solve the problem by nesting the RepeatingGroupSerializer inside another RepeatingGroupSerializer. To resolve any potential ambiguities, you can configure the source properties explicitly:

Each iteration of the outer RepeatingGroupSerializer processes a different occurrence of Company. Each iteration of the nested RepeatingGroupSerializer processes a different occurrence of Employee. The result is the desired output.

Alternatively, suppose you want to iterate only over the second Employee element in each Company. The desired output is:

Leslie
Larry

You can do this by configuring a single RepeatingGroupSerializer, whose source is Company. This causes each iteration to access the next instance of Company. Within the iteration, you can configure a GroupSerializer, whose source property uses a LocatorByOccurrence to select the second Employee. This generates the desired output.
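The three variants of this example correspond to three familiar loop shapes. The Python sketch below is an analogy only; the function names are hypothetical, and the serializers are replaced by plain list-building loops.

```python
import xml.etree.ElementTree as ET

REPORT = ET.fromstring("""<Report>
  <Company>
    <Employee>John</Employee><Employee>Leslie</Employee><Employee>Pedro</Employee>
  </Company>
  <Company>
    <Employee>Marie</Employee><Employee>Larry</Employee><Employee>Frances</Employee>
  </Company>
</Report>""")


def first_company_only(report):
    """Single loop over Employee: stays inside the first Company."""
    return [e.text for e in report.find("Company").findall("Employee")]


def all_employees(report):
    """Outer loop over Company, nested loop over Employee."""
    return [
        e.text
        for company in report.findall("Company")
        for e in company.findall("Employee")
    ]


def second_employee_of_each(report):
    """Loop over Company, then pick the second Employee by number."""
    return [c.findall("Employee")[1].text for c in report.findall("Company")]


print(first_company_only(REPORT))
print(all_employees(REPORT))
print(second_employee_of_each(REPORT))
```

The last function uses a fixed position inside each Company, which is the role of the LocatorByOccurrence in the GroupSerializer.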

Example 2: Indexing

The Example of Indexing by Key, at the beginning of this chapter, illustrates how to use the source property with indexing. The source property of the RepeatingGroupMapping is configured as follows:

The source property identifies two data holders:

It uses a Locator component to identify an occurrence of Child. Each iteration processes the next occurrence of Child, sequentially.

It uses a LocatorByKey component to identify an occurrence of Parent. This causes each iteration to process the occurrence of Parent that corresponds to the occurrence of Child.

Target Property

The target property identifies an occurrence of a data holder, which may or may not already exist. If the occurrence exists, the component uses it. If the occurrence does not exist, the component creates it.

The value of the target property is a list of the following components:

target          Explanation

Locator         Identifies a single-occurrence or multiple-occurrence data holder. In the latter case, each iteration creates a new occurrence.

LocatorByKey    Identifies an occurrence of a multiple-occurrence data holder by an indexing key. If the occurrence does not yet exist, it is created.

Default Behavior

If you do not assign the target property of a component, the component identifies data holders in the following way:

If the schema permits only a single occurrence of the data holder, ContentMaster accesses or creates the occurrence.

If the data holder can have multiple occurrences, the behavior is as follows:

- In an iterative context (for example, within a RepeatingGroup), each iteration creates a new occurrence of the data holder.

- In a non-iterative context (for example, a Group that is not nested within an iterative component), the component creates one new occurrence of the data holder.

Ambiguities in the Default Behavior

There can be some ambiguities in the default behavior. Ambiguities can arise, for example, in the following circumstances:

In cases where a multiple-occurrence element is nested within another multiple-occurrence element. This is illustrated in Example 1 below.

In cases where the XSD schema permits alternative data holders (defined with xs:choice).

In cases where the XSD schema permits a data holder to be missing (defined with minOccurs = 0).

In such cases, it is prudent to assign the target property explicitly.

Data Holder Can Be Created

The target property identifies a data holder that may or may not already exist in the scope of the data transformation. If the data holder does not exist, it is created.

For example, suppose that the target property of a Group contains a LocatorByKey, which points to a particular occurrence of a data holder. If the occurrence already exists, the Group uses it. If the occurrence does not exist, the Group creates it.
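This "use it if it exists, otherwise create it" rule is the familiar get-or-create pattern. The following Python sketch illustrates the rule under stated assumptions: the function locate_target_by_key and the dictionary index are hypothetical stand-ins, not part of ContentMaster.

```python
def locate_target_by_key(index, key_value, create):
    """Return the occurrence stored under key_value, creating it if needed."""
    if key_value not in index:
        index[key_value] = create(key_value)  # occurrence did not exist yet
    return index[key_value]


def new_hobby(name):
    return {"name": name, "persons": []}


hobbies = {}
first = locate_target_by_key(hobbies, "Swimming", new_hobby)
first["persons"].append("Eric")
second = locate_target_by_key(hobbies, "Swimming", new_hobby)
second["persons"].append("Edward")
print(hobbies)
```

The second call returns the occurrence that the first call created, so both persons end up in the same Hobby. A source locator, by contrast, would fail if the key were not found.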

Using the Target Property for Input or Output

Typically, a component uses the target property to identify where it should store output. For example, a Group can use the property to identify an occurrence where it should store data.

It is also possible to use the property to identify where a component should obtain input. For example, suppose that a GroupSerializer contains an action, which computes data and stores it in a variable. The GroupSerializer then contains a ContentSerializer, which writes the variable to the output. You can use the target property to create the occurrence of the variable, which the GroupSerializer should use. The variable then serves as the input of the ContentSerializer.

Example 1: Nested Multiple-Occurrence Data Holders

The Example of Locators, at the start of this chapter, illustrates how to use the target property to differentiate between parent and child multiple-occurrence data holders. The example is the exact inverse of the serializer example, which is presented in Example 1 of the Source Property above.

Example 2: Indexing

The Example of Indexing by Key, above in this chapter, illustrates how to use the target property with indexing. The target property of the RepeatingGroupMapping is configured as follows:

The target property identifies two data holders:

It uses a Locator component to identify an occurrence of Person. Each iteration creates a new occurrence of Person.

It uses a LocatorByKey component to identify the occurrence of the Hobby element, where the occurrence of Person should be nested. If the Hobby element already exists, the data transformation uses it. If the Hobby element does not yet exist, the data transformation creates it.

Standard Locator and Key Properties

In this section, we review certain properties that are found in the locator and key components. For additional properties that are specific to particular components, see the Locator and Key Component Reference.

disabled
If selected, ContentMaster ignores the component. This is useful for testing and debugging, or for making minor modifications in a project without deleting the existing components.

optional
By default, if a component fails, its parent component fails. If you select the optional property, the parent component does not fail.

remark
A comment describing the component.

Locator and Key Component Quick Reference

The locator and key components are:

Key
Defines a unique identifier for a data holder.

Locator
Identifies a single-occurrence or multiple-occurrence data holder.

LocatorByKey
Identifies an occurrence of a multiple-occurrence data holder by using a key.

LocatorByOccurrence
Identifies an occurrence of a multiple-occurrence data holder by number.

Locator and Key Component Reference

This section documents the locator and key components that are available in ContentMaster.

Key

A Key component defines one or more attributes or elements, which together serve as a unique identifier of their parent element.

How to Define

You can define a Key only at the global level of the IntelliScript. This lets you reference the Key anywhere in the project.

Example

The Example of Indexing by Key (above in this chapter) defines a key for the Hobby element in the following structure:

<Hobbies>
    <Hobby name="Swimming">
        <Person firstName="Eric" lastName="Smith"/>
        <Person firstName="Edward" lastName="Doe"/>
    </Hobby>
    <Hobby name="Biking">
        <Person firstName="Elizabeth" lastName="Doe"/>
    </Hobby>
    <Hobby name="Painting">
        <Person firstName="Mary" lastName="Smith"/>
    </Hobby>
</Hobbies>

The key is the name attribute, which uniquely identifies each Hobby.

Composite Keys

Optionally, you can define a list of data holders as a composite key. To do this, nest multiple data holders under the unique_fields property.

Consider the following example:

<Persons>
    <Person ID="17" SubID="A">Bob</Person>
    <Person ID="17" SubID="B">Jane</Person>
    <Person ID="35" SubID="A">Larry</Person>
</Persons>

Neither the ID attribute nor the SubID attribute identifies a Person element uniquely. The combination of ID and SubID, however, is a unique identifier. You can define ID and SubID as a composite key.
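You can check the claim about this sample with a few lines of Python. The sketch below is an illustration only; the helper is_unique is hypothetical, and a tuple of attribute values stands in for the composite key.

```python
import xml.etree.ElementTree as ET

PERSONS = ET.fromstring("""<Persons>
  <Person ID="17" SubID="A">Bob</Person>
  <Person ID="17" SubID="B">Jane</Person>
  <Person ID="35" SubID="A">Larry</Person>
</Persons>""")


def is_unique(elements, fields):
    """True if the combination of the given attributes differs for every element."""
    values = [tuple(e.get(f) for f in fields) for e in elements]
    return len(values) == len(set(values))


people = PERSONS.findall("Person")
print(is_unique(people, ["ID"]))           # ID alone repeats (17 twice)
print(is_unique(people, ["SubID"]))        # SubID alone repeats (A twice)
print(is_unique(people, ["ID", "SubID"]))  # the pair never repeats

# An index on the composite key, using (ID, SubID) tuples.
by_key = {(p.get("ID"), p.get("SubID")): p.text for p in people}
print(by_key[("17", "B")])
```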

Restrictions on the Key

The unique_fields must be nested within the recurring_element. They can be attributes of the element, they can be nested elements at any level of nesting, or they can be attributes of the nested elements.

This means, for example, that Persons/Person/SocialSecurity/@Number can be a valid key for Persons/Person, because @Number is nested within Persons/Person. On the other hand, Persons/Child is not a valid key for Persons/Person because it is not correctly nested.

The unique_fields must identify the closest ancestor that can have multiple occurrences. For example, if both Parent and Child are multiple-occurrence elements, then Parent/Child/@name can be a valid key for Parent/Child but not for Parent.

The unique_fields must have simple data types. They cannot be structures.

Sibling and Non-Sibling Occurrences

A key uniquely identifies sibling occurrences of an element. It is permitted for non-sibling occurrences to have the same key.

For example, consider the following XML structure:

<Report>
    <Company>
        <Employee ID="1">John</Employee>
        <Employee ID="2">Leslie</Employee>
    </Company>
    <Company>
        <Employee ID="1">Marie</Employee>
        <Employee ID="2">Larry</Employee>
    </Company>
</Report>

The ID attribute can be a valid key for Employee because it uniquely identifies an Employee within a single Company. The duplication of ID values in different Company elements does not invalidate the key.
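One way to picture the sibling rule is as a separate index for each parent occurrence. The Python sketch below is an analogy, not a description of ContentMaster internals; the function index_employees is hypothetical.

```python
import xml.etree.ElementTree as ET

REPORT = ET.fromstring("""<Report>
  <Company>
    <Employee ID="1">John</Employee>
    <Employee ID="2">Leslie</Employee>
  </Company>
  <Company>
    <Employee ID="1">Marie</Employee>
    <Employee ID="2">Larry</Employee>
  </Company>
</Report>""")


def index_employees(company):
    """Build the ID index for the Employee siblings of one Company."""
    return {e.get("ID"): e.text for e in company.findall("Employee")}


# One index per Company, so ID "1" can appear in each without a clash.
indexes = [index_employees(c) for c in REPORT.findall("Company")]
print(indexes[0]["1"], indexes[1]["1"])
```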

Keys of Reusable Elements

You can define a key on a reusable element, which is defined in the XSD schema.

For example, suppose that Persons/Person can occur in several different contexts within the XML. If you define ID as a key for Persons/Person, the key is valid in any context where Persons/Person is used.

Enforced Uniqueness of a Key

ContentMaster enforces the uniqueness of a Key. This has the following consequences:

If two or more sibling occurrences of an input element have the same key values, ContentMaster considers each occurrence to overwrite the previous occurrences. It uses only the last occurrence that it encounters.

If an occurrence of an input element is missing a key value, the occurrence is ignored.

If ContentMaster outputs a keyed element, and a sibling element having the same key value already exists, the existing occurrence is overwritten.

In all these cases, ContentMaster generates a warning in the event log.
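The input-side rules above behave much like building a dictionary in which later entries replace earlier ones. The following Python sketch illustrates this. It is not ContentMaster code: the function build_key_index is hypothetical, and the warning strings are invented for the illustration and do not reproduce the actual event-log text.

```python
def build_key_index(occurrences, key_field):
    """Index sibling occurrences by key: last duplicate wins, keyless ones are skipped."""
    index = {}
    warnings = []
    for occurrence in occurrences:
        key = occurrence.get(key_field)
        if key is None:
            warnings.append(f"occurrence without {key_field} ignored")
            continue
        if key in index:
            warnings.append(f"duplicate key {key!r}: earlier occurrence overwritten")
        index[key] = occurrence
    return index, warnings


siblings = [
    {"name": "Swimming", "order": 1},
    {"name": "Biking", "order": 2},
    {"name": "Swimming", "order": 3},  # same key as the first occurrence
    {"order": 4},                      # key value missing
]
index, warnings = build_key_index(siblings, "name")
print(index["Swimming"]["order"])
print(warnings)
```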

Display in the Schema View

The Schema view displays a key in the following way:

The symbol

means that the name attribute has been defined as one of the unique_fields. The symbol

is called an XPath predicate. It is an XPath representation of the complete key definition.

Basic Properties

recurring_element
A multiple-occurrence element, whose occurrences are identified by the key (select from a Schema view).

unique_fields
The key (select one or more data holders from a Schema view).

Advanced Properties

For explanations of the following properties, see Standard Locator and Key Properties:

disabled

remark

Locator

This component is used in the source and target properties to identify a data holder.

You can use it to identify either a single-occurrence or multiple-occurrence data holder. In the latter case, each iteration of the component that uses the Locator processes the next occurrence of the data holder.

Example

For examples of Locator, see the Example of Locators and the Example of Indexing by Key above.

Basic Properties

data_holder
The data holder that the component identifies (select from a Schema view).

Advanced Properties

For explanations of the following properties, see Standard Locator and Key Properties:

disabled

optional

remark

LocatorByKey

This component is used in the source and target properties to identify an occurrence of a multiple-occurrence data holder.

Before you use this component, you must define a Key at the global level of the IntelliScript. The Key specifies the data holder(s) that uniquely identify the occurrence.

In the LocatorByKey configuration, you must specify:

The key that you wish to use.

The values of the key fields. You can specify the values either statically (by typing a value) or dynamically (by selecting a data holder that contains the value).

Example

The Example of Indexing by Key at the beginning of this chapter illustrates how to use LocatorByKey.

Conflicts Between Locators

In case of conflicts, a nested LocatorByKey overrides a parent locator.

For example, suppose that the target property of a Group contains a LocatorByKey, which points to the third occurrence of an element. A nested Group contains a LocatorByKey, which points to the fifth occurrence. The nested Group uses the fifth occurrence.

Basic Properties

key
From a Schema view, select the XPath predicate representation of the key. For example, if you have defined Hobbies/Hobby/@name as a Key, then you can select Hobbies/Hobby[@name=$1].

params
Under this property, specify the values of the $1, $2, etc. parameters in the XPath predicate.

Type each value, or click the Browse button and select a data holder that contains the value.
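The relationship between the key predicate and its params can be imitated with the limited XPath support of Python's xml.etree.ElementTree module. This is a sketch for illustration only: the functions bind_params and locate_by_key are hypothetical, the path is written relative to the Hobbies root element, and the simple quoting assumes that the values contain no quotation marks.

```python
import xml.etree.ElementTree as ET

HOBBIES = ET.fromstring(
    '<Hobbies>'
    '<Hobby name="Swimming"><Person firstName="Eric" lastName="Smith"/></Hobby>'
    '<Hobby name="Biking"><Person firstName="Elizabeth" lastName="Doe"/></Hobby>'
    '</Hobbies>'
)


def bind_params(predicate, params):
    """Replace $1, $2, ... in the predicate with quoted parameter values."""
    bound = predicate
    # Highest number first, so that $1 does not clobber part of $10.
    for number in range(len(params), 0, -1):
        bound = bound.replace(f"${number}", f"'{params[number - 1]}'")
    return bound


def locate_by_key(parent, predicate, params):
    """Return the first child matching the bound predicate, or None."""
    return parent.find(bind_params(predicate, params))


hobby = locate_by_key(HOBBIES, "Hobby[@name=$1]", ["Biking"])
print(bind_params("Hobby[@name=$1]", ["Biking"]))
print(hobby.find("Person").get("firstName"))
```

A static value corresponds to typing the parameter, as in the call above. A dynamic value corresponds to reading the parameter from another data holder before the lookup.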

Advanced Properties

For explanations of the following properties, see Standard Locator and Key Properties:

disabled

optional

remark

LocatorByOccurrence

This component is used in the source property to identify an occurrence of a multiple-occurrence data holder.

The component identifies the occurrence by number. For example, if there are ten occurrences of a data holder, you can use LocatorByOccurrence to process the third occurrence.

You can specify the occurrence number either statically (by entering a number) or dynamically (by selecting a data holder that contains the number).

Example

For an example of LocatorByOccurrence, see Example 1 of the Source Property, above.

Conflicts Between Locators

In case of conflicts, a nested LocatorByOccurrence overrides a parent locator.

For example, suppose that the source property of a Group contains a LocatorByOccurrence, which points to the third occurrence of an element. A nested Group contains a LocatorByOccurrence, which points to the fifth occurrence. The nested Group uses the fifth occurrence.

Basic Properties

recurring_element
The data holder that the component identifies (select from a Schema view).

occurrence_number
The number of the occurrence. Type a number, or click the Browse button and select a data holder that contains the number.

Advanced Properties

For explanations of the following properties, see Standard Locator and Key Properties:

disabled

optional

remark

Project Properties

The project properties are options that you can set for the behavior of a project. They control essential features of the project such as the input and output encodings, the authentication support, and the XML validation.

The project properties are saved with the project. They affect the behavior in all circumstances where you run the project:

In the ContentMaster Studio environment

When you deploy the project as a ContentMaster service and run it in ContentMaster Engine.

For many projects, you can accept the default values of the project properties. Nevertheless, before you deploy a project as a ContentMaster service, you should always review the project properties and confirm that the settings meet your needs.

Properties versus Preferences

Do not confuse the project properties with the ContentMaster Studio preferences:

The preferences affect the display of data transformations in ContentMaster Studio. They apply to all projects equally.

The project properties affect the operation of a data transformation both in ContentMaster Studio and in ContentMaster Engine. You can set the properties independently for each project.

Setting the Project Properties

To set the properties of a project:

1. Open a TGP script file belonging to the project in an IntelliScript editor.

2. On the menu, choose Project > Properties.

Alternatively:

1. Select the project in the ContentMaster Explorer.

2. On the menu, choose File > Properties.

Info Page

The Info page of the project properties displays general information, such as the storage location of the project.

Authentication Page

If the project accesses a location that requires a login, you can store the login information on the Authentication page of the project properties. This feature is useful, for example, if a parser processes source documents that are located on a password-protected web site.

The options are as follows:

Enable authentication
Select this option if the remote location requires a login.

Prompt before execution
When a login is required, ContentMaster prompts the user to enter a user name and password.

Save in project
When a login is required, the project automatically submits the user name and password that are specified in the login information.

Login information
The user name and password.

Encoding Page

The Encoding page of the project properties lets you specify how the input, output, and working files of a project are encoded.

Supported Encodings

ContentMaster supports encodings such as:

Encoding Description

BaseCodePage A proprietary Hebrew code page, for backwards compatibility with ContentMaster 3.1.0 and earlier versions.

Big5 Chinese

EBCDIC-37 US/Canada

EBCDIC-424 Hebrew

GB2312 Chinese

ISO-8859-1 Latin-1 (English and West European)

ISO-8859-2 Latin-2 (East European)

ISO-8859-3 Latin-3 (South European)

ISO-8859-4 Latin-4 (North European)

ISO-8859-5 Cyrillic

ISO-8859-6 Arabic

ISO-8859-7 Greek

ISO-8859-8 Hebrew

ISO-8859-9 Latin-5 (Turkish)

ISO-8859-15 Latin-9

KSC_5601 Korean

Shift_JIS Japanese

UTF-16 Unicode

UTF-16BE Unicode

UTF-7 Unicode

UTF-8 Unicode

Windows-1250 Central European

Windows-1251 Cyrillic

Windows-1252 ANSI English and West European

Windows-1253 Greek

Windows-1254 Turkish

Windows-1255 Hebrew

Windows-1256 Arabic

Windows-1257 Baltic

Windows-1258 Vietnamese

Additional encodings may be supported. For an up-to-date list, select one of the Custom options on the Encoding page and open the drop-down list. Please contact SAP if you need an encoding that isn't in the list.

Limitations

The encoding support is subject to the following limitations:

The example pane of the IntelliScript editor may fail to display the East Asian encodings properly (Chinese, Japanese, and Korean).

ContentMaster interprets multiple-byte encodings (East Asian and Unicode) as binary byte streams. It is not aware of the encoding semantics, such as the breaks between characters.

The Unicode encodings, except for UTF-8, are not supported as working encodings.

If you define the working encoding as East Asian or UTF-8, you should avoid defining Marker anchors that contain multiple-byte characters. ContentMaster may misinterpret a Marker that contains such characters.
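
The following Python sketch illustrates, in general terms, why a byte-stream search can go wrong in a multiple-byte encoding. It is not ContentMaster code, and it does not claim to reproduce the exact way in which ContentMaster misinterprets a Marker. The sample text and the marker are hypothetical. In Shift_JIS, the two bytes of the marker character happen to equal the second byte of the first text character followed by the first byte of the second one, so a search that ignores the breaks between characters reports a match that is not really there.

```python
# Illustration only: byte-level search in a multiple-byte encoding (Shift_JIS).
text = "モあ"    # hypothetical source text: two Japanese characters
marker = "ｂ"    # hypothetical Marker text: fullwidth Latin small letter b

text_bytes = text.encode("shift_jis")      # 4 bytes: 83 82 82 A0
marker_bytes = marker.encode("shift_jis")  # 2 bytes: 82 82

# Character-aware search: the marker does not occur in the text.
found_as_characters = marker in text

# Byte-stream search: the marker bytes straddle the boundary between
# the two characters, so a match is reported at byte offset 1.
byte_offset = text_bytes.find(marker_bytes)
```

A Marker that contains only single-byte (ASCII) characters is less exposed to this particular boundary problem, which is consistent with the recommendation above.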

Input

The Input area of the Encoding page specifies how the input of a ContentMaster project is encoded, for example, the source document that is processed by a parser.

Extract code page from source
If selected, ContentMaster uses a code page that is specified in the source document (for example, in the encoding attribute of an XML document).

If ContentMaster does not find an encoding specification in the document, it uses the encoding defined in the settings described below.

Use working encoding
If selected, ContentMaster assumes that the input has the same encoding as the working files of the project (defined in the Working area of the properties page).

Custom
Select the encoding from the drop-down list.

Encoding schema
The encoding of special characters: none or XML.

In the XML encoding schema, symbols such as < or > are represented as entities (&lt; and &gt;, etc.).

Note that serializers and mappers ignore this option. A serializer or mapper assumes that its input uses the XML encoding schema.

Byte order
The byte order of binary data. The options are Little Endian (default, appropriate for most files on the Windows operating system), Big Endian, or no binary conversion.
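
The fallback logic of the Extract code page from source option can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the ContentMaster implementation: the function name choose_input_encoding, the regular expression, and the windows-1252 fallback are hypothetical, and the sketch handles only XML documents whose declaration is written in an ASCII-compatible encoding.

```python
import re

# Matches the encoding attribute of an XML declaration at the start of the data.
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)

def choose_input_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Return the encoding declared in the document, or the fallback if none is declared."""
    match = _XML_DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    return fallback  # stands in for the Use working encoding or Custom setting
```

For example, calling the function on a document that begins with <?xml version="1.0" encoding="ISO-8859-8"?> returns ISO-8859-8, while a document without a declaration yields the fallback value.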

Working

The Working area of the Encoding page specifies the encoding of the project's working files, including the TGP script files and the IntelliScript.

Use ContentMaster default codepage
Uses the system default encoding.

Custom
Select the encoding from the drop-down list. The ISO-8859 and Windows encodings are supported.

Working encoding schema
The encoding of special characters: none or XML (as for the input encoding).

You must select a working encoding that is compatible with the encoding of your XSD schema. For details, see Encoding of the XSD Schema in Chapter 6, Data Holders.

Output

The Output area of the Encoding page defines the encoding of the project output.

Use working encoding
Use the same encoding as for the working files.

Same as input
Use the same encoding as for the input.

Custom
Select the encoding from the drop-down list.

Encoding schema
The encoding of special characters: none or XML (as for the input encoding).

Note that parsers and mappers ignore this option. A parser or mapper encodes its output using the XML encoding schema.

Byte order
The byte order of binary data. The options are Little Endian (default, appropriate for most files on the Windows operating system), Big Endian, or no binary conversion.
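
To see what the XML encoding schema does to special characters, consider the following Python sketch. It uses the standard xml.sax.saxutils module and is independent of ContentMaster; the sample string is hypothetical.

```python
from xml.sax.saxutils import escape, unescape

raw = "price < 100 & quantity > 5"

# XML encoding schema: the special characters are written as entities.
encoded = escape(raw)       # "price &lt; 100 &amp; quantity &gt; 5"

# Reading the text back restores the original characters.
decoded = unescape(encoded)
```

With the encoding schema set to none, the text would correspond to the raw string. With the XML schema, it would correspond to the encoded string.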

External Tools Builders Page

The External Tools Builders page is a standard Eclipse page, which is not used by ContentMaster.

Namespaces Page

The Namespaces page of the project properties is used to configure XML namespaces.

You must define the namespaces in the targetNamespace attribute of the XSD schemas. In the project properties, you can edit only the namespace alias.

For more information, see Chapter 6, Data Holders.

Output Control Page

The Output Control page of the project properties specifies options for how ContentMaster should generate the project output.

XSLT stylesheet URL
Browse to an XSLT stylesheet, which you want to use to display XML output of the project. ContentMaster adds an XSLT processing instruction to the XML output, for example:

<?xml-stylesheet type="text/xsl" href="C:\stylesheets\MyStylesheet.xsl"?>

When you display the XML in Internet Explorer, the browser applies the stylesheet.

Create event log
By default, ContentMaster Studio generates event logs for the project. You can click the Advanced button and define which types of events should be included in the log.

If you deselect the option to create event logs, ContentMaster Studio does not generate an event log. The Events view displays only minimal information, such as the service initialization and termination.

This property has no effect when you run a ContentMaster service in ContentMaster Engine. For information about the event logs generated by ContentMaster Engine, see the ContentMaster Engine Developer's Guide.

Save parsed documents
Specifies whether ContentMaster should save a copy of the parsed documents with the event log. The event log uses the copy to display the source of an event.

Add binary encoding prefix to output file
Adds a binary byte-order mark at the start of the output file. Some Unicode applications use the mark to identify the encoding.

Disable automatic output
By default, a parser or serializer writes all output that it generates to the results file. If you select this option, the output is not written unless the parser or serializer runs the DumpValues action. This is useful mainly for debugging.

Disable value compressions
Disables XML output optimizations. The optimizations improve performance, especially when processing large documents. Do not select this option unless advised by SAP support.
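
As background for the Add binary encoding prefix to output file option, the following Python sketch shows how an application might use a byte-order mark to identify the encoding of a file. The sketch relies only on the standard codecs module. The function name describe_bom and the sample data are hypothetical, and the sketch makes no claim about which mark ContentMaster writes for a given output encoding.

```python
import codecs

# Byte-order marks recognized by this sketch (UTF-32 is deliberately left out).
KNOWN_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 little endian"),
    (codecs.BOM_UTF16_BE, "UTF-16 big endian"),
]

def describe_bom(data: bytes) -> str:
    """Name the encoding indicated by the byte-order mark, if the data starts with one."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return "no byte-order mark"

# Hypothetical output file content: a mark followed by UTF-16 little-endian text.
sample = codecs.BOM_UTF16_LE + "<Result>1.0</Result>".encode("utf-16-le")
```

Here describe_bom(sample) reports UTF-16 little endian, whereas the same text without the prefix would be reported as having no byte-order mark.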

Project References Page

The Project References page is a standard Eclipse properties page, which is not used by ContentMaster.

XML Generation Page

On the XML Generation page of the project properties, you can specify how the project ensures that its XML output is valid.

Schema location, No namespace schema location
These options insert schemaLocation and noNamespaceSchemaLocation attributes in the document element of the output XML.

For example, suppose that you specify the following project properties:

Schema location = http://www.example.com/NS1
No namespace schema location = http://www.example.com/NoNS

If the document element of the output XML is called Doc, ContentMaster outputs the following code:

<Doc
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.example.com/NS1"
    xsi:noNamespaceSchemaLocation="http://www.example.com/NoNS">

XML output mode
This option specifies what ContentMaster should do if a parser does not map any data to a data holder (an XML element or attribute) that is defined in the XSD schema.

XML output mode Explanation

Full
ContentMaster attempts to add the missing data holders to the XML output. It assigns values to the added data holders as follows:
- The default value, if the schema defines one.
- If the data holder has an integer type: 0.
- If the data holder has a floating type: 0.0.
- Otherwise, it leaves the data holder empty.

Compact
ContentMaster does not add the missing data holders, and it removes empty data holders. A data holder containing the number 0 is not considered empty for this purpose, and is not removed.

As is
The XML output contains the data holders that the parser explicitly set. ContentMaster does not add missing data holders, and it does not remove empty values.

The above options do not cause a parser to produce invalid XML, provided that you select the validate added option, which is described below. Under certain conditions, however, the compact or as is options can cause a parser to output partial or empty XML. For example, suppose that you choose the compact mode, and the parser does not create a required element. ContentMaster removes its parent element, in an attempt to create valid XML. If the parent element is also required, the grandparent is removed. This process continues until ContentMaster reaches an optional element, or until the XML is empty.
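
The removal process described for the compact mode can be modeled with a short Python sketch. This is a simplified illustration of the described behavior, not the ContentMaster algorithm. The Node class, the prune function, and the element names (Doc, Order, Customer, Id) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    required: bool                 # required by the (hypothetical) schema?
    created: bool = True           # did the parser create this element?
    children: List["Node"] = field(default_factory=list)

def prune(node: Node) -> Optional[Node]:
    """Return the node without its removed subtrees, or None if the node itself is removed."""
    if not node.created:
        return None
    kept = []
    for child in node.children:
        result = prune(child)
        if result is not None:
            kept.append(result)
        elif child.required:
            # A required child is missing, so this element is removed as well.
            return None
    return Node(node.name, node.required, True, kept)

# Id is required but was not created. Customer is required, Order is optional.
doc = Node("Doc", True, True, [
    Node("Order", False, True, [
        Node("Customer", True, True, [Node("Id", True, False)]),
    ]),
])
partial = prune(doc)   # Doc survives, but the optional Order subtree is gone
```

In this model, the missing Id removes Customer, which in turn removes Order. Because Order is optional, the process stops there and Doc remains, with partial content. If Order were also required, prune would return None, which corresponds to empty XML.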

Add default values for required elements/attributes
These options instruct a parser to output XML elements or attributes that are required by the XSD schema. The options override the compact and as is output modes.

ContentMaster assigns values to the required data holders as in the Full output mode (see the explanation in the above table).

Validate added required XML elements or attributes
If selected, this option causes ContentMaster to validate the elements or attributes that it adds because of the full or add required options. If adding the element or attribute would invalidate the output XML, ContentMaster does not add it.

This option is selected by default, and we recommend that you leave it selected. Deselecting the option may result in invalid XML.

<?xml?>
The options under this heading add a processing instruction at the beginning of the output XML, for example:

<?xml version="1.0" encoding="Windows-1252"?>

Select the XML version and the value of the encoding attribute. For the encoding, you can choose the output encoding (as defined on the Encoding Page) or a custom encoding designator.

Add custom processing instructions
This option adds custom processing instructions to the XML header. Type the processing instructions (including the <? ?> symbols).

Add XML root element
This option lets you wrap the output XML in a tag that is not configured in the IntelliScript, and may not be defined in the XSD schema.

For example, suppose that the output of a parser is:

<Result>1.0</Result>

If you select the option to add an XML root element, and you set the element name to ContentMaster, the project generates the following output:

<ContentMaster>
    <Result>1.0</Result>
</ContentMaster>

You must use this option if you run a parser on multiple source documents (see Running on Additional Source Documents in Chapter 14, Running and Testing Projects).

Running and Testing Projects

When you develop a ContentMaster data transformation, you should test and debug it thoroughly before you put it into production. ContentMaster provides several tools for testing, debugging, and troubleshooting, such as:

Color-coding a source document in the example pane of the IntelliScript editor

Running the data transformation in ContentMaster Studio and viewing the results

Viewing the event log that ContentMaster generates for each run

Cross-identifying an anchor in the example source, in the IntelliScript, and in the Events view

Color-Coding the Example Source

As you construct a parser, ContentMaster Studio color-codes the anchors that you have defined in the example source.

In the learn-example style, the specific anchors that you use to define the document structure are color-coded, for example, an anchor that marks a repeating group.

In the mark-example style, all the anchors that ContentMaster finds in the document are color-coded, for example, all iterations of a marker within a repeating group.

By examining the color-coded text, you can confirm that ContentMaster is identifying the anchors correctly.

On the IntelliScript menu or on the toolbar, you can choose the following commands, which control the color-coding style:

Learn the Example Automatically
Enables automatic color coding in the example pane, in the learn-example style. When you define anchors in the IntelliScript, ContentMaster automatically highlights the corresponding location in the example.

Learn Example
Color-codes the anchors in the example pane in the learn-example style.

You can use this command to activate the color coding, if you have deselected the option to Learn the Example Automatically. You can also use this command to return to the learn-example style, after you have displayed the mark-example style.

Mark Example
Runs the selected parser, and color-codes the anchors in the example pane in the mark-example style.

Stop Learning or Marking the Example
Stops the color-coding operation. If the example is very long, you can use this option to halt the color display and speed up the response.

For detailed instructions and exercises on how to use the color codes for debugging, see Getting Started with ContentMaster.

Example pane showing color-coded anchors in the learn-example style.

Example pane showing color-coded anchors in the mark-example style.

Running in ContentMaster Studio

To test a data transformation, you can run it in the ContentMaster Studio environment.

1. Set the startup component of the project. This is a parser, serializer, mapper, or globally defined transformer that the project should activate.

You can do this in any of the following ways:

- In the IntelliScript editor, right-click the component and choose Set as Startup Component, or

- In the Component view, right-click the component and choose Set as Startup Component, or

- On the menu, choose Run > Run.

2. Run the component in one of the following ways:

- In the Run > Run menu command, click the Run button, or

- On the menu, choose Run > Run <StartupComponentName>, where <StartupComponentName> is the name of the startup component.

3. The data transformation is executed in ContentMaster Engine. ContentMaster Studio displays the Events view, which informs you of any problems that occurred in the execution (see Viewing the Event Log below).

4. To view the results, double-click the output file, in the Results folder of the ContentMaster Explorer.

For example, the output of a parser is an XML file, usually called output.xml. ContentMaster opens the file in a Microsoft Internet Explorer window.

Output file of a parser, displayed in an Internet Explorer window.

If the Output File is not Displayed

Occasionally, a serious error may cause ContentMaster to generate an output file that cannot be viewed in the default viewing application. To diagnose the problem:

Examine the Events view for a description of the problem (see Viewing the Event Log below).

Try opening the output file in an external application such as Notepad.

If the output file is not created at all, examine the Output Control page of the project properties, and confirm that the option to Disable Automatic Output is not selected (see Chapter 13, Project Properties).

Running on Additional Source Documents

By default, ContentMaster Studio runs a parser on the example source. You should test the parser on other source documents, as well.

To do this, set the sources_to_extract property of the parser. The value is a source identifier such as a file path (optionally including * wildcards), a URL, or a list of files (see Chapter 3, Parsers).

If you select multiple sources, you must select the option to Add XML root element, which is located on the XML Generation page of the project properties (see Chapter 13, Project Properties). This causes ContentMaster to nest the results from each source in a root element. If you don't do this, the XML that the parser generates is not well formed because it does not have a unique root.
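
The need for a unique root can be checked with any XML parser. The following Python sketch uses the standard xml.etree.ElementTree module. The per-source results and the helper function is_well_formed are hypothetical, and the root element name ContentMaster is only an example.

```python
import xml.etree.ElementTree as ET

# Hypothetical results produced from two source documents.
results = ["<Result>1.0</Result>", "<Result>2.0</Result>"]

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a single well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

without_root = "".join(results)
with_root = "<ContentMaster>" + "".join(results) + "</ContentMaster>"
```

In this sketch, is_well_formed(without_root) is False because the text has two top-level elements, and is_well_formed(with_root) is True.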

You can also open a source document in the example pane, by using the IntelliScript > Test Document command. Then run the IntelliScript > Mark Example command, and confirm that the parser color-codes the anchors correctly.

Viewing the Event Log

When you run a data transformation in the ContentMaster Studio environment, the Events view displays the events that occur during the execution. You should examine the events for failure or warning messages.

An event log created by ContentMaster Studio is stored, by default, at the location Results/events.cme in the project folder.

Event-Log Properties

In the project properties, you can configure the events that ContentMaster writes to the log. For information, see Output Control Page in Chapter 13, Project Properties.

Event Display Preferences

You can customize the event display by using the Window > Preferences command. On the ContentMaster page of the preferences, you can configure:

The types of events that ContentMaster Studio displays, such as notifications, warnings, or failures.

Whether the failure events propagate (bubble up) in the events tree. Propagation lets you find the failure events more easily because they are labeled at the top levels of the tree.

Note that the preferences are independent of the event-log properties. The properties control the events that ContentMaster stores in the log. The preferences control how the stored events are displayed.

Event display without propagation

Event display with propagation

Understanding the Event Log

The event log displays the detailed events that occurred during the execution. For example, it displays an event for each anchor that a parser found.

To display the events at a particular stage, select the stage in the left pane of the Events view.

The events are labeled with status icons, which have the following meanings:

Information
A normal operation performed by ContentMaster.

Warning
A warning about a possible error. For example, ContentMaster generates a warning event if an operation overwrites the existing content of a data holder. The execution continues.

Failure
A component failed. For example, an anchor fails if ContentMaster cannot find it in the source document. The execution continues.

Optional Failure
An optional component (a component configured with the optional property) failed. For example, an optional anchor is missing from the source document. The execution continues.

Fatal error
A serious error occurred, for example, a parser has an illegal configuration. ContentMaster halts the execution.

Unknown
The event status cannot be determined.

Warnings, failures, and optional failures may be perfectly normal under some circumstances. For example, a RepeatingGroup anchor may display an optional failure after its last iteration, because it cannot find any more data to parse. If the event log displays warnings or failures, you should investigate why they occur, and determine whether they are normal or signal a problem.

Using Named Components

We suggest that you assign the name property of the components in your data transformations. ContentMaster uses the name to label the events. This can make the event log and the IntelliScript easier to understand, and it helps you identify the source of any failures.

IntelliScript with named Marker anchors

Event log with named Marker anchors

Cross-Identifying Events

If you double-click an event, the anchor that caused the event is highlighted in the IntelliScript and example panes. This can help you identify the cause of a failure.

Effect of Failure Events

If the event log shows that a component has failed, you should be aware of the following consequences.

Failure Causes Parent to Fail

If the optional property of the component is not selected, a failure of the component causes its parent to fail. If the parent is also non-optional, its own parent (the grandparent of the original component) fails, and so forth.

For example, suppose that a Parser contains a Group, and the Group contains a Marker. All the components are non-optional. If the Marker does not exist in the source document, the Marker fails. This causes the Group to fail, which in turn causes the Parser to fail.

Pictorially, we can represent these relationships in the following way:

Parser //Failed
    Group //Failed
        Marker //Failed

Optional Failure Does Not Cause Parent to Fail

If the optional property of a component is selected, a failure of the component does not bubble up to the parent.

In the above example, suppose that the Group is optional. The failed Marker causes the Group to fail, but the Parser does not fail.

Parser //Succeeded
    Group //Failed
        Marker //Failed
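
The two propagation rules can be summarized in a small Python model. This is a sketch of the rules as described above, not ContentMaster code. The Component class, its found flag, and the fails function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Component:
    name: str
    optional: bool = False
    found: bool = True             # for an anchor: was it found in the source?
    children: List["Component"] = field(default_factory=list)

def fails(component: Component) -> bool:
    """A component fails if it is not found, or if any non-optional child fails."""
    if not component.found:
        return True
    return any(fails(child) and not child.optional for child in component.children)

# First example: all the components are non-optional, and the Marker is missing.
marker = Component("Marker", found=False)
group = Component("Group", children=[marker])
parser = Component("Parser", children=[group])

# Second example: the same structure, but the Group is optional.
optional_group = Component("Group", optional=True, children=[Component("Marker", found=False)])
tolerant_parser = Component("Parser", children=[optional_group])
```

In the first example, fails(parser) is True. In the second example, fails(optional_group) is True but fails(tolerant_parser) is False, which matches the two diagrams.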

Rollback

If a component fails, its effects are rolled back.

For example, suppose that a Group contains three non-optional Content anchors, which store values in data holders. If the third Content anchor fails, the Group fails. ContentMaster rolls back the effects of the first two Content anchors. The data that the first two Content anchors already stored in data holders is removed.

The rollback applies only to the main effects of a data transformation, such as a parser storing values in data holders, or a serializer writing to its output file. The rollback does not apply to side effects. In the above example, if the Group contains an ODBCAction that performs an INSERT query on a database, the record that the action added to the database is not deleted.

Group //Failed
    Content //Data holder is rolled back
    Content //Data holder is rolled back
    ODBCAction //INSERT query is not rolled back
    Content //Failed
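
The difference between main effects and side effects can be sketched in Python as follows. This is an illustrative model only. The functions run_group, content, and odbc_insert are hypothetical stand-ins, a dictionary plays the role of the data holders, and a list plays the role of the database.

```python
from typing import Callable, Dict, List, Optional

Step = Callable[[Dict[str, str], List[str]], bool]

def content(name: str, value: Optional[str]) -> Step:
    """Stand-in for a Content anchor: stores a value, or fails if nothing was found."""
    def step(holders: Dict[str, str], database: List[str]) -> bool:
        if value is None:
            return False
        holders[name] = value
        return True
    return step

def odbc_insert(row: str) -> Step:
    """Stand-in for an ODBCAction that performs an INSERT query (a side effect)."""
    def step(holders: Dict[str, str], database: List[str]) -> bool:
        database.append(row)
        return True
    return step

def run_group(steps: List[Step], holders: Dict[str, str], database: List[str]) -> bool:
    """Run the steps in order. On failure, restore the data holders but not the database."""
    snapshot = dict(holders)
    for step in steps:
        if not step(holders, database):
            holders.clear()
            holders.update(snapshot)   # main effects are rolled back
            return False               # side effects in the database remain
    return True

holders: Dict[str, str] = {}
database: List[str] = []
succeeded = run_group(
    [content("First", "a"), content("Second", "b"), odbc_insert("row 1"), content("Third", None)],
    holders,
    database,
)
```

After the run, succeeded is False, the holders dictionary is empty again, and the database list still contains the inserted row.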

Opening a ContentMaster Engine Event Log

When you deploy a service to ContentMaster Engine (see Chapter 15, Deploying ContentMaster Services), you can monitor the Engine event logs for errors or failures. You can open an event log in ContentMaster Studio by dragging the *.cme file to the Events view.

For more information, see the Event Logs chapter in the ContentMaster Engine Developer's Guide.

Deploying ContentMaster Services

When you finish configuring and testing a project, you should deploy the project from ContentMaster Studio as a ContentMaster service. This lets ContentMaster Engine access and run the project.

There is no relation between ContentMaster services and Windows services. You cannot view or administer ContentMaster services in the Windows Control Panel.

Runnable Components

A service can run a parser, a serializer, a transformer, or a mapper as its top-level component. Collectively, these are known as runnable components. Of course, the runnable component can call other components of the project.

Startup Component

Before you deploy the service, you must set one of the runnable components as the startup component. This is the component that ContentMaster starts when it runs the service. It is the same as the startup component that you select when you test the project in ContentMaster Studio (see Chapter 14, Running and Testing Projects).

Multiple Runnable Components

If the project contains multiple runnable components, you can deploy it multiple times under different service names. Before you deploy each service, you can select a different startup component.

In this way, for example, you can define multiple parsers in the same project, and deploy services that run the parsers.

Deploying a Service

Deploying a ContentMaster service means making a project available to ContentMaster Engine. In practice, the service is deployed by copying the project to the ContentMaster repository.

Setting the ContentMaster Repository Location

The ContentMaster repository is a folder, which stores the deployed projects. The location should be on the same computer as ContentMaster Engine.

Ordinarily, you should set the ContentMaster repository when you install ContentMaster. The default location is:

c:\Program Files\SAP\ContentMaster\ServiceDB

If necessary, you can afterwards change the location in the Configuration Editor. For instructions, see the ContentMaster Administrator's Guide.

Preparing a Project for Deployment

Before you deploy a project as a service, you should review the project configuration. You should remove any testing or debugging settings, which may be inappropriate when you move to the production mode.

For example, you may have used the sources_to_extract property of a parser to test multiple source files quickly. You can delete the property value.

On the XML Generation page of the project properties, you may have selected the option to Add an XML Root Element. On the Output Control tab, you may have configured the event logging. You should review these settings to be sure they are appropriate.

Of course, if you need any of these options when you run the service, you can leave them unchanged.

It is possible to override some of the project settings when you call the ContentMaster API to run a service. However, it is certainly easier to remove any unnecessary settings before you deploy the service.

Deployment Procedure

To deploy a project as a ContentMaster service, perform the following steps.

You must have write privileges for the ContentMaster repository and for the CMReports folder (see Event Logs in the ContentMaster Engine Developer's Guide). In case of doubt, contact your system administrator.

1. In ContentMaster Studio, open and select the project.

2. On the menu, choose Project > Deploy.

3. In the Deploy Service window, set the following options:

Service Name
The name of the service. By default, this is the project name.

ContentMaster creates a folder having the service name, in the repository location.

Label
A version identifier. The default value is a time stamp, indicating when the service was deployed.

Startup Component
The runnable component that the service should start.

Author
The person who developed the project.

Description
A description of the service.

4. Click the Deploy button.

ContentMaster Studio displays a message that the service was successfully deployed. The service is displayed in the Repository view.

Deploying on a Remote ContentMaster Engine

To deploy a service from your development computer to a ContentMaster Engine that runs on a remote computer (such as a Windows or Unix server), you can use the following procedure:

1. Deploy the service on the local computer.

2. Copy the deployed project folder from the local ContentMaster repository to the repository on the remote computer.

3. ContentMaster Engine determines whether any services have been revised by examining the timestamp of an empty file called update.txt. This file exists in the repository root directory (by default, the ServiceDB directory).

If this is the first time that you have deployed a service to the remote repository, update.txt may not exist. If so, copy it from the local repository.

If update.txt exists, you should update its timestamp as follows.

On Windows: Open update.txt in Notepad and save it.

On Unix: Open a command prompt, change to the repository directory, and enter the following command.

touch update.txt

Alternatively, if the local computer can access the remote file system, you can change the ContentMaster repository to the remote location and deploy directly to the remote computer.
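The copy-and-touch procedure above can be scripted. The following is a minimal Python sketch, not part of ContentMaster. It assumes that the remote repository is reachable as an ordinary file-system path (for example, a mounted share), and the function name publish_service and its parameters are hypothetical. Because update.txt is described as an empty file, the sketch creates it if it is missing instead of copying it from the local repository.

```python
import shutil
from pathlib import Path


def publish_service(local_repo: Path, remote_repo: Path, service_name: str) -> Path:
    """Copy a deployed service folder to another repository and refresh update.txt.

    Hypothetical helper: the repository paths and the service name are
    supplied by the caller; nothing here calls ContentMaster itself.
    """
    source = local_repo / service_name
    if not source.is_dir():
        raise FileNotFoundError(f"No deployed service folder at {source}")

    target = remote_repo / service_name
    # Step 2: copy the deployed project folder into the remote repository.
    # dirs_exist_ok=True (Python 3.8+) overwrites files of the same name
    # but does not delete files that exist only in the target folder.
    shutil.copytree(source, target, dirs_exist_ok=True)

    # Step 3: update.txt is an empty marker file in the repository root.
    # touch() creates it if it is missing; otherwise it updates the timestamp.
    (remote_repo / "update.txt").touch()
    return target
```

For example, publish_service(Path("C:/ServiceDB"), Path("//server/ServiceDB"), "MyService") would copy a service named MyService. All three values are placeholders for your own locations.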

Updating a Deployed Service

ContentMaster Studio cannot open a deployed project that is located in the ContentMaster repository. If you wish to modify the project, you should follow this procedure:

1. Open the development copy of the project in ContentMaster Studio. Edit and test it as required.

2. Re-deploy the service to the same location, under the same service name. You are prompted to overwrite the previously deployed version.

Note that re-deploying overwrites the complete service folder, including any output files or other files that you have stored in it.
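Because re-deploying replaces the whole service folder, you may want to copy the folder elsewhere first. The following is a minimal Python sketch of such a backup, under the assumption that the repository is an ordinary directory containing one folder per service. The function name back_up_service, its parameters, and the time-stamped folder naming are hypothetical and not part of ContentMaster.

```python
import shutil
import time
from pathlib import Path


def back_up_service(repository: Path, service_name: str, backup_root: Path) -> Path:
    """Copy a deployed service folder to a time-stamped backup folder.

    Hypothetical helper: run it before re-deploying if the service folder
    holds output files or other files that you want to keep.
    """
    service_dir = repository / service_name
    if not service_dir.is_dir():
        raise FileNotFoundError(f"No deployed service folder at {service_dir}")

    # Name the backup after the service plus the current local time,
    # for example MyService-20050131-101500.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    destination = backup_root / f"{service_name}-{stamp}"

    # copytree creates the destination (and missing parent folders) and
    # raises FileExistsError if a backup with the same name already exists.
    shutil.copytree(service_dir, destination)
    return destination
```

The backup location should be outside the service folder itself, so that the next deployment does not overwrite it.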

Removing a Deployed Service

To remove a ContentMaster service that you have deployed, right-click the service in the Repository view, and click Remove.

This removes only the copy in the repository. It has no effect on the development copy of the project in your ContentMaster Studio workspace.

Running a Service

After you deploy a service, you are ready to run it in ContentMaster Engine.

You can do this in several ways:

By using the ContentMaster Engine command-line interface. For information, see the ContentMaster Engine Developer's Guide.

By programming an application that uses the ContentMaster API to submit source documents to the Engine. The API is available in several programming languages. See the ContentMaster Engine Developer's Guide and the API reference documentation.

By posting source documents to the Engine via the CGI interface. See the ContentMaster Engine Developer's Guide.

Index

.

.NET, 162

A

AbsURL, 120
actions, 147
    compared to transformers, 148
    custom COM, 162
    defining, 148
    input and output, 147
    properties of, 148
    side effects, 147
AddEmptyTagsTransformer, 121
AddEventAction, 151
AddField, 110
AddString, 121
AFPToXML, 27
alternative parsers
    selecting, 83
AlternativeMappings, 210
Alternatives, 82
AlternativeSerializers, 196
anchors, 66
    defining, 69
    extent of complex, 80
    finding from event log, 251
    location in IntelliScript, 69
    marker and content, 66
    phase, 74
    properties of, 72
    quick reference, 81
    reference, 82
    relation to delimiters, 67
    relation to XML, 67
    serialization, 184, 191
    using transformers, 114
APIs
    ContentMaster engine, 256
AppendListItems, 151
AppendValues, 152
applications
    ContentMaster services, 253
architecture
    ContentMaster transformations, 1
arithmetic computations, 153
assigning
    value to output, 173
attributes
    data holders, 50
AttributeSearch, 105
authentication
    project properties, 236

B

BidiConvert, 122
BigEndianUniToUni, 122
BinaryFormat, 38
BizTalk
    splitting large files for, 177

C

CalculateValue, 153
CDATADecode, 122
CDATAEncode, 123
CGI interface
    ContentMaster Engine, 256
ChangeCase, 123
CMW files, 5
code pages
    supported, 237
    XSD schema, 53
color coding
    Learn Example, 245
    Mark Example, 245
    use in debugging, 245
combinations
    of lists, 154
CombineValues, 154
COMClass, 178

CommaDelimited, 42
command-line interface
    ContentMaster Engine, 256
components
    overview of ContentMaster, 2
concatenation, 151, 152
condition
    ensuring in source document, 160
Connect, 110
Content, 66, 84
ContentMaster Engine
    running services in, 256
ContentMaster Studio
    instructions for use, 10
    overview, 1
ContentSerializer, 191, 197
CreateGuid, 123
CreateList, 155
CustomFormat, 39

D

data holders, 50
    identifying source and target, 220
    indexing multiple-occurrence, 214
    mixed content, 58
    overview, 3
    single or multiple occurrence, 64
    validating, 56
database
    lookup transformer, 135
    querying, 168
databases
    connecting to, 145, 179
DateAdd, 156
DateDiff, 157
DateFormat, 124
dates
    format of, 124
debugging
    ContentMaster projects, 245
default transformers, 115
DelimitedSections, 87
DelimitedSectionsSerializer, 197
Delimiter, 47
DelimiterHierarchy, 43
delimiters
    custom hierarchy, 39
    relation to anchors, 67
direction property
    of anchors, 72
DocList, 21
document processors, 25
    custom C++, 31
    custom COM, 28
    custom Java, 29
    defining, 25
    installation, 25
    quick reference, 26
    reference, 27
    running multiple, 33
documents
    overview, 4
Dos96HebToAscii, 125
DownloadFile, 158
DownloadFileToDataHolder, 159
drag-and-drop
    defining anchors, 71
DumpValues, 159
dynamic offset, 106
dynamic search, 108

E

EbcdicToAscii, 125
Eclipse
    ContentMaster Studio for, 10
EDI
    delimiters for parsing, 43
elements
    data holders, 50
EmbeddedMapper, 211
EmbeddedParser, 90
EmbeddedSerializer, 200
enclosed
    group, 91
EnclosedGroup, 91
EnclosingDelimiters, 48
EncodeAsUrl, 125
Encoder, 126
encoding
    code page transformer, 126
    input and output, 237
    limitations, 238
    supported, 237
    XSD schema, 53
EnsureCondition, 160
errors
    viewing, 249
event log
    configuring properties, 248
    ContentMaster Engine, 252
    custom events, 151
    viewing, 248
events
    finding anchors, 251
example source
    in project, 5
example_source property, 19

examples
    installing and opening online, 9
Excel
    generating from XML, 35
    parsing as HTML, 27
    parsing as text, 28
    parsing as XML, 28
ExcelToHtml, 27
ExcelToTextML, 27
ExcelToTxt, 28
ExcelToXml, 28
ExcludeItems, 162
ExpandFrameSet, 28
external tools builders
    project properties, 240
ExternalCOMAction, 162
ExternalCOMPreProcessor, 28
ExternalJavaPreProcessor, 29
ExternalPreProcessor, 31
ExternalTransformer, 126
extracting content
    Content anchor, 84

F

failure
    effect on parent, 251
failure events, 250
    generated by RepeatingGroup, 101
failures
    effect of, 251
    viewing, 249
fatal error events, 250
files
    downloading, 158
    projects, 5
FileSearch, 22
FindReplaceAnchor, 92
format preprocessors, 49
FormatNumber, 128
forms
    submitting HTML, 96, 174, 175
frameset
    parsing HTML, 28
FromBase64Transformer, 128
FromFloat, 129
FromInteger, 129
FromPackDecimal, 130
FromSignedDecimal, 130

G

get
    HTTP method, 175
group
    performing actions on, 94
    repeating, 100
Group, 94
GroupMapping, 212
GroupSerializer, 201

H

Hebrew
    code-page conversion, 131
hebrewBidi, 130
HebrewDosToWindowsTransformer, 131
HebrewEBCDICOldCodeToWindows, 131
hebUniToAscii, 131
hebUtf8ToAscii, 131
HL7, 44
HTML
    removing tags, 138
    submitting form, 174, 175
    transforming entities, 131
HtmlEntitiesToASCII, 131
HtmlForm, 96
HtmlFormat, 39
HtmlProcessor, 40, 41, 49, 132
HTTP
    Get and Post data, 62
HTTP interface
    ContentMaster Engine, 256

I

icons
    events, 249
ImageClick, 111
indexing, 214
    example, 217
    multiple-occurrence data holders, 65
    quick reference, 228
information events, 249
InjectFP, 132
InjectString, 132
InlineTable, 145
IntelliScript
    defining anchors in, 72
iterations
    RepeatingGroup anchor, 100

J

JavaScript
    syntax reference, 160
JavaScriptFunction, 166
JavaTransformer, 133

K

Key, 229
keys
    properties of, 228

L

LearnByExample, 106
list types
    mapping to, 80
    XSD, 65
lists
    combining, 154
    creating, 155
    multiple-occurrence data holders, 64
    of variables, 65
LocalFile, 22
locations
    marking in source document, 98
Locator, 232
LocatorByKey, 232
LocatorByOccurrence, 233
locators
    properties of, 228
login
    project properties, 236
logs
    event, 248
LookupTransformer, 134
loop
    RepeatingGroup anchor, 100

M

Map, 167
mapper
    calling secondary, 211
    creating, 205
    running in ContentMaster Studio, 246
Mapper, 209
mappers
    deploying as service, 253
    properties of, 208
    quick reference, 208
    running in parser, 170
    using indexing, 217
mapping anchors
    properties of, 208
    reference, 210
Marker, 66, 98
marking property
    of anchors, 73
missing text
    searching by optional Group, 94
mixed content
    data holders in, 58
    in XSD schema, 53
    mapping to, 69
ModifyField, 111
MSMQ
    sending to, 177
    writing to, 176
MSMQOutput, 179
multiple occurrence
    data holders, 64
    variables, 65
multiple-occurrence data holders
    combining, 154
    creating lists in, 155
    indexing, 214
    mapping anchors to, 68

N

namespaces
    project properties, 240
New Element window
    defining anchors in, 71
NewlineSearch, 106
NormalizeClosingTags, 135
numbers
    formatting, 128

O

ODBC_Text_Connection, 145
ODBC_XML_Connection, 179
ODBCAction, 168
ODBCLookup, 135
offset
    dynamically defined, 106
OffsetSearch, 106
online samples
    installing and opening, 9
OpenURL, 180
optional failure
    effect on parent, 251
optional failure events, 250

optional property
    events, 250
output
    viewing, 247
OutputCOM, 180
OutputDataHolder, 182
OutputFile, 183

P

packed decimals, 130, 142
parser
    running in ContentMaster Studio, 246
Parser, 18
parsers
    calling secondary, 90, 200
    creating, 12
    deploying as service, 253
    running, 16
    running secondary, 170
path
    resolving relative, 120
PatternSearch, 107
PdfToTxt_2_02, 32
PdfToTxt_3_00, 32
phase
    of anchor search, 74
phase property
    of anchors, 73
phases
    nested, 74
platform independence
    parsers, 17
Positional, 44
post
    HTTP method, 174
posted data
    retrieving, 62
postprocessors
    document, 25
PostScript, 45
PowerPoint
    parsing as HTML, 32
PowerpointToHtml, 32
PowerpointToTextML, 33
predicate
    XPath, 231
preprocessors
    defining, 25
    document, 25
    format, 49
ProcessByTransformers, 33
processing instructions
    adding to output, 244
ProcessorPipeline, 33
processors
    custom C++, 31
    custom COM, 28
    custom Java, 29
    document, 25
    installation, 25
    reference, 27
    using transformers as, 116
project
    configuration overview, 7
    deploying, 8
project properties, 235
    authentication, 236
    encoding, 237
    external tools builders, 240
    general information, 236
    namespaces, 240
    output control, 241
    setting, 235
    versus preferences, 235
    XML generation, 242
projects, 253
    architecture, 4
properties
    of actions, 148
    of anchors, 72
    of mappers, 208
    of serializers, 194
    of transformers, 117
    project, 235

Q

quick reference
    anchors, 81
    document processors, 26
    indexing, 228
    mappers, 208
    serializers, 194
    transformers, 117

R

reference
    anchors, 82
    delimiters, 41
    document processors, 27
    format preprocessors, 49
    formats, 38
    indexing, 229
    mappers, 209

    mapping anchors, 210
    parsers, 18
    serialization anchors, 196
reference points
    around anchors, 73
    Marker anchor, 98
    of search scope, 77
Regex++
    regular expression implementation, 136
regular expressions
    examples, 136
    learning about, 136
RegularExpression, 136
reloading schema, 55
RemoveField, 112
RemoveMarginSpace, 40, 41, 137
RemoveRtfFormatting, 137
RemoveTags, 138
RepeatingGroup, 100
RepeatingGroupMapping, 213
RepeatingGroupSerializer, 202
Replace, 138
replacing text, 142
    in source document, 92
repository, 254
    ContentMaster, 253
requirements analysis, 6
ResetVisitedPages, 169
Resize, 139
ResultFile, 183
results
    of data transformation, 247
results file
    debugging if not displayed, 247
Results folder, 5
retrieving content
    Content anchor, 84
ReverseTransformer, 139
right-to-left text
    reversing, 122
rollback
    after failure, 251
root element
    adding XML, 244
RTF, 45
RtfFormat, 40
RtfProcessor, 40, 49, 139
RtfToASCII, 140
RtfToTextML, 33
RunMapper, 170
runnable components, 253
RunParser, 170
RunSerializer, 172

S

samples
    installing and opening online, 9
schema
    encoding, 53
schemas
    adding XSD to project, 54
    creating in ContentMaster, 55
    editing, 55
    reloading, 55
    sample XSD, 51
    validation, 60
    viewing, 56
    XSD, 50
search
    anchor direction, 72
    dynamically defined search string, 108
search criteria
    for anchors, 75
search scope
    adjusting, 77
    for anchors, 75
searcher components, 79, 104
secondary mapper
    EmbeddedMapper anchor, 211
secondary parser
    EmbeddedParser anchor, 90, 200
SegmentIndex, 112
SegmentSearch, 107
SegmentSize, 112
select-and-click
    defining anchors, 71
serialization
    using transformers in, 116
serialization anchors, 184, 191
    defining, 193
    properties of, 194
    reference, 196
    sequence of operation, 193
serialization mode, 20, 186
serializer
    controlling auto-generation, 186
    running in ContentMaster Studio, 246
Serializer, 195
    creating with wizard, 189
serializers, 184
    creating from parser, 184
    deploying as service, 253
    properties of, 194
    quick reference, 194
    running, 191, 208
    running in parser, 172
    troubleshooting auto-generated, 187

services
    ContentMaster types, 4
    deploying, 254
    deploying ContentMaster, 253
    removing, 256
    root folder, 253
    running in ContentMaster Engine, 256
    updating, 256
SetValue, 173
SGML, 45
signed decimals, 130, 142
single occurrence
    data holders, 64
source documents
    testing in ContentMaster Studio, 247
source property, 220, 221
sources_to_extract property, 20
SpaceDelimited, 45
splitting files, 177
startup components
    setting, 246
strings
    concatenating, 151, 152
StringSerializer, 191, 204
SubmitAll, 113
SubmitClick, 113
SubmitForm, 174
SubmitFormGet, 175
SubString, 140
support
    contacting, 9
system time, 63
system variables, 62

T

TabDelimited, 46
target property, 220, 226
test documents
    in project, 6
testing
    ContentMaster projects, 245
Text, 23
TextFormat, 40
TextML
    XML schema, 35
TextSearch, 108
TGP files, 5
time
    system, 63
ToBase64Transformer, 140
ToFloat, 141
ToInteger, 141
ToPackDecimal, 142
ToSignDecimal, 142
TransformByParser, 142
transformer
    running in ContentMaster Studio, 246
TransformerPipeline, 144
transformers, 114
    as document preprocessors, 116
    compared to actions, 148
    custom DLL, 126
    custom Java, 133
    default, 115
    defining, 114
    deploying as service, 253
    global stand-alone, 116
    in serialization, 116
    properties of, 117
    quick reference, 117
    sequences of, 115
    using as document processors, 33
    using in anchors, 114
    code pages, 126
troubleshooting, 245
TypeSearch, 109

U

Unix
    designing parsers for, 17
unknown events, 250
URL, 23
    relative to absolute, 120
URLs
    specifying connections, 62

V

validating data holders, 56
validation
    ensuring for XML output, 242
    XML, 60
    XML parser output, 60
    XML serializer input, 61
VarCurrentPost, 63
VarCurrentURL, 63
VarFormAction, 62
VarFormData, 63
Variable, 64
variables, 61
    data holders, 50
    lists, 65
    mapping anchors to, 63, 68
    system, 62
    using in actions, 64
Variables
    user-defined, 62
VarLinkURL, 62

VarPostData, 62
VarRequestedURL, 63
VarSystem, 63

W

warning events, 249
warnings
    viewing, 249
WestEuroUniToAscii, 144
Word
    parsing as HTML, 34
    parsing as RTF, 34
    parsing as text, 34
    parsing as XML, 34
WordPerfectToTextML, 33
WordToHtml, 34
WordToRtf, 34
WordToTextML, 34
WordToTxt, 34
WordToXml, 34
workflow
    typical ContentMaster development, 6
WriteValue, 176

X

XML
    adding empty tags, 121
    as parser input, 12
    generating sample, 57
    mapping anchors to, 67
    processing instruction, 244
    XSD schemas, 50
    XSLT transformation, 144
XML attributes
    data holders, 50
XML elements
    data holders, 50
XML generation
    project properties, 242
XML Spy, 52
XML validation
    ensuring, 242
XmlFormat, 41
XMLLookupTable, 146
XmlToExcel, 35
XPath
    modified notation, 58
XPath predicate, 231
XPaths
    validating, 56
XSD
    adding schema to project, 54
    background, 50
    creating in ContentMaster, 55
    editing, 55
    editors, 50, 52
    included schemas, 53
    IntelliScript representation, 58
    sample schema, 51
    schema encoding, 53
    unsupported features, 53
    viewing, 56
XSD data types
    searching for, 79
XSD schemas, 50
    in project, 5
XSLT
    running transformations, 177
XSLTMap, 177
XSLTTransformer, 144