Selecting a Computer Aided Translation Tool

2003-11-20 | Conférence de l’ACCTI Conference © 2003 NMi Consulting

Selecting a Computer-Aided Translation (CAT) tool for XML content — A cookbook —

Normand Montour – Principal Consultant

Selecting a Computer-Aided Translation (CAT) tool for XML content | A cookbook | © 2003 NMi Consulting

Agenda

§ Introduction

§ What is XML? (How much XML do I need to know?)

§ Approach to selecting your CAT tool for XML content

§ Conclusion

Introduction

§ Warning!

– This method was used in customer situations

– Its scope covers the selection of CAT tools to translate and manage translation of XML-based content

– It is not a general-purpose method for evaluating CAT tools

– Additional XML-related standards are continually emerging; ensure that your deployment is aligned with the latest trends.

Introduction

§ Context

– Web-oriented (Internet/Intranet/Extranet) data constitutes a much larger (and fast-growing) part of each organization’s translatable content

– This content takes the form of many new file types and document formats: HTML, XML, SGML, XSL (XSLT, XSL-FO), PHP, Java, etc.

– Traditional fax, print, cut-and-paste and PDF methods are no longer efficient enough to meet the demand for quick-turnaround, cost-effective translations

– More one-to-many language translations than ever before

– CAT tools need to provide leverage by letting you translate the material directly in the document format, without having to perform post-editing or formatting functions.

Where did XML come from ?

§ 1996: some 80+ SGML experts form the W3C SGML WG to

– Support generalized markup on the Web

– Produce ideally valid SGML documents

– Provide URL-compatible hyperlinking (as in HTML)

§ XML

– is a (meta)language, a profile of SGML

– brings generalized markup to the Web

§ XML documents

– are self-describing (document-specific DTD)

– can be validated against a reference structure (the DTD)

– are platform and software neutral

OK, so What is XML ?

“XML is the Extensible Markup Language. It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification.

It is called extensible because it is not a fixed format like HTML (a single, predefined markup language). Instead, XML is actually a `metalanguage' —a language for describing other languages—which lets you design your own customized markup languages for limitless different types of documents. XML can do this because it's written in SGML, the international standard metalanguage for text markup systems (ISO 8879).”

From WWW.XML.org

XML is…

Extensible Markup Language

W3C: http://www.w3.org/TR/REC-xml

XML is about

– Descriptive Markup, not Procedural Markup

– Structure Definition (DTD) or XML Schema

– Documents Conforming to Structure (Instance)

– Software and Platform Independent Format

XML States Syntactic Rules

§ XML is not

– A uniform document structure

– A standard list of markup tags

§ XML is

– A language for defining hierarchic document structure: Document Type Definition (the DTD), and

– For descriptive markup of text: the XML instance

Markup

§ Markup is adding non-textual information into a text to make it more meaningful

§ Traditional examples:

– spaces between words

– emphases (italics, bold, underlined)

– layout (new lines, new pages, bullets)

Expressing Structure in XML

§ Elements

– Structural and logical components: Titles, Chapters, Airline, Airport…

– Chosen with respect to structural and logical nature of information components

– Assembled in model groups, ex. (A, B, (C | D))

– Delimited by a Start Tag and an End Tag

Expressing Structure in XML

§ Elements - Hierarchy

– Structural elements contain sub-elements or model groups

– Sub-elements may contain their own sub-elements or model groups

– Nesting of elements and model groups defines a hierarchy

– The highest-level element is referred to as the Document Type

Expressing Structure in XML

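Using an element requires three components: Start Tag, Semantic Content of Element, End Tag

<Title>An Introduction to SGML</Title>

– Start Tag: <Title> (Title is the Generic Identifier)

– Semantic Content of Element: An Introduction to SGML

– End Tag: </Title> (the "/" is the End Tag Differentiation)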
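
The element and hierarchy notions above can be explored with a few lines of Python. This is only a sketch using the standard library's ElementTree parser; the Book and Chapter element names are illustrative assumptions, not part of any DTD from this presentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: the Book and Chapter elements are illustrative only.
SAMPLE = """<Book>
  <Title>An Introduction to SGML</Title>
  <Chapter><Title>Markup</Title></Chapter>
</Book>"""

def outline(element, depth=0):
    """Return one indented line per element, showing the nesting hierarchy."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SAMPLE)      # root is the highest-level element
print("\n".join(outline(root)))   # one line per element, indented by depth
print(root.find("Title").text)    # semantic content of the first Title child
```

Each tag name printed by outline() is a Generic Identifier; the indentation mirrors how sub-elements nest inside the highest-level element.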

Two types of XML

§ Well-formed XML

– Minimally meets some structural criteria

– Tags have to be balanced

– Naming of elements has to be correct

– Typically, instances extracted from databases or converted from word processors are well-formed XML

§ Valid XML

– Meets all structural requirements

– Must conform to a DTD or a Schema

– Can be parsed for validity by a standard parser

– Typically, instances produced by standardized publishing software are found in this category
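
A quick way to see the difference is to run a well-formedness check. The sketch below uses Python's standard ElementTree parser, which only checks well-formedness; it does not validate against a DTD or Schema, so a validating parser would be needed for the second category. The function name is_well_formed is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Balanced tags and a correct element name
print(is_well_formed("<Title>An Introduction to SGML</Title>"))
# End tag does not match the start tag
print(is_well_formed("<Title>An Introduction to SGML</Heading>"))
# Element names may not start with a digit
print(is_well_formed("<1Title>An Introduction to SGML</1Title>"))
```

A document that passes this check may still be invalid: nothing here verifies that Title is allowed where it appears.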

Expressing Meta-Data in XML

§ Attributes

– Represent “meta-information”: information about information

– Qualify elements

– Comprise a name and a value

– Used in the start tag of elements

<Book status="revision" version='4.1.2'>

– Element: Book

– Attribute status: value delimiter LIT ("), attribute value revision

– Attribute version: value delimiter LITA ('), attribute value 4.1.2
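
To make the name/value pairing concrete, the Book start tag from this slide can be parsed with Python's standard ElementTree module. This is a minimal sketch; the end tag and the text content are added here only so that the fragment is well-formed.

```python
import xml.etree.ElementTree as ET

# The start tag is taken from the slide; the content and end tag are added
# only so that the fragment can be parsed.
book = ET.fromstring("""<Book status="revision" version='4.1.2'>text</Book>""")

# Both delimiters (double and single quotes) yield ordinary string values.
for name, value in book.attrib.items():
    print(name, "=", value)

print(book.get("status"))
```

Either delimiter is accepted by the parser, and the delimiters themselves are not part of the attribute value.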

Expressing Objects in XML

§ Entities

“A collection of characters that can be referenced as a unit”

Character strings assembled into an information object, or non character data (graphics, multimedia) assembled in a storage object, with a name that can be used for referencing

Used for

– storing XML document instances (fragments)

– recalling long strings through short names (macro)

– inserting external objects into an XML document (e.g. graphics, multimedia)

– inserting special characters not available on a keyboard (e.g. &eacute;, &Agrave;, etc.)
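
One practical consequence for translators: XML itself predefines only five entities (&lt; &gt; &amp; &apos; &quot;). A name such as &eacute; has to be declared, typically in the DTD, before a parser will accept it. The sketch below shows this with Python's standard ElementTree parser; the p element and the inline declaration are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# &amp; is predefined by XML; &eacute; is not, so it is rejected here.
print(parses("<p>R&amp;D</p>"))
print(parses("<p>caf&eacute;</p>"))

# Declaring the entity (here in an internal DTD subset) makes it usable.
declared = '<!DOCTYPE p [<!ENTITY eacute "&#233;">]><p>caf&eacute;</p>'
print(ET.fromstring(declared).text)
```

This is why a CAT tool must carry entity references through import and export unchanged, rather than assuming every parser downstream knows them.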

Three dimensions of information

§ Structure:

– Hierarchic organization

– Expressed in terms of semantic content

– Independent of platform and software

– Traditional publishing software expresses format rather than structure, in a proprietary manner!

§ Content:

– Source information (text, data, graphics, multimedia)

§ Format:

– Appearance of published content

– Specific to platform and software

Logical Components

§ Elements: titles, chapters, sections, fielded data, etc.

§ Attributes: ID, version, language, security, etc.

§ Entities: standard text, special characters, external objects, graphics, etc.

Approach in selecting your CAT tool

§ Gather requirements and selection criteria

§ Inventory of major vendors in this arena

§ Determine how vendor products respond to selection criteria

§ Establish the software pricing and related costs

§ Recommend a solution

§ Lay out an implementation strategy

§ Define a rollout plan

§ Create the training material, deliver training and provide on-going support

Approach - Gather requirements and selection criteria

§ XML Criteria:

– Can validate the document (i.e. check that it strictly adheres to the DTD) before the translation (so that you won’t be told later that you made the document invalid) and after the translation (to ensure that you didn’t make it invalid)

– Can properly respect tag content and placement at import and at export (tag locking, hiding, showing)

– Can properly deal with character entities, both for import and export

– Allows the translator to move tags within a segment

– Allows the translator to unlock/modify the content of tags (attributes, URLs, etc.) when necessary

– When aligning existing translations, the software should make use of the tags and attributes to improve the alignment quality and the subsequent segmentation

What does that mean in practice ?
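
For the tag-related criteria, one simple test during an evaluation is to compare the tags in a source segment with those in its translation. The sketch below is an assumption-laden illustration, not how any particular CAT tool works: the regular expression is a deliberate simplification, and the RefInt segment and its id value are made up.

```python
import re
from collections import Counter

# Simplified pattern for start, end and empty-element tags (not a full XML parser).
TAG = re.compile(r"</?[A-Za-z_][^<>]*>")

def tag_inventory(segment):
    """Count each distinct tag string that appears in a segment."""
    return Counter(TAG.findall(segment))

def tags_preserved(source, target):
    """True if the target carries exactly the same tags as the source.

    Order is ignored, since the translator may move tags within a segment.
    """
    return tag_inventory(source) == tag_inventory(target)

source = 'See <RefInt id="s5">Section 5</RefInt> for details.'
good = 'Voir la <RefInt id="s5">section 5</RefInt> pour plus de détails.'
bad = 'Voir la <RefInt id="s5">section 5 pour plus de détails.'

print(tags_preserved(source, good))  # same tags in both segments
print(tags_preserved(source, bad))   # the end tag was lost
```

A check this strict would also flag a legitimately edited attribute value (a localized URL, for example), so a real evaluation would need to allow for the unlock/modify criterion.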

Other criteria to consider

§ Alignment tool should utilize the markup to improve the quality of the alignment

§ XML-based exchange standards (LISA) for Translation Memory (TMX) and Terminology (TBX) mean you can get XML to work for YOU!
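
Because TMX is itself XML, a translation memory can be produced or inspected with ordinary XML tooling. The sketch below builds a tiny TMX-style document with Python's standard library. The header attribute names follow the TMX 1.4 format as best recalled here and should be checked against the specification; the tool name, the attribute values and the sample segment are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def build_tmx(pairs, srclang="en", tgtlang="fr"):
    """Build a minimal TMX-style tree from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",   # hypothetical tool name
        "creationtoolversion": "0.1",       # hypothetical version
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "xml",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")       # one translation unit per pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx

tmx = build_tmx([("Hello", "Bonjour")])
print(ET.tostring(tmx, encoding="unicode"))
```

When evaluating a tool, exporting a memory to TMX and importing it into a competing product is a direct way to test the exchange claim.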

CAT tools for XML content should…

§ Be designed to respect and protect XML markup during the translation process

§ Allow the translator to translate “between the tags” so they don’t have to worry about the proper use of tags, or their place in the document.

§ Allow the translator to insert missing tags/entities when they are present in the source segment but not in the Translation Memory

§ Allow the translator to show or hide tags/entities as required

§ For example…

Trados Tag Editor: Full-Tag View

– Element Section5 with id and rev attributes

– Start Tag for element Heading

– End Tag for element Heading

– RefExt and RefInt elements are “inline”

– Entities are used for special characters

Trados Tag Editor: No-Tag View

• Contains less information about the structure

• Provides a clearer view of the context

• Translator can easily switch between the two views
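
The idea behind the two views, and behind tag locking in general, can be sketched in a few lines of Python. This is not how Trados works internally; it is a simplified illustration in which the regular expression, the {n} placeholder convention and the id value are all assumptions. The Heading and RefInt element names come from the slide above.

```python
import re

# Simplified tag pattern; a real tool would use an XML parser.
TAG = re.compile(r"<[^<>]+>")

def no_tag_view(segment):
    """Strip all tags so that only the translatable text remains visible."""
    return TAG.sub("", segment)

def lock_tags(segment):
    """Swap each tag for a numbered placeholder; return the text and the saved tags."""
    tags = TAG.findall(segment)
    numbers = iter(range(1, len(tags) + 1))
    locked = TAG.sub(lambda match: "{%d}" % next(numbers), segment)
    return locked, tags

def unlock_tags(locked, tags):
    """Put the original tags back in place of their placeholders."""
    return re.sub(r"\{(\d+)\}", lambda match: tags[int(match.group(1)) - 1], locked)

segment = '<Heading>See <RefInt id="s5">Section 5</RefInt></Heading>'
locked, tags = lock_tags(segment)
translated = "{1}Voir la {2}section 5{3}{4}"   # the translator edits text only

print(no_tag_view(segment))
print(locked)
print(unlock_tags(translated, tags))
```

Because the translator only ever sees placeholders, tag content cannot be damaged by accident, while the placeholders can still be moved within the segment.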

Approach - Inventory of major vendors that claim XML support

§ Atril – DéjàVu

§ Cypresoft TransSuite

§ MultiTrans

§ SDLX

§ StarTransit

§ SynchoTerm

§ Trados

§ Wordfast

Approach - Determine how vendor products respond to selection criteria

Vendor               | Criterion 1 | Criterion 2 | Criterion 3 | Criterion 4…
Atril – DéjàVu       |             |             |             |
Cypresoft TransSuite |             |             |             |
MultiTrans           |             |             |             |
SDLX                 |             |             |             |
StarTransit          |             |             |             |
SynchoTerm           |             |             |             |
Trados               |             |             |             |
Wordfast             |             |             |             |
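
Once such a grid is filled in, a weighted total per product makes the comparison easier to defend. The sketch below shows one possible way to do it; the weights, the 0 to 2 scoring scale, the criterion labels and the "Tool A"/"Tool B" entries are placeholders invented for the illustration and say nothing about any real product.

```python
def rank(weights, scores):
    """Return (tool, weighted total) pairs, best first."""
    totals = {
        tool: sum(weights[criterion] * score
                  for criterion, score in per_criterion.items())
        for tool, per_criterion in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Placeholder weights: higher means more important to your organization.
weights = {"validation": 3, "tag protection": 3, "entities": 2, "alignment": 1}

# Placeholder scores: 0 = not supported, 1 = partial, 2 = full.
scores = {
    "Tool A": {"validation": 2, "tag protection": 2, "entities": 1, "alignment": 0},
    "Tool B": {"validation": 1, "tag protection": 2, "entities": 2, "alignment": 2},
}

for tool, total in rank(weights, scores):
    print(tool, total)
```

Agreeing on the weights before scoring the products helps keep the exercise from being steered toward a favourite.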

Approach - Establish the software pricing and related costs

§ Things to remember:

– Software cost is not everything…

– You may need to upgrade your hardware and network

– You may need to consider the support costs (current year and ongoing)

– Consider the one-time costs (training, installation, configuration, etc.)

– Consider the cost of aligning your legacy documents

Approach – cont’d

§ Recommend a solution

– Make a short list of vendors

– Invite those vendors to participate in a pilot project

§ Lay out an implementation strategy

– Leverage “champions” and other enthusiasts

– Avoid attitude pitfalls

– Acquire top-management endorsement early in the project

§ Define a rollout plan

– Use a pilot project to show early benefits

– Don’t run until you can walk

– Choose a visible portion of your corpus for your pilot

§ Create the training material, deliver training, provide on-going support and follow up with your users

§ Measure your costs and benefits

Conclusion

§ XML content needs to be treated with special attention and with special tools

§ Make sure you clearly establish your criteria for the selection of your CAT tool

§ Use a rigorous selection process

§ Test, test and re-test

§ Measure your costs and benefits: they will get you the proper management attention