1– 1
DITA, XLIFF and Translation: Truths, Myths and Misconceptions
www.oasis-open.org
Andrzej Zydroń MBCS CITP
CTO
XML-INTL
Leaders in Translation Technology
September [email protected]
1– 2
DITA is cheaper to translate: Truths
Separation of content and format
Component based architecture
You only translate changed topics
1– 3
Translation is the main cost of localization: Myths
1– 4
DITA Granularity: a double-edged sword
DITA without a CMS – you must be NUTS!
You DO NOT NEED a native XML CMS
Links, links, links, links…
Translate topics as soon as they are available
Increased project management costs
Necessitates web services based exchange
Need to establish long-term relationship with Localization Service Providers
1– 5
DITA Translation Pitfalls
DITA comes ready packed with some very dangerous options
Translatable acronyms
Beware the CONREF for it may TRIPLE your translation costs
Specialize if you dare
DITA Translation TC Best Practices
1– 6
DITA + XLIFF: Only part of the picture
[Diagram: the standards surrounding DITA and XLIFF]
XML 1.0
Unicode 5.0
XML Vocabulary, e.g. DITA
xml:tm (Author Memory, Translation Memory)
SRX
GMX
W3C ITS
Unicode TR29
XLIFF
TMX
1– 7
OAXAL: OASIS Reference Architecture TC
xml:tm
Unicode TR 29
SRX
W3C ITS
GMX-V
DITA/XML
TMX
XLIFF
1– 8
OAXAL TC
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=oaxal
1– 9
How does it work? (DocZone.com example)
Write/edit XML content
Create graphics; store in DocZone
Link with XML content
Check content into DocZone server
Pre-translate content and alert translators
Vendor localizes XML content with DocZone translation tool
Collaborative review and approval
Publish to all output formats for all markets (e.g. HTML)
1– 10
OAXAL: Why is DITA + XLIFF not enough?
Process Automation
50% of translation costs are process management
Matching
Strange commercial model for translation companies
Automation, automation, automation
1– 11
DITA + OAXAL putting it together:
DITA/XML+
xml:tm
Unicode TR 29
SRX
W3C ITS
DITA/XML
1– 12
xml:tm namespace
Example of the use of tm namespace in an XML document:
<document xmlns:tm="urn:xml-intl-tm" te="9"><tm:tm><section>
<para><tm:te id="e1">
<tm:tu id="u1.1">Namespace is very flexible.</tm:tu>
<tm:tu id="u1.2">It is very easy to use.</tm:tu>
</tm:te></para>
</section></tm:tm></document>
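As a rough illustration of how a tool might read this markup, the following Python sketch collects the text units from a well-formed version of the fragment. The function name `text_units` and the sample string are assumptions made for the example; only the namespace URI and element names come from the slide.

```python
import xml.etree.ElementTree as ET

TM_NS = "urn:xml-intl-tm"  # namespace URI taken from the slide's example

# A well-formed version of the slide's fragment (closing tags added).
SAMPLE = """<document xmlns:tm="urn:xml-intl-tm">
<tm:tm><section><para><tm:te id="e1">
<tm:tu id="u1.1">Namespace is very flexible.</tm:tu>
<tm:tu id="u1.2">It is very easy to use.</tm:tu>
</tm:te></para></section></tm:tm></document>"""


def text_units(xml_text):
    """Return {tu id: sentence text} for every tm:tu element, in document order."""
    root = ET.fromstring(xml_text)
    units = {}
    for tu in root.iter(f"{{{TM_NS}}}tu"):
        units[tu.get("id")] = "".join(tu.itertext()).strip()
    return units


if __name__ == "__main__":
    for tu_id, sentence in text_units(SAMPLE).items():
        print(tu_id, "->", sentence)
```

Because the text units live in their own namespace, a parser that ignores `tm:` elements still sees the original document structure.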
1– 13
xml:tm namespace
[Diagram: two parallel tree views of the same document.
Source document view: doc, title, section, para, text.
Source document tm namespace view: tm, te, tu, with each tu wrapping one sentence.]
1– 14
xml:tm namespace
Author memory
Maintain memory of source text
Authoring statistics
Authoring tool input
Translation memory
Automatic alignment
Maintain exact link of source and target text
Reduce translation costs
1– 15
xml:tm differencing
[Diagram: DOM Differencing]
Original Source Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="5", tu id="6"
Updated Source Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="7", tu id="6", tu id="8"
Change labels shown in the diagram: deleted, modified, new
1– 16
xml:tm author memory
Namespace aware DOM differencing
Identify changes from the previous version
Unique text unit identifiers are maintained
Modification history
Text units can be loaded into a database
Authoring environment integration
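The differencing idea can be sketched in Python as follows. This is a simplification under stated assumptions: it compares flat lists of sentences with the standard library's `difflib` rather than performing true namespace aware DOM differencing, and the function name, the status labels, and the `origid` field are illustrative choices, not the xml:tm implementation.

```python
from difflib import SequenceMatcher


def diff_text_units(old_units, new_texts, next_id):
    """Compare old (id, text) units with the new sentence list.

    Returns (updated, deleted): `updated` is a list of dicts with keys
    id, text, status, and origid; `deleted` lists ids no longer present.
    Unchanged sentences keep their id; changed or new ones get a fresh id.
    """
    old_texts = [text for _, text in old_units]
    matcher = SequenceMatcher(a=old_texts, b=new_texts, autojunk=False)
    updated, deleted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for uid, text in old_units[i1:i2]:
                updated.append({"id": uid, "text": text,
                                "status": "unchanged", "origid": None})
        elif tag == "delete":
            deleted.extend(uid for uid, _ in old_units[i1:i2])
        else:  # "replace" or "insert"
            old_ids = [uid for uid, _ in old_units[i1:i2]]
            for k, text in enumerate(new_texts[j1:j2]):
                # Pair a replacement with the old unit in the same position, if any.
                origid = old_ids[k] if k < len(old_ids) else None
                status = "modified" if origid is not None else "new"
                updated.append({"id": next_id, "text": text,
                                "status": status, "origid": origid})
                next_id += 1
            # Old units beyond the replacement count have no successor.
            deleted.extend(old_ids[j2 - j1:])
    return updated, deleted


if __name__ == "__main__":
    old = [(1, "A."), (2, "B."), (3, "C."), (4, "D."), (5, "E."), (6, "F.")]
    new = ["A.", "C.", "D.", "E, revised.", "F.", "G."]
    for unit in diff_text_units(old, new, next_id=7)[0]:
        print(unit)
```

Keeping the old id on unchanged units is what later makes matching by identifier possible; recording `origid` on a modified unit preserves a link to the sentence it replaced.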
1– 19
XLIFF + xml:tm:
DITA/XML+
xml:tm
GMX-V
XLIFF
1– 20
DITA/OAXAL to XLIFF
[Diagram: round trip through XLIFF]
Original Source Document: tu id="1" to tu id="6"
XLIFF File: trans-unit id="1" to trans-unit id="6"
Translated Target Document: tu id="1" to tu id="6"
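The mapping from text units to XLIFF trans-units can be sketched as below. This is a minimal illustration assuming XLIFF 1.2 and a flat list of `(id, text)` pairs; the function name `build_xliff` and the file name `topic.dita` are hypothetical, and a real extraction would also have to handle inline markup.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def build_xliff(units, original, source_lang, target_lang):
    """Build an XLIFF 1.2 document with one trans-unit per (id, text) pair."""
    ET.register_namespace("", XLIFF_NS)  # serialize XLIFF as the default namespace
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "xml",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for tu_id, text in units:
        # The trans-unit id reuses the tu id so the translation can be merged back.
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": str(tu_id)})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(xliff, encoding="unicode")


if __name__ == "__main__":
    sample = [(1, "Namespace is very flexible."), (2, "It is very easy to use.")]
    print(build_xliff(sample, "topic.dita", "en", "pl"))
```

Reusing the text unit identifier as the trans-unit identifier is the design choice that lets the translated text be merged back into the right place.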
1– 21
xml:tm exact matching
[Diagram: Exact Matching]
Updated Source Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="7", tu id="6", tu id="8" (change labels: deleted, modified, new)
Matched Target Document: tu id="1", tu id="3", tu id="4", tu id="7", tu id="6", tu id="8"
Four units are marked "Exact match" and two are marked "requires translation".
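Exact matching of this kind can be sketched as a lookup by text unit id. The sketch assumes that unchanged sentences keep their identifiers between versions, as the author memory slide states; the function name and the status strings are illustrative.

```python
def exact_match(updated_source, previous_target):
    """Pair each updated source unit with its earlier translation, if any.

    updated_source: list of (tu_id, source_text) in document order.
    previous_target: dict mapping tu_id to the translation from the last version.
    """
    matched = []
    for tu_id, source_text in updated_source:
        if tu_id in previous_target:
            # Same id means the sentence is unchanged, so reuse the translation.
            matched.append((tu_id, previous_target[tu_id], "exact match"))
        else:
            # New or modified sentence: carry the source text forward for translation.
            matched.append((tu_id, source_text, "requires translation"))
    return matched


if __name__ == "__main__":
    source = [(1, "S1"), (3, "S3"), (4, "S4"), (7, "S5 revised"), (6, "S6"), (8, "S8")]
    target = {i: f"Zdanie {i}." for i in range(1, 7)}
    for row in exact_match(source, target):
        print(row)
```

No text comparison is needed here: the identifier alone establishes that the sentence and its context are the same as in the previous version.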
1– 22
xml:tm matching
[Diagram: matching beyond exact matches]
Updated Source Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="7", tu id="6", tu id="8", tu id="9"
Matched Target Document: tu id="1", tu id="2", tu id="3", tu id="4", tu id="7", tu id="6", tu id="8", tu id="9"
tu id="2": non-translatable, requires no translation
Labels shown in the diagram: modified; new:same; exact match (four units); fuzzy match origid="5"; doc leveraged match; DB leveraged match (from the DB); requires proofing (two units); requires translation
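For units without an exact match, the remaining match types might be assigned along these lines. This is only a sketch: the 0.75 threshold, the "no letters means non-translatable" rule, the use of `difflib` for similarity, and the order of the checks are all assumptions made for the example, not the behaviour of any particular tool.

```python
from difflib import SequenceMatcher


def classify(text, origid, old_source, old_target, memory, threshold=0.75):
    """Return (status, suggestion) for a unit that has no exact id match.

    old_source / old_target: dicts of tu_id -> previous source / translation.
    memory: dict of source sentence -> translation from a translation memory DB.
    """
    if not any(ch.isalpha() for ch in text):
        return "non translatable", text  # e.g. numbers or punctuation only
    if text in memory:
        return "DB leveraged match", memory[text]  # identical sentence in the DB
    if origid in old_source:
        score = SequenceMatcher(None, old_source[origid], text).ratio()
        if score >= threshold:
            # Similar to the sentence it replaced: offer its translation for proofing.
            return "fuzzy match", old_target.get(origid)
    return "requires translation", None


if __name__ == "__main__":
    old_src = {5: "Press the red button."}
    old_tgt = {5: "Naciśnij czerwony przycisk."}
    print(classify("Press the green button.", 5, old_src, old_tgt, {}))
```

Leveraged and fuzzy matches lack the certainty of an exact id match, which is why the slide marks them as requiring proofing.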
1– 23
xml:tm translated document
[Diagram: the same two tree views after translation, with Polish content.
Translated document view: doc, title, section, para, tekst ("text").
Translated document tm namespace view: tm, te, tu, with each tu wrapping one zdanie ("sentence").]
1– 24
Translation without OAXAL:
source text → extract → extracted text → tm process → prepared text → translate → translated text → merge → target text → QA
1– 25
OAXAL in action
Automated Workflow: xml:tm source text → extract → extracted text → tm process (exact matching, leveraged matching) → XLIFF file → translate → merge → xml:tm target text
Translation and QA are carried out over the Internet in a web browser.
1– 27
Normal DITA document
1– 28
DITA Document with xml:tm namespace
1– 29
xml:tm version encoded
DITA Document with xml:tm namespace embedded as a Base64 encoded Processing Instruction
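The encoding step can be sketched as a simple round trip. The slide does not give the processing instruction's target name or payload layout, so `tm-encoded` and the single Base64 payload below are hypothetical placeholders.

```python
import base64
import re

PI_TARGET = "tm-encoded"  # hypothetical target name; the slide does not give one


def to_processing_instruction(tm_markup):
    """Wrap xml:tm markup in a Base64-encoded processing instruction."""
    payload = base64.b64encode(tm_markup.encode("utf-8")).decode("ascii")
    return f"<?{PI_TARGET} {payload}?>"


def from_processing_instruction(pi_text):
    """Recover the original xml:tm markup from the processing instruction."""
    pattern = rf"<\?{re.escape(PI_TARGET)} ([A-Za-z0-9+/=]+)\?>"
    match = re.fullmatch(pattern, pi_text)
    if match is None:
        raise ValueError("not an encoded xml:tm processing instruction")
    return base64.b64decode(match.group(1)).decode("utf-8")


if __name__ == "__main__":
    markup = '<tm:tu id="u1.1">Namespace is very flexible.</tm:tu>'
    pi = to_processing_instruction(markup)
    print(pi)
    print(from_processing_instruction(pi) == markup)
```

Base64 output never contains `?>`, so the payload cannot end the processing instruction early, and tools that do not understand the instruction can simply ignore it.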
1– 30
XLIFF File version after matching
1– 31
Contact Details
Postal address:
PO Box 2167
Gerrards Cross
Bucks SL9 8XF
United Kingdom
Phone: +44 1753 480 467
Fax: +44 1753 480 465
Andrzej Zydroń – [email protected]