Fredrik Estreen - Lionbridge
Yves Savourel - ENLASO
Inline Markup in XLIFF 2.0
While we believe the information presented here is pretty stable, it only reflects the general consensus of the sub-committee working on the inline markup.
Things may change during the formal approval by the sub-committee and later when it goes through the process of review and approval from the main XLIFF TC.
Disclaimer
• Principles and Background
• Inline Markup
  o Characters that are invalid in XML
  o Native Codes
  o Annotations
• Extensions
• Processing requirements
• XLIFF Toolkit
Agenda
Some of the guidelines we are trying to follow during the work:
• Try to have only one way to do one thing
• Provide processing requirements
• Try to re-use existing standards when possible
• Try to keep things simple
Some Principles
The structural part of XLIFF changes in 2.0 and the inline markup should be easy to handle in the new model.
• Static structure
  o <file> -> <group>* -> <unit>
  o Contents of the concatenated <source> elements remain static during processing
• Dynamic structure inside <unit>
  o <segment>, <ignorable> -> <source>, <target>
  o A processor may merge or split the contents of segments or ignorables.
Containing Structure
The inline markup is what's inside the <source> and <target> elements
• Characters that are invalid in XML
• Original inline codes
• Annotations
What's the Inline Markup?
• Inline codes belong to the <unit> and not to the <segment>(s)
• ID uniqueness within the <unit>
• Allows simple re-segmentation of the content of <unit>
• No need to clone codes that span multiple segments
Inline codes and segmentation
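Because codes belong to the <unit>, a spanning code may start in one segment and end in another without being cloned. The following is a minimal sketch of that idea, not part of any toolkit: the class UnitCodeCheck and its method are hypothetical, segments are passed as plain strings, and the <sc/> and <ec/> markers are matched with a simplified regular expression rather than a real XML parser.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: resolves <ec rid="..."/> against <sc id="..."/> at unit level.
class UnitCodeCheck {
    // Simplified patterns; they assume id/rid is the first attribute of the marker.
    private static final Pattern START = Pattern.compile("<sc\\s+id=(['\"])(.*?)\\1[^>]*/>");
    private static final Pattern END = Pattern.compile("<ec\\s+rid=(['\"])(.*?)\\1[^>]*/>");

    // Returns the rid of every end marker that has no start marker anywhere in the unit.
    static List<String> unmatchedEnds(List<String> segments) {
        Set<String> starts = new HashSet<>();
        for (String segment : segments) {
            Matcher m = START.matcher(segment);
            while (m.find()) {
                starts.add(m.group(2));
            }
        }
        List<String> unmatched = new ArrayList<>();
        for (String segment : segments) {
            Matcher m = END.matcher(segment);
            while (m.find()) {
                if (!starts.contains(m.group(2))) {
                    unmatched.add(m.group(2));
                }
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        List<String> segments = new ArrayList<>();
        segments.add("Click <sc id='1'/>here. ");
        segments.add("Then wait.<ec rid='1'/>");
        // The span crosses the segment boundary but still resolves inside the unit.
        System.out.println(unmatchedEnds(segments));
    }
}
```

Since the lookup runs over the whole unit, merging or splitting the segments does not change the result.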
For example, control characters are not allowed in XML content, so they cannot be stored as-is in XLIFF.
<cp hex="0007"/> represents U+0007 (the "bell" character)
- Same as Unicode LDML format
- Only characters invalid in XML must use this notation.
Characters that are Invalid in XML
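As an illustration of the <cp/> notation, here is a small sketch of how a writer might encode text. It assumes the XML 1.0 definition of valid characters; the class CpEncoder and its methods are hypothetical names, and only "<" and "&" are escaped in addition to the invalid characters.

```java
// Hypothetical helper: writes characters that are invalid in XML as <cp hex="..."/>.
class CpEncoder {
    // True if the code point matches the Char production of XML 1.0.
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Encodes plain text for use as XLIFF inline content.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isValidXmlChar(cp)) {
                // Only invalid characters use the <cp/> notation.
                out.append(String.format("<cp hex=\"%04X\"/>", cp));
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0007 (bell) is not allowed in XML content.
        System.out.println(encode("Ding" + (char) 7 + "!")); // prints Ding<cp hex="0007"/>!
    }
}
```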
• Support any type of native markup
• Standalone: <ph/>
• Spanning: <pc> and <sc/> + <ec/>
Inline Codes
All possible cases:
Standalone code <ph id='1'/>
Well-formed spanning code <pc id='1'>text</pc>
Start marker of spanning code <sc id='1'/>
End marker of spanning code <ec rid='1'/>
Orphan start marker of spanning code <sc id='1' isolated='yes'/>
Orphan end marker of spanning code <ec id='1' isolated='yes'/>
Inline Codes - Use Cases
• No storage:
<source>A<ph id="1"/>B</source>
• Store, but only outside the segment:
<source>A<ph id="1" nid="d1"/>B</source>
<originalData>
  <data id="d1">&lt;BR&gt;</data>
</originalData>
Inline Codes - Storage of Original
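To show how the nid reference might be resolved, here is a sketch that rebuilds the native text of a <source> from the <originalData> of its unit. It uses the standard Java DOM API; the class OriginalDataResolver is a hypothetical name, the attribute names follow the slide above, and only codes that are direct children of <source> are handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: replaces inline codes with the native data they point to.
class OriginalDataResolver {
    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the text of the first <source>, with each code replaced by its stored data.
    static String toNative(Element unit) {
        Map<String, String> data = new HashMap<>();
        NodeList entries = unit.getElementsByTagName("data");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            data.put(entry.getAttribute("id"), entry.getTextContent());
        }
        Node source = unit.getElementsByTagName("source").item(0);
        StringBuilder out = new StringBuilder();
        for (Node n = source.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                out.append(n.getNodeValue());
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                // A code without nid (the "no storage" case) contributes nothing.
                String nid = ((Element) n).getAttribute("nid");
                out.append(data.getOrDefault(nid, ""));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Element unit = parse("<unit><originalData><data id=\"d1\">&lt;BR&gt;</data></originalData>"
            + "<segment><source>A<ph id=\"1\" nid=\"d1\"/>B</source></segment></unit>");
        System.out.println(toNative(unit));
    }
}
```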
<mrk> for well-formed constructs
<sm/> + <em/> otherwise
Attributes:
• id (required)
• type (default=generic)
• translate (yes or no, default=yes)
• ref (optional type-specific URI)
• value (optional type-specific text/data)
Annotations
• Translate annotations
• Term annotations
• Comment annotations
• Custom annotations
The IDs link the same annotation in source and target if needed.
Annotations Types
• To protect (or not) a span of content:
<mrk id="1" translate="no">content</mrk>
Note that translate can also be used with other types of annotations.
Translate Annotation
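A tool that only wants the translatable text could walk the content and skip protected spans. The sketch below does this with the standard Java DOM API. The class TranslatableText is a hypothetical name, and it assumes that a nested <mrk> with its own translate attribute overrides the enclosing one, which the slides do not state.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: collects the text that is not protected by translate="no".
class TranslatableText {
    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the translatable text of a <source> or <target> element.
    static String extract(Element content) {
        StringBuilder out = new StringBuilder();
        collect(content, true, out);
        return out.toString();
    }

    private static void collect(Node node, boolean translatable, StringBuilder out) {
        for (Node n = node.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                if (translatable) {
                    out.append(n.getNodeValue());
                }
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                Element e = (Element) n;
                boolean inner = translatable;
                // Assumption: an explicit translate attribute overrides the inherited state.
                if (e.getTagName().equals("mrk") && e.hasAttribute("translate")) {
                    inner = e.getAttribute("translate").equals("yes");
                }
                collect(e, inner, out);
            }
        }
    }

    public static void main(String[] args) {
        Element source = parse("<source>Press <mrk id=\"1\" translate=\"no\">Ctrl+C</mrk> to copy</source>");
        System.out.println(extract(source));
    }
}
```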
• To denote a "term":
<mrk id="1" type="term" value="simple definition" ref="reference to more info">content</mrk>
The id links source and target if needed
Term Annotation
• Simple:
<source><mrk id="1" type="comment" value="The text of the comment">content</mrk></source>
• With associated note:
<source><mrk id="1" type="comment" ref="#n1">content</mrk></source>
<notes>
  <note id="n1">Text of the note</note>
</notes>
Comment Annotation
• User-defined annotation:
- The type attribute = <prefix>:<userType>
- The meanings of the value and ref attributes are defined by the user.
<mrk id="1" type="myPrefix:isbn" value="978-0-14-44919-8">The Epic of Gilgamesh</mrk>
Custom Annotation
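A processor can tell a custom annotation from a predefined one by looking for the prefix in the type attribute. This is a short sketch of that check; the class AnnotationType and its method are hypothetical names, and the rule that both parts must be non-empty is an assumption.

```java
// Hypothetical helper: splits a custom annotation type into prefix and user type.
class AnnotationType {
    // Returns {prefix, userType}, or null if the type has no <prefix>:<userType> form.
    static String[] splitCustom(String type) {
        int colon = type.indexOf(':');
        // Assumption: both the prefix and the user type must be non-empty.
        if (colon <= 0 || colon == type.length() - 1) {
            return null;
        }
        return new String[] { type.substring(0, colon), type.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitCustom("myPrefix:isbn");
        System.out.println(parts[0] + " / " + parts[1]); // prints myPrefix / isbn
        System.out.println(splitCustom("term") == null); // prints true
    }
}
```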
• A few attributes can take user-defined values: e.g. mrk@type, ph@type, pc@type
• No additional attributes are allowed in any of the inline elements
• No additional elements are allowed inside <source>, <target> or <data>
Custom annotations are essentially the only way to extend markup inside the inline content.
Extensions
• Allowed markup transforms and related attribute mapping between <pc> and an <sc>/<ec> pair
• Define requirements for creation and editing of target text.
• Rules on cloning markup with and without reference to native data
• Stricter rules on attributes and ID references
• How to handle segmentation changes
Processing Requirements
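The transform between <pc> and an <sc>/<ec> pair can be sketched with the standard Java DOM API. This is only an illustration under simple assumptions: the class PcToMarkers is a hypothetical name, the id is the only attribute carried over, and the end marker uses the rid attribute shown in the earlier slides. The actual attribute mapping is what the processing requirements are meant to define.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: rewrites <pc id="X">...</pc> as <sc id="X"/>...<ec rid="X"/>.
class PcToMarkers {
    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Expands every <pc> below root, including nested ones, in place.
    static void expand(Element root) {
        // The NodeList is live: it shrinks as <pc> elements are removed.
        NodeList remaining = root.getElementsByTagName("pc");
        while (remaining.getLength() > 0) {
            Element pc = (Element) remaining.item(0);
            Node parent = pc.getParentNode();
            Document doc = pc.getOwnerDocument();
            Element sc = doc.createElement("sc");
            sc.setAttribute("id", pc.getAttribute("id"));
            Element ec = doc.createElement("ec");
            ec.setAttribute("rid", pc.getAttribute("id"));
            parent.insertBefore(sc, pc);
            // Move the content of <pc> between the two markers.
            while (pc.getFirstChild() != null) {
                parent.insertBefore(pc.getFirstChild(), pc);
            }
            parent.insertBefore(ec, pc);
            parent.removeChild(pc);
        }
    }

    public static void main(String[] args) {
        Element source = parse("<source>A<pc id=\"1\">B</pc>C</source>");
        expand(source);
        NodeList children = source.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(children.item(i).getNodeName());
        }
    }
}
```

Splitting a <pc> this way is what makes it possible to place a segment boundary inside the span.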
• Java-based and open source (LGPL)
• http://code.google.com/p/okapi-xliff-toolkit/
• Stream-based rather than DOM to handle very large documents
• Reader is event-driven
• Unit available as single object
• Writer also available
XLIFF Toolkit - A Library and More
XLIFFReader reader = new XLIFFReader();
reader.open(new File("myInput.xlf"));
while ( reader.hasNext() ) {
    XLIFFEvent event = reader.next();
    if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
        Unit unit = event.getUnit();
        // Do something with the unit
    }
}
reader.close();
Library - Reading a Document
XLIFFReader reader = new XLIFFReader();
XLIFFWriter writer = new XLIFFWriter();
reader.open(new File("myInput.xlf"));
writer.create(new File("myOutput.xlf"));
while ( reader.hasNext() ) {
    XLIFFEvent event = reader.next();
    if ( event.getType() == XLIFFEventType.TEXT_UNIT ) {
        Unit unit = event.getUnit();
        // Do something with the unit
    }
    // Every event is written out, so unmodified parts are copied as they are
    writer.write(event);
}
reader.close();
writer.close();
Library - Updating a Document
Useful links
• Read the latest Editor's Draft:
  https://wiki.oasis-open.org/xliff/
• Comment or ask questions in the mailing lists:
  https://lists.oasis-open.org/archives/xliff-comment/
  https://lists.oasis-open.org/archives/xliff-users/
• Try out the toolkit:
  http://code.google.com/p/okapi-xliff-toolkit/
Q & A