Machine Translation: The Translator’s Choice

Heidi Düchting, Sylke Krämer, Johann Roturier



Outline

Background

Challenges

Solutions

Benefits

Next steps

Conclusions


Commercial Imperatives

Effective
– Time-critical documents in volume

Efficient
– Translation process automation
– Combining translation technologies
  – workflow TM, MT, and PE tools

Control
– Loose writing guidelines vs. Controlled Language rules
  – Improved machine translatability


Commercial Systems

Combine technologies
– TM with previously machine-translated and post-edited segments for look-up

TM systems with MT component
– Rule based and Example based
– Pre-translate phase
– Towards improved post-editing efficiency?
– Not available in all systems

MT systems with TM component
– 100% match look-up


Challenges

Setting a threshold for TM matches
– 100% matches only
  – suitable when the objective is to provide MT output for gisting (no post-editing)
  – suitable when the MT system is fully customized and CL environment is in place (no post-editing?)

Quick PE
– New sentences in which only one character changes are sent to the MT engine
  – W32.Beagle.AB is a mass-mailing worm that neither propagates via network shares nor deletes files
  – W32.Beagle.AC is a mass-mailing worm that neither propagates via network shares nor deletes files
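The effect of the threshold on the two W32.Beagle sentences can be illustrated with a small sketch. It uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, which is an assumption: commercial TM tools compute fuzzy-match scores differently. The function names `best_tm_match` and `needs_mt` are hypothetical.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, tm_sources):
    """Return the most similar TM source segment and its similarity (0.0 to 1.0)."""
    best_source, best_score = None, 0.0
    for candidate in tm_sources:
        score = SequenceMatcher(None, segment, candidate).ratio()
        if score > best_score:
            best_source, best_score = candidate, score
    return best_source, best_score


def needs_mt(segment, tm_sources, threshold=1.0):
    """A segment goes to the MT engine when no TM match reaches the threshold."""
    _, score = best_tm_match(segment, tm_sources)
    return score < threshold


# Illustrative data taken from the slide: the two sentences differ in one character.
IN_TM = ("W32.Beagle.AB is a mass-mailing worm that neither propagates via "
         "network shares nor deletes files")
NEW = ("W32.Beagle.AC is a mass-mailing worm that neither propagates via "
       "network shares nor deletes files")

# With a 100% threshold the new sentence is sent to MT;
# with a lower (fuzzy) threshold the existing TM segment would be reused.
send_with_exact_threshold = needs_mt(NEW, [IN_TM], threshold=1.0)
send_with_fuzzy_threshold = needs_mt(NEW, [IN_TM], threshold=0.95)
```

Under these assumptions, a strict 100% threshold sends the near-identical sentence through MT again, which is the trade-off the slide raises.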


Solutions (1)

Two-tier process
– Leverage Trados TM repository
– Use MT system to translate unknown segments (Systran Premium 5.0)
– Use MT output as TM input

Determine the export threshold
– Existing TM segments vs. new controlled segments
  – Uncontrolled: Symantec announced a patch was available
  – CL: Symantec announced that a patch was available
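The two-tier process can be sketched as follows. This is a minimal illustration, not the actual Trados or Systran integration: the TM is modelled as a plain dictionary with exact (100%) look-up, and `machine_translate` is a hypothetical callable standing in for the MT system.

```python
from typing import Callable, Dict, List


def pretranslate(segments: List[str],
                 tm: Dict[str, str],
                 machine_translate: Callable[[str], str]) -> List[str]:
    """Translate segments from the TM where possible, otherwise via MT.

    MT output is written back into the TM, so an unknown segment is
    machine-translated only once.
    """
    translations = []
    for source in segments:
        target = tm.get(source)                  # tier 1: exact TM match
        if target is None:
            target = machine_translate(source)   # tier 2: MT for unknown segments
            tm[source] = target                  # MT output becomes TM input
        translations.append(target)
    return translations
```

With exact look-up, the controlled sentence "Symantec announced that a patch was available" would not match the uncontrolled TM entry and would go to the second tier, which is why the export threshold has to be chosen deliberately.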


Solutions (2)

TMX format
– Obvious choice as the exchange format
– XLIFF not supported by all MT systems
– Source and target segments

<tu usagecount="1" creationdate="20050301T122255Z" creationid="SUPER">
  <tuv lang="EN-US"><seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg></tuv>
  <tuv lang="DE-DE"><seg>Then the worm searches all local and network drives for .gif, .bmp, and .wav files.</seg></tuv>
</tu>
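A translation unit like the one above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and flags units whose target segment is still identical to the source, that is, units that have been exported but not yet translated. The helper names are hypothetical, and the code accepts both the plain `lang` attribute shown on the slide and `xml:lang`, on the assumption that different TMX implementations use one or the other.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the xml: prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TU = (
    '<tu usagecount="1" creationdate="20050301T122255Z" creationid="SUPER">'
    '<tuv lang="EN-US"><seg>Then the worm searches all local and network '
    'drives for .gif, .bmp, and .wav files.</seg></tuv>'
    '<tuv lang="DE-DE"><seg>Then the worm searches all local and network '
    'drives for .gif, .bmp, and .wav files.</seg></tuv>'
    '</tu>'
)


def segments_by_language(tu):
    """Map each language code in a <tu> element to its segment text."""
    result = {}
    for tuv in tu.iter("tuv"):
        lang = tuv.get("lang") or tuv.get(XML_LANG)
        seg = tuv.find("seg")
        result[lang] = "".join(seg.itertext()) if seg is not None else ""
    return result


def is_untranslated(tu, source_lang="EN-US"):
    """True when every target segment is a verbatim copy of the source."""
    segs = segments_by_language(tu)
    source = segs.get(source_lang)
    return all(text == source
               for lang, text in segs.items() if lang != source_lang)


tu = ET.fromstring(SAMPLE_TU)
untranslated = is_untranslated(tu)  # the sample target is a copy of the source
```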


Processing TMX

Technical issues
– TMX's various implementations can create discrepancies during the exchange process
– Identical source and target segment
– XML parser and TMX header

Pre- and post-processing with a single macro
– Modules to remove and restore sections
– Environment: VBA
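The remove-and-restore idea can be sketched as a pair of functions. The slides do not say which sections the original VBA macro handled, so this example assumes, purely for illustration, that the troublesome section is the TMX `<header>` element. It is written in Python rather than VBA, and `PLACEHOLDER`, `remove_header`, and `restore_header` are hypothetical names.

```python
import re

# Matches either a self-closing <header ... /> or a <header ...>...</header> pair.
HEADER_RE = re.compile(r"<header\b[^>]*/>|<header\b.*?</header>", re.DOTALL)
PLACEHOLDER = "<!--HEADER-->"


def remove_header(tmx_text):
    """Pre-processing: cut out the header and return (stripped_text, header)."""
    match = HEADER_RE.search(tmx_text)
    if match is None:
        return tmx_text, ""
    stripped = tmx_text[:match.start()] + PLACEHOLDER + tmx_text[match.end():]
    return stripped, match.group(0)


def restore_header(tmx_text, header):
    """Post-processing: put the saved header back in place of the placeholder."""
    return tmx_text.replace(PLACEHOLDER, header, 1)
```

Keeping both directions in one place mirrors the single-macro design: whatever is removed before the MT call is restored unchanged afterwards.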


Pre-translation Workflow

Step 1: Analyze new document

Step 2: Export unmatched segments

Step 3: Pre-processing module

Step 4: Call to MT system

Step 5: Post-processing module

Step 6: Import segments into TM


Effective pre-translation

Efficiency and robustness
– Refinable

Opportunity for modifications
– Target segments
– CL environment predictability
– Frequent errors

Ideal scenario
– Address problems that could not be fixed with CL rules


Towards Automated Post-Editing

Surface post-editing
– No linguistic analysis: no second MT
– Text processing
– Frequent errors due to default MT settings
– Remove drudgery from post-editing

Lexical
– Capitalization (folgende vs. Folgende)
– Incorrect spelling (neuzustarten vs. neu zu starten)
– Missing contractions (à le vs. au)
– Extra words (fichier de .bmp vs. fichier .bmp)
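Lexical fixes of this kind lend themselves to a small table of search-and-replace rules. The sketch below is an illustration in Python, not the authors' macro. The rules are derived only from the examples on the slide, and the capitalization case is left out because choosing between "folgende" and "Folgende" depends on context that a surface pattern cannot see.

```python
import re

# (pattern, replacement) pairs based on the examples on the slide.
LEXICAL_RULES = [
    (re.compile(r"\bneuzustarten\b"), "neu zu starten"),   # incorrect spelling
    (re.compile(r"\bà le\b"), "au"),                       # missing contraction
    (re.compile(r"\bfichier de (\.\w+)"), r"fichier \1"),  # extra word
]


def apply_lexical_fixes(text):
    """Apply every lexical rule in order and return the corrected text."""
    for pattern, replacement in LEXICAL_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Because the rules live in one list, a post-editor who spots a new recurring error can add a line without touching the function.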


Towards Automated Post-Editing

Syntactic
– Word order: “Klicken auf Sie” vs. “Klicken Sie auf”
– Wrong structures (transfer or generation issue): neither…nor (ni ne..ni ne)

Textual
– Formatting: trailing spaces after symbols (backslashes)
– Punctuation inconsistent with style guide: inverted commas for German
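The two textual issues can also be handled by pattern replacement. This sketch makes two assumptions that the slides do not confirm: that any space following a backslash is unwanted, and that the style guide calls for the common German typographic quotes „…“ in place of straight double quotes. The function name `fix_textual` is hypothetical.

```python
import re


def fix_textual(text):
    """Remove spaces after backslashes and convert straight quotes to „…“."""
    # "C:\ Windows\ System32" -> "C:\Windows\System32"
    text = re.sub(r"(\\) +", r"\1", text)
    # "Weiter" -> „Weiter“ (opening U+201E, closing U+201C); assumed style rule
    text = re.sub(r'"([^"\n]+)"', "\u201e\\1\u201c", text)
    return text
```

The backslash rule would also join a path to an ordinary word that follows it, so in practice it would need a narrower context than shown here.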


Towards Automated Post-Editing

Suitability of the environment
– Regular expressions support
– RE are a ‘way to describe text through pattern matching’ (Stubblebine 2003: 1)
– Grouping and capturing:
  1. Match: ([Kk]licken) (auf) (Sie)
  2. Replace: \1 \3 \2
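The match and replace patterns above translate directly into most regular expression engines. As an illustration in Python's `re` module (the original environment was VBA), the three captured groups are reordered so that "auf" and "Sie" swap places. The word boundaries `\b` are an addition not present on the slide, included to avoid matching inside longer words.

```python
import re

WORD_ORDER = re.compile(r"\b([Kk]licken) (auf) (Sie)\b")


def fix_word_order(text):
    """Rewrite 'Klicken auf Sie' as 'Klicken Sie auf' using captured groups."""
    return WORD_ORDER.sub(r"\1 \3 \2", text)


fixed = fix_word_order("Klicken auf Sie Weiter.")
# fixed == "Klicken Sie auf Weiter."
```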


Content workflow


Next steps

New environment
– GMS integration
  – Centralized interface with content
  – Transport layer
  – MT as plug-in
– XLIFF format
  – To machine translate unmatched segments
– PE replacements
  – Fine-tune contextual replacements


Conclusions

Combining MT & TM is efficient
– Leverage
– Post-editing is not repeated
– Increased throughput

Environment for avoiding errors
– Facilitated when CL rules are introduced
– Scope of errors is reduced

New opportunities for translators
– Fine-tuning MT user dictionaries
– Refine automated PE tasks


Thank you