10 Decisions to make before starting to use Machine Translation (MT), including details on how to improve MT engines.
Your trained Moses SMT system doesn't work.
What can you do?
Diego Bartolome, CEO, tauyou <language technology>, @diegobartolome
Where are you now?
Why Machine Translation?
Strategic decision
Increase sales
Shorten delivery times
Reduce costs
Differentiation
Forced decision
Clients ask for it!
Dare – change
Welcome to the jungle
Decision 1: Internal – external
Core competence
Resources
ROI
Time to market
MT Costs
Internal development
Free tools
DIY (do-it-yourself) solutions
Traditional pricing model
tauyou managed solution
Decision 2: MT Type (I)
Rule-based MT
Statistical MT
Hybrid MT
Decision 2: MT Type (II)
Do we really care?
Decision 3: Languages (I)
Source: translate.autodesk.com
Decision 3: Languages (II)
Source: Philipp Koehn
Decision 4: Domains
Who is willing to pay?
Where does your revenue come from?
What are your key skills?
What domains achieve good quality?
Decision 5: Workflow
Use MT as a secondary TM
Pre-translated bilingual files
CAT tool integration
Differentiated workflow
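A minimal sketch of "MT as a secondary TM", assuming a simple in-memory TM (a dict from source to target) and an `mt` callable that wraps whatever engine you use; both are hypothetical. The similarity score comes from Python's `difflib`, not from any CAT tool's fuzzy-match formula, and the 75% threshold is only an illustrative assumption.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory TM: source segment -> approved target segment.
TranslationMemory = Dict[str, str]


def fuzzy_score(a: str, b: str) -> float:
    """Similarity in [0, 100] using difflib's ratio (not a CAT-tool fuzzy-match formula)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest(segment: str, tm: TranslationMemory,
            mt: Callable[[str], str], threshold: float = 75.0) -> Tuple[str, str, float]:
    """Return (origin, suggestion, best TM score): the best TM match if it reaches
    the threshold, otherwise the MT output, which acts as a secondary TM."""
    best_src: Optional[str] = None
    best_score = 0.0
    for src in tm:
        score = fuzzy_score(segment, src)
        if score > best_score:
            best_src, best_score = src, score
    if best_src is not None and best_score >= threshold:
        return ("TM", tm[best_src], best_score)
    # No usable TM match: fall back to machine translation.
    return ("MT", mt(segment), best_score)
```

Keeping the origin label in the result lets the translator (and the pricing model) tell TM matches and MT suggestions apart.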
Decision 6: Feedback
Qualitative
Use updated TMs in new training runs
Immediate (incremental) retraining
Rule-based automatic post-editing
Selective pre- and/or post-processing
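Rule-based automatic post-editing can be as simple as an ordered list of regular-expression rewrites applied to raw MT output. The sketch below assumes an English target; the three rules are hypothetical examples, since real rules would come from the corrections post-editors keep making, and punctuation spacing rules in particular differ by target language.

```python
import re
from typing import List, Tuple

# Hypothetical rules: (regex pattern, replacement), applied in order to raw MT output.
APE_RULES: List[Tuple[str, str]] = [
    (r"\s+([,.;:!?])", r"\1"),      # remove spaces before punctuation (English target)
    (r"\s{2,}", " "),               # collapse runs of whitespace
    (r"\bweb site\b", "website"),   # example client spelling preference
]


def post_edit(mt_output: str, rules: List[Tuple[str, str]] = APE_RULES) -> str:
    """Apply each rule in sequence and return the corrected segment."""
    text = mt_output
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text.strip()
```

Rule order matters: here the punctuation rule runs before whitespace collapsing, so `"Visit our web site  today ."` becomes `"Visit our website today."`.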
Decision 7: Post-editors
What skills are needed?
Post-editing guidelines
How do we pay them?
Decision 8: Metrics
SMT metrics: BLEU, NIST
Feedback from translators
Translation time vs. post-editing time
Word Error Rate (WER) or Edit Distance
Cost reduction
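Two of these metrics are easy to compute in-house. The sketch below implements Word Error Rate as a word-level Levenshtein distance divided by the reference length, plus the time saving of post-editing over translating from scratch. When the reference is the post-edited version of the MT output, WER gives a rough measure of how much the post-editor changed; note that HTER uses TER, which also counts block shifts, and plain WER does not.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions; no shifts)."""
    prev = list(range(len(ref) + 1))          # distance from empty hyp to each ref prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(prev[j] + 1,        # delete hyp word
                          curr[j - 1] + 1,    # insert ref word
                          prev[j - 1] + cost) # substitute or match
        prev = curr
    return prev[-1]


def wer(hypothesis: str, reference: str) -> float:
    """Word Error Rate: word edits needed to turn hypothesis into reference, per reference word."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis.split(), ref) / len(ref)


def time_saving(translation_seconds: float, post_editing_seconds: float) -> float:
    """Fraction of time saved by post-editing MT instead of translating from scratch."""
    if translation_seconds <= 0:
        raise ValueError("translation time must be positive")
    return 1.0 - post_editing_seconds / translation_seconds
```

For example, `wer("the cat sat on mat", "the cat sat on the mat")` is 1/6 (one missing word out of six), and `time_saving(600, 420)` is 0.3, a 30% saving that can feed directly into the cost-reduction figure.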
Decision 9: Business Model
Decision 10: Start!
Let's play with Moses
Best resource to start
www.statmt.org/moses
TAUS tutorial
www.translationautomation.com
tauyou slides
www.speakerdeck.com/tauyoucom
Everything is clear!
Gather TMs and other linguistic assets
Select domains
Train systems
BLEU score is great
… but …
Translation quality is awful
Why?
Not enough data
Too much data
Unclean TMs
Misalignments
Difficult language pairs
Selection of wrong parameters
Suboptimal techniques
Some steps
Maximum exploitation of existing assets
Source content optimization
Data selection and cleaning
Improvement of the models
Linguistic processing
Continuous improvement
Linguistic assets
Translation memory sharing
Clients, Partners, EU, UN, TAUS
Relevant on-line data retrieval
Advanced TM techniques
Sub-segment matching
Part-of-speech replacement
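A minimal sketch of sub-segment matching, assuming a TM held as a dict from source to target (hypothetical). It finds the longest run of consecutive words a new segment shares with each TM source and returns the TM units that share at least `min_words` words. Extracting only the matching target fragment would additionally need word alignments, which this sketch does not attempt.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def longest_shared_chunk(new_segment: str, tm_source: str) -> List[str]:
    """Longest run of consecutive words that the new segment shares with a TM source segment."""
    a, b = new_segment.split(), tm_source.split()
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


def subsegment_matches(new_segment: str, tm: Dict[str, str],
                       min_words: int = 3) -> List[Tuple[str, str, str]]:
    """(shared chunk, TM source, TM target) for every TM unit sharing at least min_words
    consecutive words with the new segment, longest chunks first."""
    hits = []
    for src, tgt in tm.items():
        chunk = longest_shared_chunk(new_segment, src)
        if len(chunk) >= min_words:
            hits.append((" ".join(chunk), src, tgt))
    hits.sort(key=lambda h: len(h[0].split()), reverse=True)
    return hits
```

For a new segment such as "Press the power button twice.", a TM unit containing "Press the power button to turn on the device." would be returned with the shared chunk "Press the power button", even though the whole-segment fuzzy score may be low.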
Source optimization (I)
Spell check
Grammar check
Style check
Terminology check
Client checklist
(Figure: newdoc, proposeddoc + html, report)
Summarization
% to reduce
Use translation memories
Project / Client / All
(Figure: newdoc, proposeddoc + html, report)
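A terminology check on the source side can start as a lookup of deprecated terms that should be replaced by preferred ones before the text reaches the MT engine. The term list below is a hypothetical example; a real one would come from the client's style guide or term base.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical client term list: deprecated source term -> preferred source term.
DEPRECATED_TERMS: Dict[str, str] = {
    "e-mail": "email",
    "log-in": "log in",
    "utilize": "use",
}


def terminology_issues(text: str,
                       terms: Dict[str, str] = DEPRECATED_TERMS) -> List[Tuple[int, str, str]]:
    """Return (position, term found, preferred term) for each deprecated term in the text."""
    issues = []
    for deprecated, preferred in terms.items():
        # Whole-word, case-insensitive match; lookarounds instead of \b so that
        # terms starting or ending with punctuation are handled too.
        pattern = r"(?<!\w)" + re.escape(deprecated) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            issues.append((m.start(), m.group(0), preferred))
    return sorted(issues)
```

The output is a report, not an automatic rewrite, so an author or reviewer decides whether each hit should really change.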
Data selection + cleaning
Clean translation memories
Length, punctuation, terminology, …
Inconsistencies, repetitions, ...
Segment splitting
Optimize the weight of the most frequent n-grams
Validate their translations
Add out-of-domain data
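A minimal sketch of TM cleaning before training, loosely in the spirit of the length and length-ratio filtering done by Moses' corpus-cleaning script. All thresholds (80 words, ratio 9) and the number and final-punctuation checks are illustrative assumptions to tune per language pair. The last function lists the most frequent source n-grams, the candidates whose translations are worth validating.

```python
import re
from collections import Counter, defaultdict
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def _end_punct(s: str) -> str:
    # Final punctuation mark, or "" if the segment does not end with one.
    return s[-1] if s and s[-1] in ".?!:" else ""


def is_clean_pair(src: str, tgt: str, max_words: int = 80, max_ratio: float = 9.0) -> bool:
    s, t = src.split(), tgt.split()
    if not s or not t:
        return False                      # empty side
    if len(s) > max_words or len(t) > max_words:
        return False                      # very long segments are hard to word-align
    if max(len(s), len(t)) / min(len(s), len(t)) > max_ratio:
        return False                      # suspicious length ratio: likely misalignment
    if sorted(re.findall(r"\d+", src)) != sorted(re.findall(r"\d+", tgt)):
        return False                      # numbers should carry over unchanged
    return _end_punct(src) == _end_punct(tgt)   # crude punctuation consistency check


def clean_tm(pairs: Iterable[Pair]) -> Tuple[List[Pair], Set[str]]:
    """Drop bad and duplicate pairs; report sources that have more than one translation."""
    seen: Set[Pair] = set()
    targets = defaultdict(set)
    kept: List[Pair] = []
    for src, tgt in pairs:
        src, tgt = " ".join(src.split()), " ".join(tgt.split())   # normalize whitespace
        if (src, tgt) in seen or not is_clean_pair(src, tgt):
            continue
        seen.add((src, tgt))
        targets[src].add(tgt)
        kept.append((src, tgt))
    inconsistent = {s for s, ts in targets.items() if len(ts) > 1}
    return kept, inconsistent


def most_frequent_ngrams(segments: Iterable[str], n: int = 3, top: int = 10):
    """Most frequent lowercased n-grams across segments, with their counts."""
    counts: Counter = Counter()
    for seg in segments:
        toks = seg.lower().split()
        counts.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return counts.most_common(top)
```

Inconsistent sources are reported rather than dropped, because deciding which translation is right is a linguistic decision, not a filtering one.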
Models optimization
Filter the translation tables
Remove the garbage + tune weights
Optimize language models
Adapt them to the translation purpose
Tune parameters correctly
Tune set, test set, optimization parameters
Improve recasing
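Filtering the translation table can be sketched as keeping only the best few target phrases per source phrase. This assumes the plain-text Moses phrase-table layout `source ||| target ||| scores ||| ...`, with the direct phrase translation probability as the third score (`score_index=2`); check the score order against your Moses version. `top_k` and `min_prob` are illustrative values, and this is a simplified stand-in for Moses' own filtering and pruning tools, not a replacement for them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def filter_phrase_table(lines: Iterable[str], top_k: int = 20,
                        min_prob: float = 1e-4, score_index: int = 2) -> List[str]:
    """Keep at most top_k entries per source phrase, ranked by the score at score_index,
    and drop entries whose score is below min_prob."""
    by_source: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue                                  # not a phrase-table entry
        scores = fields[2].split()
        if len(scores) <= score_index:
            continue                                  # unexpected score layout
        prob = float(scores[score_index])
        if prob >= min_prob:
            by_source[fields[0]].append((prob, line.rstrip("\n")))
    kept: List[str] = []
    for entries in by_source.values():                # preserves source-phrase order
        entries.sort(key=lambda e: e[0], reverse=True)
        kept.extend(line for _, line in entries[:top_k])
    return kept
```

This version holds everything in memory; real phrase tables are large, so a production version would stream the file, since entries for the same source phrase are normally adjacent.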
Linguistic processing
In the source and/or target language
Grammar checking
Entities detection
Proper nouns, alphanumeric words, ...
Compound splitting
Sentence reordering
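Entity detection for alphanumeric words can be sketched as masking: replace every token that contains a digit with a numbered placeholder before translation, then restore the originals in the MT output. The regex and the `__ENT0__` placeholder format are assumptions; the placeholder must survive your tokenizer and be left untranslated by the engine, which you should verify in your own pipeline.

```python
import re
from typing import List, Tuple

# Tokens containing at least one digit: numbers, versions, part codes (X200, 3.5, ABC-123).
ENTITY_RE = re.compile(r"\b[\w.-]*\d[\w.-]*\b")
PLACEHOLDER_RE = re.compile(r"__ENT(\d+)__")


def mask_entities(text: str) -> Tuple[str, List[str]]:
    """Replace each entity with a numbered placeholder before translation."""
    entities: List[str] = []

    def _replace(match: "re.Match[str]") -> str:
        entities.append(match.group(0))
        return f"__ENT{len(entities) - 1}__"

    return ENTITY_RE.sub(_replace, text), entities


def unmask_entities(text: str, entities: List[str]) -> str:
    """Put the original entities back into the MT output."""
    return PLACEHOLDER_RE.sub(lambda m: entities[int(m.group(1))], text)
```

For example, "Set model X200 to 3.5 V." is sent to the engine as "Set model __ENT0__ to __ENT1__ V.", and the placeholders in the translation are then replaced by "X200" and "3.5".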
Life is about the people you meet and the things you create with them.
So go out and start creating. (Part of the Holstee Manifesto)
Thank you!