Translator
Slide 3
Slide 5
•Learn word and phrase alignments from “parallel” data
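As a rough illustration of learning word alignments from parallel data, here is a minimal sketch of IBM Model 1 trained with expectation-maximization. IBM Model 1 is a classic textbook technique swapped in for illustration; the slide does not say which alignment method is actually used. The three-sentence German-English corpus and the function names are hypothetical.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)], an approximation of P(f | e), from sentence pairs.

    pairs: list of (source_tokens, target_tokens).
    Simplification: no NULL word and no phrase extraction.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: distribute each source word over the target words.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: re-estimate the translation probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def align(fs, es, t):
    """Link each source word to its most probable target word."""
    return [(f, max(es, key=lambda e: t[(f, e)])) for f in fs]

# Hypothetical toy corpus (German source, English target).
pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(pairs)
alignment = align("das haus".split(), "the house".split(), t)
print(alignment)
```

Even with three sentence pairs, the co-occurrence statistics are enough for "das" to settle on "the" and "haus" on "house"; real systems need far more data and go on to extract phrase pairs from such word alignments.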
Slide 6
Slide 7
• Source sentence f, best translation e*: e* = argmax_e P(e | f)
• P(e | f) = P(f | e) ∙ P(e) / P(f)
• argmax_e P(e | f) = argmax_e P(f | e) ∙ P(e), since P(f) does not depend on e
• P(f | e): channel (translation) model
• P(e): language model
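The decision rule above can be sketched in a few lines. This is a toy illustration only: the candidate list and every probability below are invented for the example, and a real decoder searches a very large hypothesis space rather than scoring three fixed strings.

```python
import math

# Hypothetical toy models; all probabilities are invented for illustration.
translation_model = {  # P(f | e)
    ("la maison", "the house"): 0.6,
    ("la maison", "house the"): 0.6,
    ("la maison", "the home"): 0.3,
}
language_model = {  # P(e)
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}

def decode(f, candidates):
    """Return argmax_e P(f | e) * P(e), computed in log space."""
    def score(e):
        return math.log(translation_model[(f, e)]) + math.log(language_model[e])
    return max(candidates, key=score)

best = decode("la maison", list(language_model))
print(best)
```

The translation model alone cannot separate "the house" from "house the" here; the language model term P(e) is what prefers the fluent word order.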
Slide 8
Start With
• Parallel sentences
• Monolingual data
• Decoding Algorithm
Build These Components
• Translation Model
• Language Model – P(E)
• Decoder
Slide 9
Translation Model
Target Language Model
Other Models
Microsoft's vast language knowledge
Translation Model
Target Language Model
Your and your community's language knowledge
Translator service and API
Your Applications
Your test and tuning documents
Lambda weight vector
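One common way a weight vector is used in statistical MT is a log-linear combination, where each model contributes a feature score and the lambda weights set their relative influence. The sketch below assumes that setup; the slide only names the "Lambda weight vector" and does not confirm the exact formula. The feature names, weights and scores are all hypothetical.

```python
def loglinear_score(features, weights):
    """Weighted sum of feature scores: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical lambda weights, as they might come out of tuning.
weights = {"translation_model": 1.0, "language_model": 0.5, "word_penalty": -0.2}

# Hypothetical log-probability style feature values for two candidates.
candidates = {
    "the house": {"translation_model": -0.5, "language_model": -3.0, "word_penalty": 2},
    "house": {"translation_model": -1.6, "language_model": -2.5, "word_penalty": 1},
}

best = max(candidates, key=lambda e: loglinear_score(candidates[e], weights))
print(best)
```

Under this assumption, tuning means searching for the weights that give the best translations on the tuning documents, which is why those documents should resemble the material to be translated later.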
Slide 10
Your site or application
Translator Service
Supply Corrections
Consume Translations
Collaborative Translations Store
Microsoft Translator Hub
Custom Models
Generic Models
Your own, previously translated documents
Supply Documents
Build custom models
Import Corrections for training
Slide 11
Translate()
AddTranslation()
GetTranslations()
GetUserTranslations()
Speak()
Detect()
BreakSentences()
Thorough customization
Retrain every 2 months, or 20000 segments
Continuous Improvement
Slide 12
What goes in | What it does | Rules to follow
[…] | Calculate the BLEU score – just for you. | Be strict. Compose them to be optimally representative of what you are going to translate in the future.
Dictionaries | Forces the given translation with a probability of 1. | Be restrictive. Safe to use only for compound nouns and named entities. Better to not use and let the system learn.
[…] | Build the translation model aka phrase table. Teaches how to translate. | Be liberal. Any in-domain human translation is better than MT. Add and remove documents as you go and try to improve the score.
[…] | Build the target language model. Improve grammar and fluency. | Be liberal. Use any in-domain target language material you can get.
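For readers unfamiliar with the BLEU score mentioned in the table, here is a simplified sentence-level sketch: clipped n-gram precision up to 4-grams, a geometric mean, and a brevity penalty, against a single reference. It is not the exact scoring used by the service; production BLEU is computed over a whole test set and often with smoothing. The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU in the range 0.0 to 1.0 (no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "the cat sat on the mat"
print(bleu("the cat sat on the mat", reference))
print(bleu("the cat sat on the mat today", reference))
```

BLEU is usually reported multiplied by 100, so a difference of one point corresponds to 0.01 on the scale returned here.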
Slide 13
• Humans can easily detect 0.5 to 1.0 BLEU points
• Faster post-editing
• Higher document comprehension
• Small domain: Higher improvement within the domain
• Large domain: Better suited for input variability; better use of training documents
• Better to build a larger domain (lower BLEU delta)
Slide 14
Slide 15
Quality
Speed
Price
You can only have two
P3
Slide 16
Post-Editing
•Goal: Human translation quality
•Increase human translator’s productivity
•In practice: 0% to 25% productivity increase
Varies by content, style and language
Raw publishing
Goals:
Good enough for the purpose
Speed
Cost
Publish the output of the MT system directly to end user
Best with bilingual UI
Good results with technical audiences
Cost-effective way for inbound material
Triage
Analysis and classification
P3 – Post-Publish Post-Editing
Know what you are human translating, and why
Make use of community
Domain experts
Enthusiasts
Employees
Professional translators
Best of both worlds
Fast
Better than raw
Always current
Slide 17
Assimilation | Dissemination | Post-Edit
Use customized machine translation
Never miss a chance to collect a human edit
Make the source visible on demand | Show the source
Show domain-relevant dictionaries
Apply TM with 100% | Apply TM with 80%
Reveal alternatives
Publish raw first, collect human feedback | Use modern, collaborative TM systems (e.g. MemSource)
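The 100% and 80% translation memory (TM) thresholds above can be illustrated with a small lookup sketch. It uses Python's `difflib.SequenceMatcher` over word tokens as a stand-in similarity measure; commercial TM tools use their own fuzzy-match formulas, so the percentages will not line up exactly. The memory contents and the function name are hypothetical.

```python
from difflib import SequenceMatcher

def best_tm_match(segment, memory, threshold):
    """Return (translation, score) for the closest stored source segment,
    or (None, score) when the best similarity is below the threshold."""
    best_source, best_score = None, 0.0
    for source in memory:
        # Word-level similarity in [0, 1]; 1.0 means identical token lists.
        score = SequenceMatcher(None, segment.split(), source.split()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return memory[best_source], best_score
    return None, best_score

# Hypothetical translation memory (English source, German target).
memory = {"Click the Save button": "Klicken Sie auf die Schaltfläche Speichern"}

exact_only = best_tm_match("Click the Save button now", memory, threshold=1.0)
fuzzy = best_tm_match("Click the Save button now", memory, threshold=0.8)
print(exact_only)
print(fuzzy)
```

With a 100% threshold only identical segments reuse the stored translation and everything else falls through to machine translation; at 80% a near match is offered, which suits workflows where a human will review the result anyway.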
Slide 18
Slide 19
• Deep Neural Networks (>30% in ASR)
• Recurrent Neural Networks (1-6 BLEU)
• Filtering, domain adaptation
Slide 20
Slide 21
blogs.msdn.com/translator
twitter.com/MSTranslator
facebook.com/MicrosoftTranslator
linkedin.com/company/Microsoft-Translator
microsoft.com/translator
Slide 24