MT domain customization – conditions and benefits. Chris Wendt (Microsoft)

Translator

Slide 3

https://www.facebook.com/FCBayern?sk=wall

Slide 5

•Learn word and phrase alignments from “parallel” data
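One classic way to learn word alignments from parallel data is IBM Model 1, trained with expectation-maximization. The sketch below illustrates that technique only; it is not necessarily what Microsoft Translator uses. The three-sentence corpus and the name `train_ibm_model1` are made up for the example, and the NULL word is left out for brevity.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) from (source_tokens, target_tokens) pairs."""
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of f aligned to e
        total = defaultdict(float)   # expected count of e being aligned at all
        for fs, es in pairs:
            for f in fs:
                # E-step: spread one unit of f across the target words of the pair.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

# Hypothetical toy corpus (German source, English target).
pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(pairs)
print(round(t[("haus", "house")], 3), round(t[("das", "house")], 3))
```

Because "das" also co-occurs with "the" in a second sentence, the model gradually assigns "haus" to "house". Phrase alignments are typically extracted on top of word alignments like these.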

Slide 6

Slide 7

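•Given a source sentence f, find the best translation $e^*$: $e^* = \arg\max_e P(e \mid f)$

•Bayes' rule: $P(e \mid f) = P(f \mid e) \cdot P(e) / P(f)$

•$P(f)$ is constant for a given f, so $\arg\max_e P(e \mid f) = \arg\max_e P(f \mid e) \cdot P(e)$

•$P(f \mid e)$: translation model (the channel model)

•$P(e)$: language model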
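The decision rule on this slide can be sketched in a few lines. The candidate translations and all probabilities below are hypothetical values chosen for illustration; a real decoder searches a far larger hypothesis space.

```python
def best_translation(candidates, tm_prob, lm_prob):
    """Return argmax_e P(f | e) * P(e) over the candidate translations."""
    return max(candidates, key=lambda e: tm_prob[e] * lm_prob[e])

# Hypothetical scores for one source sentence f = "das Haus".
tm_prob = {"the house": 0.4, "house the": 0.4, "the home": 0.3}      # P(f | e)
lm_prob = {"the house": 0.05, "house the": 0.001, "the home": 0.02}  # P(e)
candidates = list(tm_prob)

best = best_translation(candidates, tm_prob, lm_prob)

# Dividing every score by the same constant P(f) does not change the winner,
# which is why P(f) can be dropped from the argmax.
p_f = 0.03  # hypothetical value; any positive constant behaves the same
best_with_pf = max(candidates, key=lambda e: tm_prob[e] * lm_prob[e] / p_f)

print(best, best == best_with_pf)
```

The translation model cannot tell "the house" from "house the" in this toy setup; the language model breaks the tie in favor of the fluent word order.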

Slide 8

Start With

•Parallel sentences

•Monolingual data

•Decoding Algorithm

Build These Components

•Translation Model

•Language Model – P(e)

•Decoder

Slide 9

Translation Model

Target Language Model

Other Models

Microsoft's vast language knowledge

Translation Model

Target Language Model

Your and your community's language knowledge

Translator service and API

Your Applications

Your test and tuning documents

Lambda weight vector
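In statistical MT, a lambda weight vector usually refers to the weights of a log-linear combination of model scores, tuned on held-out documents. The sketch below assumes that reading of the diagram; the feature names, probabilities, and weight values are hypothetical.

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of log-probability features: sum_i lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())

def pick_best(hypotheses, weights):
    """Return the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: loglinear_score(hypotheses[h], weights))

# Two hypothetical translations, scored by a translation model (tm)
# and a target language model (lm).
hypotheses = {
    "A": {"tm": math.log(0.5), "lm": math.log(0.001)},
    "B": {"tm": math.log(0.2), "lm": math.log(0.01)},
}

untuned = {"tm": 1.0, "lm": 0.1}  # language model barely counts
tuned = {"tm": 1.0, "lm": 1.0}    # weights as they might look after tuning

print(pick_best(hypotheses, untuned), pick_best(hypotheses, tuned))
```

Changing only the weights flips the winner from A to B, which is why the tuning documents should be representative of what will be translated later.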

Slide 10

Your site or application

Translator Service

Supply Corrections

Consume Translations

Collaborative Translations Store

Microsoft Translator Hub

Custom Models

Generic Models

Your own, previously translated documents

Supply Documents

Build custom models

Import Corrections for training

Slide 11

Translate()

AddTranslation()

GetTranslations()

GetUserTranslations()

Speak()

Detect()

BreakSentences()

Thorough customization

Retrain every 2 months, or 20,000 segments

Continuous Improvement
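The retraining rule of thumb above can be expressed as a simple check. This is a sketch only: the function name, the parameter names, and the approximation of two months as 60 days are assumptions made for the example.

```python
from datetime import date

def should_retrain(last_trained, today, new_segments,
                   max_age_days=60, segment_threshold=20000):
    """True when the model is about 2 months old or 20,000 new segments exist."""
    too_old = (today - last_trained).days >= max_age_days
    enough_data = new_segments >= segment_threshold
    return too_old or enough_data

print(should_retrain(date(2024, 1, 1), date(2024, 1, 15), 500))
print(should_retrain(date(2024, 1, 1), date(2024, 1, 15), 20000))
print(should_retrain(date(2024, 1, 1), date(2024, 3, 15), 500))
```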

Slide 12

What goes in: Test and tuning documents
What it does: Calculates the BLEU score, just for you.
Rules to follow: Be strict. Compose them to be optimally representative of what you are going to translate in the future.

What goes in: Dictionaries
What it does: Forces the given translation with a probability of 1.
Rules to follow: Be restrictive. Safe to use only for compound nouns and named entities. Better not to use them and let the system learn.

What goes in: Parallel data
What it does: Builds the translation model, also known as the phrase table. Teaches the system how to translate.
Rules to follow: Be liberal. Any in-domain human translation is better than MT. Add and remove documents as you go and try to improve the score.

What goes in: Monolingual target-language data
What it does: Builds the target language model. Improves grammar and fluency.
Rules to follow: Be liberal. Use any in-domain target language material you can get.
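To make the BLEU score mentioned above concrete, here is a minimal corpus-level BLEU with a single reference per segment, up to 4-grams, and the standard brevity penalty. It is a simplified sketch with no smoothing and naive whitespace tokenization, so its numbers will not match the scores reported by any particular tool.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """BLEU on a 0-100 scale for tokenized hypotheses and references."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped n-gram matches
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)

reference = ["the cat sat on the mat".split()]
perfect = corpus_bleu(reference, reference)
close = corpus_bleu(["the cat sat on a mat".split()], reference)
unrelated = corpus_bleu(["completely different words here".split()], reference)
print(round(perfect, 1), round(close, 1), round(unrelated, 1))
```

Scoring a generic and a customized system on the same held-out test documents gives the BLEU delta that the customization is judged by.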

Slide 13

•Humans can easily detect 0.5 to 1.0 points

•Faster post-editing

•Higher document comprehension

•Small domain: higher improvement within the domain

•Large domain: better suited for input variability; better use of training documents

•Better to build a larger domain (lower BLEU delta)

Slide 14

Slide 15

Quality

Speed

Price

You can only have two

P3

Slide 16

Post-Editing

•Goal: Human translation quality

•Increase human translator’s productivity

•In practice: 0% to 25% productivity increase

Varies by content, style and language

Raw publishing

Goals:

Good enough for the purpose

Speed

Cost

Publish the output of the MT system directly to end user

Best with bilingual UI

Good results with technical audiences

Cost-effective way for inbound material

Triage

Analysis and classification

P3 – Post-Publish Post-Editing

Know what you are human translating, and why

Make use of community

Domain experts

Enthusiasts

Employees

Professional translators

Best of both worlds

Fast

Better than raw

Always current

Slide 17

Assimilation Dissemination Post-Edit

Use customized machine translation

Never miss a chance to collect a human edit

Make the source visible on demand

Show the source

Show domain-relevant dictionaries

Apply TM with 100%

Apply TM with 80%

Reveal alternatives

Publish raw first, collect human feedback

Use modern, collaborative TM systems (e.g. MemSource)
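The 100% and 80% figures above read as translation memory (TM) match thresholds: reuse a stored translation only on an exact match, or also on a sufficiently close fuzzy match. The sketch below assumes that reading. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, since commercial TM tools compute match percentages in their own ways; the memory contents and the name `tm_lookup` are hypothetical.

```python
from difflib import SequenceMatcher

def tm_lookup(memory, source, threshold):
    """Return the stored translation of the closest TM entry, or None
    if no entry reaches the similarity threshold (0.0 to 1.0)."""
    best_score, best_target = 0.0, None
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_target if best_score >= threshold else None

# Hypothetical one-entry translation memory (English to German).
memory = {"Click the Save button.": "Klicken Sie auf Speichern."}

exact = tm_lookup(memory, "Click the Save button.", 1.0)
fuzzy_strict = tm_lookup(memory, "Click the Save button now.", 1.0)
fuzzy_loose = tm_lookup(memory, "Click the Save button now.", 0.8)
print(exact, fuzzy_strict, fuzzy_loose)
```

With the strict threshold, a near match falls through to machine translation; with the looser one, it is offered for a human to correct.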

Slide 18

Slide 19

•Deep Neural Networks (>30% in ASR)

•Recurrent Neural Networks (1-6 BLEU)

•Filtering, domain adaptation

Slide 20

Slide 21

Slide 22

Inclusive-HD_TranslatorOnly.mp4

blogs.msdn.com/translator

twitter.com/MSTranslator

facebook.com/MicrosoftTranslator

linkedin.com/company/Microsoft-Translator

microsoft.com/translator

http://blogs.msdn.com/Translator

http://www.twitter.com/BingTranslator

http://www.facebook.com/microsofttranslator

http://www.linkedin.com/Microsoft-Translator

http://www.aka.ms/translatorlinkedin

http://microsoft.com/translator

Slide 24
