Multi Language Support for Virtual...

가상 어시스턴트를 위한 다국어 지원

April 2020

Soporte multilenguaje para asistentes virtuales

对虚拟助手的多语言支持

Supporto multilingue per assistenti virtuali

پشتیبانی چند زبانه برای دستیاران مجازی

Prise en charge multilingue pour les assistants virtuels

Suporte em vários idiomas para assistentes virtuais

Multi Language Support for Virtual Assistants

वर्चुअल असिस्टेंट के लिए मल्टी लैंग्वेज सपोर्ट

仮想アシスタントの多言語サポート

Overview

Goals:

• Extend the current capabilities of Almond to other languages in a cost- and time-efficient manner

• Avoid template development for each new language

Solution:

Data collection strategy:

• Use neural machine translation models to produce translated sentences

• Improve translation quality using domain-dependent rules

Training strategies:

• Joint and sequential training

• Enforce low variance in the encoder outputs for the same sentence in different languages
Data Collection method

Pipeline: English Dataset → Pre-Processing → Neural Machine Translation Model (e.g. Google Translate) → Post-Processing → Dataset in target language

Supporting components: Parameter Matching; Feedback Collection & Rule Generation

English Dataset example:

Sentence: display all review descriptions authored by Jennifer .

Program: now => [description] of (@restaurant.review, author == " Jennifer ") => notify

Translation model output:

Sentence: muestra todas las descripciones de las reseñas creadas por " Jennifer ".

Dataset in target language example:

Sentence: muestra todas las descripciones de las reseñas escritas por juan .

Program: now => [description] of (@restaurant.review, author == " juan ") => notify
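The parameter replacement visible in the example (Jennifer becoming juan in both the sentence and the program) can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes parameter values are wrapped in double quotes so they can be located after translation, and the function name `match_and_replace` is hypothetical.

```python
import re
from typing import Tuple

# Assumption: a parameter value is whatever sits between double quotes.
QUOTED = re.compile(r'"\s*([^"]+?)\s*"')


def match_and_replace(translated: str, program: str, new_value: str) -> Tuple[str, str]:
    """Find a quoted value shared by sentence and program, then swap in new_value."""
    sentence_values = QUOTED.findall(translated)
    program_values = QUOTED.findall(program)
    shared = [v for v in sentence_values if v in program_values]
    if not shared:
        raise ValueError("no parameter value shared by sentence and program")
    pattern = re.compile(r'"\s*' + re.escape(shared[0]) + r'\s*"')
    # The sentence loses its quotes; the program keeps them around the new value.
    new_sentence = pattern.sub(lambda m: new_value, translated, count=1)
    new_program = pattern.sub(lambda m: '" ' + new_value + ' "', program, count=1)
    return new_sentence, new_program


sentence, program = match_and_replace(
    'muestra todas las descripciones de las reseñas creadas por " Jennifer ".',
    'now => [description] of (@restaurant.review, author == " Jennifer ") => notify',
    "juan",
)
print(sentence)
print(program)
```

Matching on the quoted span rather than on the bare word keeps the sentence and the program in sync even when the translation model reorders the words around the parameter.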

Pre-processing rules:

- Detokenize punctuation

- Replace NUMBER with actual values

- Lower case all parameter values

- …

Post-processing rules:

- Replace verbs with their imperative form

- Insert missing prepositions

- Replace translated parameter values with real values from target language

- …
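Rules of this kind can be expressed as small string-to-string functions applied in order. The sketch below is an assumption about how that might look, not the project's code: the placeholder form `NUMBER_0`, the use of double quotes to mark parameter values, and the word-substitution rule (modelling the creadas → escritas change in the Spanish example) are all illustrative choices.

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[str], str]


def detokenize_punctuation(s: str) -> str:
    """Remove the space before punctuation: 'Jennifer .' -> 'Jennifer.'"""
    return re.sub(r"\s+([.,!?;:])", r"\1", s)


def replace_number(values: Dict[str, str]) -> Rule:
    """Build a rule that replaces NUMBER placeholders with actual values."""
    def rule(s: str) -> str:
        return re.sub(
            r"\bNUMBER(?:_\d+)?\b",
            lambda m: values.get(m.group(0), m.group(0)),
            s,
        )
    return rule


def lowercase_parameters(s: str) -> str:
    """Lower-case everything inside double quotes (assumed parameter values)."""
    return re.sub(r'"[^"]*"', lambda m: m.group(0).lower(), s)


def substitute_words(mapping: Dict[str, str]) -> Rule:
    """Hypothetical post-processing rule: token-level substitutions."""
    def rule(s: str) -> str:
        return " ".join(mapping.get(tok, tok) for tok in s.split())
    return rule


def apply_rules(s: str, rules: List[Rule]) -> str:
    """Apply each rule in order to the sentence."""
    for rule in rules:
        s = rule(s)
    return s


pre_rules = [replace_number({"NUMBER_0": "5"}), lowercase_parameters, detokenize_punctuation]
print(apply_rules('show me NUMBER_0 reviews by " Jennifer " .', pre_rules))
```

Keeping each rule as an independent function makes it straightforward to add language-specific rules as feedback is collected.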

Naive Training

Training sentences:

display all review descriptions authored by Jennifer .

muestra todas las descripciones de las reseñas creadas por Jennifer .

显示Jennifer撰写的所有评论描述。

Model: Encoder → Decoder

Target program: now => [description] of (@restaurant.review, author == " Jennifer ") => notify

Loss: Decoder Loss

Naive training does not use the “knowledge” that these sentences are semantically equivalent.

Training with sentence batching

Training sentences (batched together):

display all review descriptions authored by Jennifer .

muestra todas las descripciones de las reseñas creadas por Jennifer .

显示Jennifer撰写的所有评论描述。

Model: Batching → Encoder → Decoder

Target program: now => [description] of (@restaurant.review, author == " Jennifer ") => notify

Losses: Encoder Loss, Decoder Loss

We now use both losses to guide the training.
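One way to picture the two losses working together is sketched below. The slides only state that the encoded outputs of equivalent sentences should have low variance; the exact form used here (mean per-dimension variance over one fixed-size vector per sentence, added to the decoder loss with a weight) is an assumption, as are the names `batch_by_example`, `encoder_variance_loss`, `combined_loss`, and `weight`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def batch_by_example(samples: Sequence[Tuple[int, str, str]]) -> List[List[Tuple[str, str]]]:
    """Group (example_id, language, sentence) triples so translations share a batch."""
    groups: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for example_id, lang, sentence in samples:
        groups[example_id].append((lang, sentence))
    return list(groups.values())


def encoder_variance_loss(encodings: Sequence[Sequence[float]]) -> float:
    """Mean per-dimension variance across the encodings of one sentence group."""
    n = len(encodings)
    dim = len(encodings[0])
    total = 0.0
    for d in range(dim):
        column = [e[d] for e in encodings]
        mean = sum(column) / n
        total += sum((x - mean) ** 2 for x in column) / n
    return total / dim


def combined_loss(decoder_loss: float,
                  grouped_encodings: Sequence[Sequence[Sequence[float]]],
                  weight: float = 1.0) -> float:
    """Decoder loss plus the weighted average encoder variance over all groups."""
    encoder_loss = sum(encoder_variance_loss(g) for g in grouped_encodings) / len(grouped_encodings)
    return decoder_loss + weight * encoder_loss


samples = [
    (0, "en", "display all review descriptions authored by Jennifer ."),
    (0, "es", "muestra todas las descripciones de las reseñas creadas por Jennifer ."),
    (0, "zh", "显示Jennifer撰写的所有评论描述。"),
]
print(len(batch_by_example(samples)))
```

The encoder term is zero when all languages map to the same vector, so it penalizes only disagreement between translations of the same sentence.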

Experiment results (Farsi)

[Figure: bar chart. y-axis: Exact Match Accuracy (0% to 90%). x-axis: Translated, Verified, New Params, Test.]
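Exact match accuracy counts a prediction as correct only when the generated program is identical to the reference program. A minimal sketch follows; comparing whitespace-separated tokens is an assumption about normalization, and the function name is hypothetical.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted programs whose tokens equal the reference tokens."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return matches / len(references)


refs = [
    'now => [description] of (@restaurant.review, author == " juan ") => notify',
    'now => [description] of (@restaurant.review, author == " maría ") => notify',
]
preds = [
    'now => [description] of (@restaurant.review, author == " juan ") => notify',
    'now => [description] of (@restaurant.review, author == " juan ") => notify',
]
print(exact_match_accuracy(preds, refs))
```

Because a single wrong token makes the whole program count as incorrect, this metric is stricter than token-level accuracy.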

Challenges

• Google Translate is not perfect

• Identifying language-specific traits (singular/plural, missing prepositions, ...)

• Closing the gap between evaluation accuracy and test (real data) accuracy

• Automating and improving collection of natural parameter values for each language

• ...

Bonus:

• Starter code is available free of charge!

• 18/6 project technical support

• Optional happy hours to celebrate our results

• Will be featured as a contributor in our EMNLP paper
