Left-to-Right Hierarchical Phrase-based Translation System (LR-Hiero)
Maryam Siahbani
Overview
• History of Machine Translation
• Rule-based MT
• Statistical MT
  – Training
  – Decoding
• Left-to-Right Hierarchical Phrase-based MT
• Using LR-Hiero in Simultaneous Translation
History of Machine Translation
• Late 1940’s: Early rule-based systems
  – computers would replace human translators within 5 years!
• 1966: ALPAC report cuts research funding
• Early 1970’s: First commercial system (Systran)
• Late 1980’s: IBM developed first statistical models inspired by speech research
• Late 2000’s: Explosion in MT research
• 2006: First version of Google Translate
Rule-based Machine Translation
• Rules hand-written by linguists
• State of the art until early 2000’s
  – e.g. Systran
• Expensive to create, maintain and adapt
[Figure: parallel parse trees — French NP (Noun "chat", Adjective "noir") ↔ English NP (Noun "cat", Adjective "black")]
Statistical Machine Translation
• Data-driven approaches to MT
• Learn translation from textual data
  – Parallel data
• Language independent
• Normally use probabilistic models
  – The best translation = the most probable translation, where f is the source sentence
• State of the art for most language pairs
  – Best systems include rules (hybrid)
Statistical Machine Translation

Training Pipeline

[Figure: monolingual & bilingual training data feed the training pipeline, which produces the translation model; the decoder then maps an input sentence to its translation]
Translation Data
Parallel Text: (Web, United Nations, European/Canadian Parliament, Wikipedia, etc.)
Statistical Machine Translation (SMT)
Aligned Words

[Figure: word alignment between Zh "我们 十分 关注 非洲 地区 发生 的 事情" and En "we are very much concerned with what happens in African region"]

Learn alignment from parallel text
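The "learn alignment from parallel text" step is classically bootstrapped with IBM Model 1, which estimates word-translation probabilities by EM. A minimal sketch, assuming a toy German–English corpus (the deck does not say which aligner is used, so this is a generic illustration):

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """Estimate word-translation probabilities t(e|f) with IBM Model 1 EM.
    corpus: list of (source_words, target_words) sentence pairs."""
    t = defaultdict(lambda: 1.0)  # uniform start over co-occurring pairs
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # marginal counts c(f)
        for src, tgt in corpus:
            for e in tgt:
                z = sum(t[(e, f)] for f in src)  # how much src explains e
                for f in src:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float,
                        {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

# Toy parallel corpus (invented for illustration).
corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = ibm_model1(corpus)
```

After a few iterations "haus" concentrates its mass on "house" rather than "the"; thresholding t(e|f) (or intersecting alignments from both directions) then yields the word alignments the rule extractor consumes.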
Statistical Machine Translation (SMT)
Aligned Words → Translation Rules

[Figure: the word-aligned sentence pair from the previous slide]

Learn alignment from parallel text

Id   Source               Target                 Weight
r1   关注 X_1             concerned with X_1     -5.3
r2   X_1 发生 X_2 事情    what happens X_2 X_1   -4.8
r3   非洲 地区            African region         -3.1

Learn weighted translation rules from word-aligned text
Translation Rules (phrase-pairs)
Millions of translation rules:

Source          Target             p(e|f)
den Vorschlag   the proposal       0.6227
den Vorschlag   ’s proposal        0.1068
den Vorschlag   a proposal         0.0341
den Vorschlag   the idea           0.0250
den Vorschlag   this proposal      0.0227
den Vorschlag   proposal           0.0205
den Vorschlag   of the proposal    0.0159
den Vorschlag   the proposals      0.0159

* German-English phrase table trained on Europarl
(Log probability: -1.7986)
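The p(e|f) column of such a phrase table is typically estimated by relative frequency over the extracted phrase pairs. A minimal sketch with invented counts (not the actual Europarl statistics):

```python
from collections import Counter

# Hypothetical extracted phrase pairs; repetitions stand in for counts.
pairs = ([("den Vorschlag", "the proposal")] * 6
         + [("den Vorschlag", "a proposal")] * 3
         + [("den Vorschlag", "the idea")] * 1)

pair_counts = Counter(pairs)
src_counts = Counter(f for f, _ in pairs)

# Relative frequency: p(e|f) = count(f, e) / count(f).
p = {(f, e): c / src_counts[f] for (f, e), c in pair_counts.items()}
```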
Statistical Machine Translation (SMT)

Find the translation e for any given input f:

e*(f) = argmax_e P(e|f) = argmax_e Σ_{r ∈ d} w · h(r)

[Figure: full pipeline — parallel text → aligned words → translation rules (r1–r3 above) → decoder]

Decoder generates many candidate translations, scores them, and returns the most likely one
Measuring Translation Quality: BLEU score
• BLEU is a simple but effective scoring metric shown to be proportional to human judgment of translation quality
• The idea is to measure the overlap between the translation generated by the MT system and the reference translation
• Measure one-word overlaps, two-word overlaps, … (n-grams)
• Compute a precision score for each n-gram order
• Impose a brevity penalty on candidates that are shorter than the reference
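The bullets above can be sketched directly: clipped n-gram precisions combined by a geometric mean, times a brevity penalty. A single-reference, unsmoothed sentence-level sketch (real BLEU is computed at corpus level):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped counts: a candidate n-gram is credited at most as
        # often as it occurs in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty for candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "I was in my twenties before I ever went to an art museum .".split()
good = "I was in my twenties before I first went to an art museum .".split()
bad = "I was twenty I ever went to art .".split()
good_score, bad_score = bleu(good, ref), bleu(bad, ref)
```

The 41.1 and 89.0 on the next slide come from a full implementation, so this sketch reproduces the ordering of the two candidates, not those exact numbers.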
Measuring Translation Quality: BLEU score
• Input:
  – Ich war in meinen zwanzigern bevor ich erstmals in ein kunstmuseum ging .
• Reference translation:
  – I was in my twenties before I ever went to an art museum .
• Low BLEU score (41.1):
  – I was twenty I ever went to art .
• High BLEU score (89.0):
  – I was in my twenties before I first went to an art museum .
Hierarchical Phrase-based Translation (Hiero)
Synchronous Context-Free Grammar (SCFG)

X → <我们十分 X_1 / we are very much X_1>
X → <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>
X → <事情 / what>
X → <非洲 地区 / african region>

[Figure: derivation — applying the rules recursively to "我们 十分 关注 非洲 地区 发生 的 事情" yields "we are very much concerned with what happens in african region"]
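Applying the four rules above amounts to linearizing the target side of a derivation tree, with integer slots marking where sub-derivations (and hence reordering) go. A sketch with hypothetical rule names:

```python
# Each rule: its target template; integers index the sub-derivations
# that fill the source-side non-terminals (in source order).
rules = {
    "r_top":  ["we", "are", "very", "much", 0],              # 我们十分 X_1
    "r_mid":  ["concerned", "with", 1, "happens", "in", 0],  # 关注 X_1 发生 的 X_2
    "r_what": ["what"],                                      # 事情
    "r_afr":  ["african", "region"],                         # 非洲 地区
}

def realize(derivation):
    """Linearize the target side; an integer slot recurses into the
    matching sub-derivation, so X_2 can surface before X_1."""
    name, children = derivation
    out = []
    for sym in rules[name]:
        if isinstance(sym, int):
            out.extend(realize(children[sym]))
        else:
            out.append(sym)
    return out

# Derivation of 我们 十分 关注 非洲 地区 发生 的 事情:
deriv = ("r_top", [("r_mid", [("r_afr", []), ("r_what", [])])])
result = " ".join(realize(deriv))
```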
Hiero Decoder

• Bottom-up dynamic programming algorithm (CKY)
• O(n^3) LM computation

[Figure: chart decoding of "我们 十分 关注 非洲 地区 发生 的 事情" into "we are very much concerned with what happens in african regions .", with LM queries at each hypothesis combination]
Left-to-Right Hierarchical Phrase-based Translation System
Left-to-Right Target Generation (Watanabe et al. 2006)
[Figure: left-to-right target generation — the target is extended strictly left to right ("we are very much" → "concerned with" → "what happens X_2 X_1" → "in african region") while source-side phrases of "我们 十分 关注 非洲 地区 发生 的 事情" are consumed]

X → <我们十分 X_1 / we are very much X_1>
X → <X_1 发生 X_2 事情 / what happens X_2 X_1>
X → <关注 X_1 / concerned with X_1>
X → <X_1 发生 的 X_2 / X_2 happens in X_1>   (non-GNF: target begins with a non-terminal)
Greibach Normal Form (GNF)
• Search for sub-phrases within larger ones
  – Smaller phrases are replaced by non-terminal X
• Dynamic programming algorithm to extract rules for LR-Hiero
  – Linear time complexity (in number of rules)
LR-Hiero Rule Extraction

• Search for sub-phrases within larger ones
  – Smaller phrases are replaced by non-terminal X
• A novel dynamic programming algorithm to extract rules for LR-Hiero
  – Linear time complexity vs. exhaustive search

[Figure: extraction on the word-aligned example — replacing aligned sub-phrases by X_1, X_2 yields rules such as <我们十分 X_1 / we are very much X_1> and <X_1 发生 X_2 事情 / what happens X_2 X_1>]
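The first step of any such extractor is enumerating alignment-consistent phrase pairs: a source span and target span are consistent if no alignment link leaves the box they form. A simplified sketch (real extractors also extend pairs with unaligned boundary words; the toy alignment is invented):

```python
def extract_phrases(n_src, alignment, max_len=4):
    """Enumerate alignment-consistent phrase pairs as span pairs."""
    pairs = []
    for i in range(n_src):
        for j in range(i, min(i + max_len, n_src)):
            # Target positions linked to the source span [i, j].
            tgt = [t for s, t in alignment if i <= s <= j]
            if not tgt:
                continue
            lo, hi = min(tgt), max(tgt)
            # Consistent only if nothing in [lo, hi] aligns outside [i, j].
            if all(i <= s <= j for s, t in alignment if lo <= t <= hi):
                pairs.append(((i, j), (lo, hi)))
    return pairs

# Toy alignment with one swap: source positions 1 and 2 cross over.
align = [(0, 0), (1, 2), (2, 1), (3, 3)]
pairs = extract_phrases(4, align)
```

Subtracting an inner consistent pair and replacing it by X_1 is what turns these phrase pairs into hierarchical rules.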
LR-Hiero Rule Extraction

• Linear time complexity vs. exhaustive search
• Can easily extract rules with more non-terminals

[Figure: extraction time (sec, 0–3500) vs. number of non-terminals (1–4), Hiero heuristic vs. DP extractor]

Expressive Hierarchical Rule Extraction for Left-to-Right Translation. M. Siahbani and A. Sarkar. AMTA (2014)
Left-to-Right Decoding

X → <我们十分 X_1 / we are very much X_1>
X → <关注 X_1 / concerned with X_1>
X → <X_1 发生 X_2 事情 / what happens X_2 X_1>
X → <的 / in>
X → <非洲 地区 / African region>

Source (span indices 0–8): 我们 十分 关注 非洲 地区 发生 的 事情

<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7] [3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region
Left-to-Right Decoding

Candidate translations are scored by:

t*(f) = argmax_t Σ_{r ∈ d} w · h(r)

<我们十分 X_1 / we are very much X_1>, -4.7
<关注 X_1 / concerned with X_1>, -3.8
<X_1 发生 X_2 事情 / what happens X_2 X_1>, -3.6
<的 / in>, -1.2
<非洲 地区 / African region>, -2.7

<s> [0,8], -3.3
<s> we are very much [2,8], -4.5
<s> we are very much concerned with [3,8], -5.9
<s> we are very much concerned with what happens [6,7] [3,5], -7.1
<s> we are very much concerned with what happens in [3,5], -7.7
<s> we are very much concerned with what happens in African region

(contrast with typical CKY bottom-up decoding)
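The expansion above can be modelled as a target prefix plus a stack of uncovered source spans: each GNF rule appends its terminal words and pushes its sub-spans back on the stack. A sketch using the slide's rules, scoring by summed rule weights only (the slide's hypothesis scores include further features):

```python
def expand(hyp, words, weight, subspans):
    """Apply a GNF rule to the first uncovered span: append the rule's
    target words and replace the span by the rule's sub-spans."""
    prefix, stack, score = hyp
    return (prefix + words, subspans + stack[1:], score + weight)

hyp = ([], [(0, 8)], 0.0)                                       # <s> [0,8]
hyp = expand(hyp, ["we", "are", "very", "much"], -4.7, [(2, 8)])
hyp = expand(hyp, ["concerned", "with"], -3.8, [(3, 8)])
hyp = expand(hyp, ["what", "happens"], -3.6, [(6, 7), (3, 5)])  # X_2, X_1
hyp = expand(hyp, ["in"], -1.2, [])                             # covers 的 [6,7]
hyp = expand(hyp, ["African", "region"], -2.7, [])              # covers [3,5]
prefix, stack, score = hyp
```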
LR-Hiero Results: LR-Hiero vs. State-of-the-art

[Figure: BLEU (translation accuracy) vs. LM calls (translation time, 1000–8000) for Czech-English, German-English and Chinese-English]

3 times faster, with comparable translation accuracy
Statistical Machine Translation (SMT)

• Available SMT systems (phrase-based, hierarchical phrase-based (Hiero), and left-to-right hierarchical phrase-based):
  – Moses (Edinburgh)
  – Phrasal (Stanford)
  – Jane 2 (Aachen University)
  – Joshua (JHU)
  – Kriya (SFU)
  – CDEC (CMU)
  – LR-Hiero
• LR-Hiero is available at: https://github.com/sfu-natlang/lrhiero
  – Time efficient
  – Can model complex translation
  – Generates the translation in left-to-right manner
  – Suitable choice for online translation
Simultaneous Translation
Speech-to-Speech Translation

• Karlsruhe (KIT) Lecture Translator
• NICT Speech Translator
• Skype Translator
Incremental Translation
• Facilitate continuous translation with low latency
  – Latency: time difference between the start of the source sentence (speech) and the start of the target sentence (speech)
• Ensure acceptable translation accuracy

Example: "Good evening, I would like a taxi to the airport please" → "Buenas noches. Quiero un taxi al aeropuerto por favor"
• Non-incremental: the full translation is produced after 6 sec
• Incremental: "Good evening, I would" → "Buenas noches quiero" after 0.7 sec; "like a taxi" → "como un taxi" after 0.2 sec; "to the airport please" → "al aeropuerto por favor" after 0.2 sec
Integrating Segmentation with Translation Process

[Figure: as input words arrive ("Good", "Good evening", "Good evening I", …), a segmenter repeatedly asks "segment?"; when it fires, the completed segment (e.g. "Good evening" → "Buenas noches") is translated while the rest of the input continues to stream in]
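The loop in these frames can be sketched as follows; the segmenter and "translator" are trivial stand-ins just to show the control flow:

```python
def incremental_translate(stream, should_segment, translate):
    """Interleave segmentation and translation: buffer incoming words
    and emit a translation as soon as the segmenter fires, instead of
    waiting for the end of the sentence."""
    buf, out = [], []
    for word in stream:
        buf.append(word)
        if should_segment(buf):
            out.append(translate(buf))
            buf = []
    if buf:  # flush the final partial segment
        out.append(translate(buf))
    return out

stream = "good evening i would like a taxi".split()
segs = incremental_translate(stream,
                             lambda b: len(b) == 3,          # toy segmenter
                             lambda b: " ".join(b).upper())  # toy "MT"
```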
Incremental Translation Results
• Task: English-German TED speech translation
• MT system training data: IWSLT 2013 train data + Europarl v7 data [Koehn 2005]

                  BLEU    Latency (sec)   Segs/Second
Non-incremental   21.08   6.353           0.15
Prosodic          20.88   0.468           2.27
Incremental       20.86   0.311           3.22

(BLEU is the translation accuracy measure)
Publications
• Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. Siahbani, Maryam and Sankaran, Baskaran and Sarkar, Anoop. EMNLP(2014)
• Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. Siahbani, Maryam and Sarkar, Anoop. EMNLP(2014)
• Expressive Hierarchical Rule Extraction for Left-to-Right Translation. Siahbani, Maryam and Sarkar, Anoop. AMTA(2014)
• Incremental Translation using a Hierarchical Phrase-based Translation System. Siahbani, Maryam and Mehdizadeh Seraj, Ramtin and Sankaran, Baskaran and Sarkar, Anoop. SLT (2014)
Questions?
Partial Hypothesis
Source (span indices 0–8): 我们 十分 关注 非洲 地区 发生 的 事情

<s> [0,8], -3.3
<s> we are very much [2,8], -4.5
<s> we are very much concerned with [3,8], -5.9
<s> we are very much concerned with what happens [6,7] [3,5], -7.1
LR-Decoding with Beam Search

• LR-decoding integrated with beam search (Watanabe et al. 2006)
• Stacks: hypotheses with the same number of source-side words covered
• Exhaustively generating all possible partial hypotheses for a given stack
Cube Pruning

• Each cube: a group of hypotheses and applicable rules
• Cubes are fed to a priority queue which fills the current stack
• Rows: hypotheses; columns: rules
• Rows and columns are sorted based on the scores
• Assumption: the best hypothesis is in the top left
  – The next best are the neighbours of this entry
[Figure: cube grid — hypotheses "students have not yet" (10.2), "pupils have not yet" (11.5), "student has not" (12.7) as rows; rules "made" (0.9), "done" (1.1), "do" (3.2) as columns; combined scores 12.5, 12.4, 14.3 / 12.6, 12.8, 14.7 / 13.3, 13.5, 15.4]
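The grid exploration can be sketched with a priority queue: start at the top-left corner and lazily push the right and down neighbours of each popped cell. Costs here are illustrative (lower is better); the monotonicity assumption is exactly what the LM score later breaks:

```python
import heapq

def cube_prune(hyp_costs, rule_costs, k):
    """Return the k best (lowest-cost) hypothesis+rule combinations,
    assuming both lists are sorted so costs grow along rows and columns."""
    heap = [(hyp_costs[0] + rule_costs[0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append(cost)
        for ni, nj in ((i + 1, j), (i, j + 1)):  # down and right neighbours
            if ni < len(hyp_costs) and nj < len(rule_costs) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (hyp_costs[ni] + rule_costs[nj], ni, nj))
    return out

# Hypothesis and rule costs in the style of the grid above.
best = cube_prune([10.2, 11.5, 12.7], [0.9, 1.1, 3.2], k=4)
```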
Time Efficiency: average number of LM queries

[Figure: LM-query comparison against Watanabe et al. (2006)]

Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)
Reordering Features
• LR-Hiero (Watanabe et al. 2006) achieves ~2 BLEU points less than Hiero
• Distortion feature (computed as each rule is applied)
• Number of reordering rules (rules whose non-terminals are reordered between the source and target sides)

<X_1 发生 X_2 事情 / what happens X_1 X_2>   r<> = 0
<X_1 发生 X_2 事情 / what happens X_2 X_1>   r<> = 1

Source (span indices 0–8): 我们 十分 关注 非洲 地区 发生 的 事情
d = (5-3) + (7-6) + (8-6) + (7-3) + (8-5)
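A simplified way to read the distortion sum: accumulate the jump distance between consecutively translated source spans (0 for a fully monotone translation). This is a generic sketch, not necessarily the paper's exact feature, which also counts jumps inside discontinuous rules:

```python
def distortion(spans):
    """Total jump distance over source spans, listed in the order their
    words are emitted on the target side."""
    return sum(abs(nxt_start - prev_end)
               for (_, prev_end), (nxt_start, _) in zip(spans, spans[1:]))

# Source spans in target emission order for the running example:
# 我们十分 [0,2] → 关注 [2,3] → 发生 [5,6] → 事情 [7,8] → 的 [6,7] → 非洲地区 [3,5]
d = distortion([(0, 2), (2, 3), (5, 6), (7, 8), (6, 7), (3, 5)])
```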
Translation Quality

[Figure: BLEU comparison against Watanabe et al. (2006)]

Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)
Search Error in Cube Pruning

• Assumption: the best hypothesis is in the top left
  – The next best are the neighbours of this entry
• Adding the LM score violates the assumption

[Figure: two cube grids in which, once LM scores are added, a good entry (e.g. 7.7) lies away from the top-left corner and is missed by neighbour expansion]
Queue Diversity
Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. M. Siahbani and A. Sarkar. EMNLP (2014)
[Figure: Chinese-English BLEU score (23.5–26.5) and number of LM calls (0–40000) for LR-Hiero, LR-Hiero+CP, and LR-Hiero+CP (QD=10)]
Lexicalized Reordering Model
• Distortion penalty is weak
  – it only penalizes deviation from the monotone translation
• Learn reordering preferences for each phrase (with respect to the previous phrase)
  – Monotone
  – Swap
  – Discontinuous

[Figure: F–E alignment grid illustrating the three orientations; figure from "Statistical Machine Translation", Koehn 2010]
Lexicalized Reordering Model
• Collect orientation information during rule extraction
  – Convert each rule to a phrase-pair (possibly discontinuous)
  – M: if there is a phrase-pair on the top-left
  – S: if there is a phrase-pair on the top-right
  – D: otherwise
• Estimation by relative frequency

[Figure: F–E alignment grid; figure from "Statistical Machine Translation", Koehn 2010]
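The M/S/D test can be sketched on source spans taken in target order: a phrase pair "on the top-left" means the current source span starts exactly where the previous one ended, "top-right" means it ends where the previous one starts. The example spans are invented:

```python
from collections import Counter

def orientation(prev_span, cur_span):
    """Orientation of the current phrase relative to the previous one."""
    (p_start, p_end), (c_start, c_end) = prev_span, cur_span
    if c_start == p_end:   # adjoins on the top-left
        return "M"
    if c_end == p_start:   # adjoins on the top-right
        return "S"
    return "D"

# Relative-frequency estimation over observed orientations:
counts = Counter(orientation(p, c) for p, c in
                 [((0, 2), (2, 3)), ((3, 5), (0, 3)), ((2, 3), (5, 6))])
```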