Left-to-Right Hierarchical Phrase-based Translation System (LR-Hiero)
Maryam Siahbani
Overview
• History of Machine Translation
• Rule-based MT
• Statistical MT
  – Training
  – Decoding
• Left-to-Right Hierarchical Phrase-based MT
• Using LR-Hiero in Simultaneous Translation
History of Machine Translation
• Late 1940’s: Early rule-based systems
  – computers would replace human translators within 5 years!
• 1966: ALPAC report cuts research funding
• Early 1970’s: First commercial system (Systran)
• Late 1980’s: IBM developed first statistical models inspired by speech research
• Late 2000’s: Explosion in MT research
• 2006: First version of Google Translate
Rule-based Machine Translation
• Rules hand-written by linguists
• State of the art until early 2000’s
  – e.g. Systran
• Expensive to create, maintain and adapt
[Figure: parallel parse trees — French NP (Noun "chat", Adjective "noir") ↔ English NP (Noun "cat", Adjective "black")]
Statistical Machine Translation
• Data-driven approaches to MT
• Learn translation from textual data
  – Parallel data
• Language independent
• Normally use probabilistic models
  – The best translation = the most probable translation, where f is the source sentence
• State of the art for most language pairs
  – Best systems include rules (hybrid)
Statistical Machine Translation

Training Pipeline

[Figure: monolingual & bilingual training data feed the training pipeline, which produces the translation model; the decoder then maps an input sentence to its translation]
Translation Data
Parallel Text: (Web, United Nations, European/Canadian Parliament, Wikipedia, etc.)
Statistical Machine Translation (SMT)
Aligned Words

[Figure: word alignment between Zh "我们 十分 关注 非洲 地区 发生 的 事情" and En "we are very much concerned with what happens in African region"]

Learn alignment from parallel text
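The "learn alignment from parallel text" step is classically bootstrapped with IBM Model 1, which estimates word-translation probabilities by EM. A minimal sketch, assuming a toy German–English corpus (the deck does not say which aligner is used, so this is a generic illustration):

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """Estimate word-translation probabilities t(e|f) with IBM Model 1 EM.
    corpus: list of (source_words, target_words) sentence pairs."""
    t = defaultdict(lambda: 1.0)  # uniform start over co-occurring pairs
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # marginal counts c(f)
        for src, tgt in corpus:
            for e in tgt:
                z = sum(t[(e, f)] for f in src)  # how much src explains e
                for f in src:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float,
                        {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

# Toy parallel corpus (invented for illustration).
corpus = [("das haus".split(), "the house".split()),
          ("das buch".split(), "the book".split()),
          ("ein buch".split(), "a book".split())]
t = ibm_model1(corpus)
```

After a few iterations "haus" concentrates its mass on "house" rather than "the"; thresholding t(e|f) (or intersecting alignments from both directions) then yields the word alignments the rule extractor consumes.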
Statistical Machine Translation (SMT)
Aligned Words → Translation Rules

[Figure: the word-aligned sentence pair from the previous slide]

Learn alignment from parallel text

Id   Source               Target                 Weight
r1   关注 X_1             concerned with X_1     -5.3
r2   X_1 发生 X_2 事情    what happens X_2 X_1   -4.8
r3   非洲 地区            African region         -3.1

Learn weighted translation rules from word-aligned text
Translation Rules (phrase-pairs)
Millions of translation rules:

Source          Target             p(e|f)
den Vorschlag   the proposal       0.6227
den Vorschlag   ’s proposal        0.1068
den Vorschlag   a proposal         0.0341
den Vorschlag   the idea           0.0250
den Vorschlag   this proposal      0.0227
den Vorschlag   proposal           0.0205
den Vorschlag   of the proposal    0.0159
den Vorschlag   the proposals      0.0159

* German-English phrase table trained on Europarl
(Log probability: -1.7986)
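The p(e|f) column of such a phrase table is typically estimated by relative frequency over the extracted phrase pairs. A minimal sketch with invented counts (not the actual Europarl statistics):

```python
from collections import Counter

# Hypothetical extracted phrase pairs; repetitions stand in for counts.
pairs = ([("den Vorschlag", "the proposal")] * 6
         + [("den Vorschlag", "a proposal")] * 3
         + [("den Vorschlag", "the idea")] * 1)

pair_counts = Counter(pairs)
src_counts = Counter(f for f, _ in pairs)

# Relative frequency: p(e|f) = count(f, e) / count(f).
p = {(f, e): c / src_counts[f] for (f, e), c in pair_counts.items()}
```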
Statistical Machine Translation (SMT)

Find the translation e for any given input f:

e*(f) = argmax_e P(e|f) = argmax_e Σ_{r ∈ d} w · h(r)

[Figure: full pipeline — parallel text → aligned words → translation rules (r1–r3 above) → decoder]

Decoder generates many candidate translations, scores them, and returns the most likely one
Measuring Translation Quality: BLEU score
• BLEU is a simple but effective scoring metric shown to be proportional to human judgment of translation quality
• The idea is to measure the overlap between the translation generated by the MT system and the reference translation
• Measure one-word overlaps, two-word overlaps, … (n-grams)
• Compute a precision score for each n-gram order
• Impose a brevity penalty on candidates that are shorter than the reference
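The bullets above can be sketched directly: clipped n-gram precisions combined by a geometric mean, times a brevity penalty. A single-reference, unsmoothed sentence-level sketch (real BLEU is computed at corpus level):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clipped counts: a candidate n-gram is credited at most as
        # often as it occurs in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty for candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "I was in my twenties before I ever went to an art museum .".split()
good = "I was in my twenties before I first went to an art museum .".split()
bad = "I was twenty I ever went to art .".split()
good_score, bad_score = bleu(good, ref), bleu(bad, ref)
```

The 41.1 and 89.0 on the next slide come from a full implementation, so this sketch reproduces the ordering of the two candidates, not those exact numbers.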
Measuring Translation Quality: BLEU score
• Input:
  – Ich war in meinen zwanzigern bevor ich erstmals in ein kunstmuseum ging .
• Reference translation:
  – I was in my twenties before I ever went to an art museum .
• Low BLEU score (41.1):
  – I was twenty I ever went to art .
• High BLEU score (89.0):
  – I was in my twenties before I first went to an art museum .
Hierarchical Phrase-based Translation (Hiero)
Synchronous Context-Free Grammar (SCFG)

X → <我们十分 X_1 / we are very much X_1>
X → <关注 X_1 发生 的 X_2 / concerned with X_2 happens in X_1>
X → <事情 / what>
X → <非洲 地区 / african region>

[Figure: derivation — applying the rules recursively to "我们 十分 关注 非洲 地区 发生 的 事情" yields "we are very much concerned with what happens in african region"]
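Applying the four rules above amounts to linearizing the target side of a derivation tree, with integer slots marking where sub-derivations (and hence reordering) go. A sketch with hypothetical rule names:

```python
# Each rule: its target template; integers index the sub-derivations
# that fill the source-side non-terminals (in source order).
rules = {
    "r_top":  ["we", "are", "very", "much", 0],              # 我们十分 X_1
    "r_mid":  ["concerned", "with", 1, "happens", "in", 0],  # 关注 X_1 发生 的 X_2
    "r_what": ["what"],                                      # 事情
    "r_afr":  ["african", "region"],                         # 非洲 地区
}

def realize(derivation):
    """Linearize the target side; an integer slot recurses into the
    matching sub-derivation, so X_2 can surface before X_1."""
    name, children = derivation
    out = []
    for sym in rules[name]:
        if isinstance(sym, int):
            out.extend(realize(children[sym]))
        else:
            out.append(sym)
    return out

# Derivation of 我们 十分 关注 非洲 地区 发生 的 事情:
deriv = ("r_top", [("r_mid", [("r_afr", []), ("r_what", [])])])
result = " ".join(realize(deriv))
```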
Hiero Decoder

• Bottom-up dynamic programming algorithm (CKY)
• O(n^3) LM computation

[Figure: chart decoding of "我们 十分 关注 非洲 地区 发生 的 事情" into "we are very much concerned with what happens in african regions .", with LM queries at each hypothesis combination]
Left-to-Right Hierarchical Phrase-based Translation System
Left-to-Right Target Generation (Watanabe et al. 2006)
[Figure: left-to-right target generation — the target is extended strictly left to right ("we are very much" → "concerned with" → "what happens X_2 X_1" → "in african region") while source-side phrases of "我们 十分 关注 非洲 地区 发生 的 事情" are consumed]

X → <我们十分 X_1 / we are very much X_1>
X → <X_1 发生 X_2 事情 / what happens X_2 X_1>
X → <关注 X_1 / concerned with X_1>
X → <X_1 发生 的 X_2 / X_2 happens in X_1>   (non-GNF: target begins with a non-terminal)
Greibach Normal Form (GNF)
• Search for sub-phrases within larger ones
  – Smaller phrases are replaced by non-terminal X
• Dynamic programming algorithm to extract rules for LR-Hiero
  – Linear time complexity (in number of rules)
LR-Hiero Rule Extraction

• Search for sub-phrases within larger ones
  – Smaller phrases are replaced by non-terminal X
• A novel dynamic programming algorithm to extract rules for LR-Hiero
  – Linear time complexity vs. exhaustive search

[Figure: extraction on the word-aligned example — replacing aligned sub-phrases by X_1, X_2 yields rules such as <我们十分 X_1 / we are very much X_1> and <X_1 发生 X_2 事情 / what happens X_2 X_1>]
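The first step of any such extractor is enumerating alignment-consistent phrase pairs: a source span and target span are consistent if no alignment link leaves the box they form. A simplified sketch (real extractors also extend pairs with unaligned boundary words; the toy alignment is invented):

```python
def extract_phrases(n_src, alignment, max_len=4):
    """Enumerate alignment-consistent phrase pairs as span pairs."""
    pairs = []
    for i in range(n_src):
        for j in range(i, min(i + max_len, n_src)):
            # Target positions linked to the source span [i, j].
            tgt = [t for s, t in alignment if i <= s <= j]
            if not tgt:
                continue
            lo, hi = min(tgt), max(tgt)
            # Consistent only if nothing in [lo, hi] aligns outside [i, j].
            if all(i <= s <= j for s, t in alignment if lo <= t <= hi):
                pairs.append(((i, j), (lo, hi)))
    return pairs

# Toy alignment with one swap: source positions 1 and 2 cross over.
align = [(0, 0), (1, 2), (2, 1), (3, 3)]
pairs = extract_phrases(4, align)
```

Subtracting an inner consistent pair and replacing it by X_1 is what turns these phrase pairs into hierarchical rules.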
LR-Hiero Rule Extraction

• Linear time complexity vs. exhaustive search
• Can easily extract rules with more non-terminals

[Figure: extraction time (sec, 0–3500) vs. number of non-terminals (1–4), Hiero heuristic vs. DP extractor]

Expressive Hierarchical Rule Extraction for Left-to-Right Translation. M. Siahbani and A. Sarkar. AMTA (2014)
Left-to-Right Decoding

X → <我们十分 X_1 / we are very much X_1>
X → <关注 X_1 / concerned with X_1>
X → <X_1 发生 X_2 事情 / what happens X_2 X_1>
X → <的 / in>
X → <非洲 地区 / African region>

Source (span indices 0–8): 我们 十分 关注 非洲 地区 发生 的 事情

<s> [0,8]
<s> we are very much [2,8]
<s> we are very much concerned with [3,8]
<s> we are very much concerned with what happens [6,7] [3,5]
<s> we are very much concerned with what happens in [3,5]
<s> we are very much concerned with what happens in African region
Left-to-Right Decoding

Candidate translations are scored by:

t*(f) = argmax_t Σ_{r ∈ d} w · h(r)

<我们十分 X_1 / we are very much X_1>, -4.7
<关注 X_1 / concerned with X_1>, -3.8
<X_1 发生 X_2 事情 / what happens X_2 X_1>, -3.6
<的 / in>, -1.2
<非洲 地区 / African region>, -2.7

<s> [0,8], -3.3
<s> we are very much [2,8], -4.5
<s> we are very much concerned with [3,8], -5.9
<s> we are very much concerned with what happens [6,7] [3,5], -7.1
<s> we are very much concerned with what happens in [3,5], -7.7
<s> we are very much concerned with what happens in African region

(contrast with typical CKY bottom-up decoding)
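The expansion above can be modelled as a target prefix plus a stack of uncovered source spans: each GNF rule appends its terminal words and pushes its sub-spans back on the stack. A sketch using the slide's rules, scoring by summed rule weights only (the slide's hypothesis scores include further features):

```python
def expand(hyp, words, weight, subspans):
    """Apply a GNF rule to the first uncovered span: append the rule's
    target words and replace the span by the rule's sub-spans."""
    prefix, stack, score = hyp
    return (prefix + words, subspans + stack[1:], score + weight)

hyp = ([], [(0, 8)], 0.0)                                       # <s> [0,8]
hyp = expand(hyp, ["we", "are", "very", "much"], -4.7, [(2, 8)])
hyp = expand(hyp, ["concerned", "with"], -3.8, [(3, 8)])
hyp = expand(hyp, ["what", "happens"], -3.6, [(6, 7), (3, 5)])  # X_2, X_1
hyp = expand(hyp, ["in"], -1.2, [])                             # covers 的 [6,7]
hyp = expand(hyp, ["African", "region"], -2.7, [])              # covers [3,5]
prefix, stack, score = hyp
```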
LR-Hiero Results: LR-Hiero vs. State-of-the-art

[Figure: BLEU (translation accuracy) vs. LM calls (translation time, 1000–8000) for Czech-English, German-English and Chinese-English]

3 times faster, with comparable translation accuracy
Statistical Machine Translation (SMT)

• Available SMT systems (phrase-based, hierarchical phrase-based (Hiero), and left-to-right hierarchical phrase-based):
  – Moses (Edinburgh)
  – Phrasal (Stanford)
  – Jane 2 (Aachen University)
  – Joshua (JHU)
  – Kriya (SFU)
  – CDEC (CMU)
  – LR-Hiero
• LR-Hiero is available at: https://github.com/sfu-natlang/lrhiero
  – Time efficient
  – Can model complex translation
  – Generates the translation in left-to-right manner
  – Suitable choice for online translation
Simultaneous Translation
Speech-to-Speech Translation

• Karlsruhe (KIT) Lecture Translator
• NICT Speech Translator
• Skype Translator
Incremental Translation
• Facilitate continuous translation with low latency
  – Latency: time difference between the start of the source sentence (speech) and the start of the target sentence (speech)
• Ensure acceptable translation accuracy

Example: "Good evening, I would like a taxi to the airport please" → "Buenas noches. Quiero un taxi al aeropuerto por favor"
• Non-incremental: the full translation is produced after 6 sec
• Incremental: "Good evening, I would" → "Buenas noches quiero" after 0.7 sec; "like a taxi" → "como un taxi" after 0.2 sec; "to the airport please" → "al aeropuerto por favor" after 0.2 sec
Integrating Segmentation with Translation Process

[Figure: as input words arrive ("Good", "Good evening", "Good evening I", …), a segmenter repeatedly asks "segment?"; when it fires, the completed segment (e.g. "Good evening" → "Buenas noches") is translated while the rest of the input continues to stream in]
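The loop in these frames can be sketched as follows; the segmenter and "translator" are trivial stand-ins just to show the control flow:

```python
def incremental_translate(stream, should_segment, translate):
    """Interleave segmentation and translation: buffer incoming words
    and emit a translation as soon as the segmenter fires, instead of
    waiting for the end of the sentence."""
    buf, out = [], []
    for word in stream:
        buf.append(word)
        if should_segment(buf):
            out.append(translate(buf))
            buf = []
    if buf:  # flush the final partial segment
        out.append(translate(buf))
    return out

stream = "good evening i would like a taxi".split()
segs = incremental_translate(stream,
                             lambda b: len(b) == 3,          # toy segmenter
                             lambda b: " ".join(b).upper())  # toy "MT"
```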
Incremental Translation Results
• Task: English-German TED speech translation
• MT system training data: IWSLT 2013 train data + Europarl v7 data [Koehn 2005]

                  BLEU    Latency (sec)   Segs/Second
Non-incremental   21.08   6.353           0.15
Prosodic          20.88   0.468           2.27
Incremental       20.86   0.311           3.22

(BLEU is the translation accuracy measure)
Publications
• Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. Siahbani, Maryam and Sankaran, Baskaran and Sarkar, Anoop. EMNLP(2014)
• Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. Siahbani, Maryam and Sarkar, Anoop. EMNLP(2014)
• Expressive Hierarchical Rule Extraction for Left-to-Right Translation. Siahbani, Maryam and Sarkar, Anoop. AMTA(2014)
• Incremental Translation using a Hierarchical Phrase-based Translation System. Siahbani, Maryam and Mehdizadeh Seraj, Ramtin and Sankaran, Baskaran and Sarkar, Anoop. SLT (2014)
Questions?
Partial Hypothesis
Source (span indices 0–8): 我们 十分 关注 非洲 地区 发生 的 事情

<s> [0,8], -3.3
<s> we are very much [2,8], -4.5
<s> we are very much concerned with [3,8], -5.9
<s> we are very much concerned with what happens [6,7] [3,5], -7.1
LR-Decoding with Beam Search

• LR-decoding integrated with beam search (Watanabe et al. 2006)
• Stacks: hypotheses with the same number of source-side words covered
• Exhaustively generating all possible partial hypotheses for a given stack
Cube Pruning

• Each cube: a group of hypotheses and applicable rules
• Cubes are fed to a priority queue which fills the current stack
• Rows: hypotheses; columns: rules
• Rows and columns are sorted based on the scores
• Assumption: the best hypothesis is in the top left
  – The next best are the neighbours of this entry
[Figure: cube grid — hypotheses "students have not yet" (10.2), "pupils have not yet" (11.5), "student has not" (12.7) as rows; rules "made" (0.9), "done" (1.1), "do" (3.2) as columns; combined scores 12.5, 12.4, 14.3 / 12.6, 12.8, 14.7 / 13.3, 13.5, 15.4]
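The grid exploration can be sketched with a priority queue: start at the top-left corner and lazily push the right and down neighbours of each popped cell. Costs here are illustrative (lower is better); the monotonicity assumption is exactly what the LM score later breaks:

```python
import heapq

def cube_prune(hyp_costs, rule_costs, k):
    """Return the k best (lowest-cost) hypothesis+rule combinations,
    assuming both lists are sorted so costs grow along rows and columns."""
    heap = [(hyp_costs[0] + rule_costs[0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append(cost)
        for ni, nj in ((i + 1, j), (i, j + 1)):  # down and right neighbours
            if ni < len(hyp_costs) and nj < len(rule_costs) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (hyp_costs[ni] + rule_costs[nj], ni, nj))
    return out

# Hypothesis and rule costs in the style of the grid above.
best = cube_prune([10.2, 11.5, 12.7], [0.9, 1.1, 3.2], k=4)
```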
Time Efficiency: average number of LM queries

[Figure: LM-query comparison against Watanabe et al. (2006)]

Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)
Reordering Features
• LR-Hiero (Watanabe et al. 2006) achieves ~2 BLEU points less than Hiero
• Distortion feature (computed as each rule is applied)
• Number of reordering rules (rules whose non-terminals are reordered between the source and target sides)

<X_1 发生 X_2 事情 / what happens X_1 X_2>   r<> = 0
<X_1 发生 X_2 事情 / what happens X_2 X_1>   r<> = 1

Source (span indices 0–8): 我们 十分 关注 非洲 地区 发生 的 事情
d = (5-3) + (7-6) + (8-6) + (7-3) + (8-5)
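A simplified way to read the distortion sum: accumulate the jump distance between consecutively translated source spans (0 for a fully monotone translation). This is a generic sketch, not necessarily the paper's exact feature, which also counts jumps inside discontinuous rules:

```python
def distortion(spans):
    """Total jump distance over source spans, listed in the order their
    words are emitted on the target side."""
    return sum(abs(nxt_start - prev_end)
               for (_, prev_end), (nxt_start, _) in zip(spans, spans[1:]))

# Source spans in target emission order for the running example:
# 我们十分 [0,2] → 关注 [2,3] → 发生 [5,6] → 事情 [7,8] → 的 [6,7] → 非洲地区 [3,5]
d = distortion([(0, 2), (2, 3), (5, 6), (7, 8), (6, 7), (3, 5)])
```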
Translation Quality

[Figure: BLEU comparison against Watanabe et al. (2006)]

Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering. M. Siahbani, B. Sankaran and A. Sarkar. EMNLP (2013)
Search Error in Cube Pruning

• Assumption: the best hypothesis is in the top left
  – The next best are the neighbours of this entry
• Adding the LM score violates the assumption

[Figure: two cube grids in which, once LM scores are added, a good entry (e.g. 7.7) lies away from the top-left corner and is missed by neighbour expansion]
Queue Diversity
Two Improvements to Left-to-Right Decoding for Hierarchical Phrase-based Machine Translation. M. Siahbani and A. Sarkar. EMNLP (2014)
[Figure: Chinese-English BLEU score (23.5–26.5) and number of LM calls (0–40000) for LR-Hiero, LR-Hiero+CP, and LR-Hiero+CP (QD=10)]
Lexicalized Reordering Model
• Distortion penalty is weak
  – it only penalizes deviation from the monotone translation
• Learn reordering preferences for each phrase (with respect to the previous phrase)
  – Monotone
  – Swap
  – Discontinuous

[Figure: F–E alignment grid illustrating the three orientations; figure from "Statistical Machine Translation", Koehn 2010]
Lexicalized Reordering Model
• Collect orientation information during rule extraction
  – Convert each rule to a phrase-pair (possibly discontinuous)
  – M: if there is a phrase-pair on the top-left
  – S: if there is a phrase-pair on the top-right
  – D: otherwise
• Estimation by relative frequency

[Figure: F–E alignment grid; figure from "Statistical Machine Translation", Koehn 2010]
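The M/S/D test can be sketched on source spans taken in target order: a phrase pair "on the top-left" means the current source span starts exactly where the previous one ended, "top-right" means it ends where the previous one starts. The example spans are invented:

```python
from collections import Counter

def orientation(prev_span, cur_span):
    """Orientation of the current phrase relative to the previous one."""
    (p_start, p_end), (c_start, c_end) = prev_span, cur_span
    if c_start == p_end:   # adjoins on the top-left
        return "M"
    if c_end == p_start:   # adjoins on the top-right
        return "S"
    return "D"

# Relative-frequency estimation over observed orientations:
counts = Counter(orientation(p, c) for p, c in
                 [((0, 2), (2, 3)), ((3, 5), (0, 3)), ((2, 3), (5, 6))])
```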