Tree-Based Machine Translation: Synchronous Context-Free Grammar
Presented by Akiva Miura, AHC-Lab, 2015/06/18
15/06/18 ©2015 Akiva Miura, AHC-Lab, IS, NAIST
MT Study Group
Contents
6.2 Synchronous Context-Free Grammar
  6.2.1 Characteristics
  6.2.2 Training
  6.2.3 Syntactic Labels
  6.2.4 Features
  6.2.5 Decoding
SCFG
Synchronous Context-Free Grammar (SCFG):
• a bilingual extension of CFG
• can be applied to machine translation by parsing (transducing) the source-language side
Formalism
An SCFG is defined as a tuple G = ⟨N, Σ, Δ, R, S⟩, where:
• N : the set of non-terminal symbols
• Σ : the source-language terminal alphabet
• Δ : the target-language terminal alphabet
• R : the set of synchronous rewrite rules
• S ∈ N : the start symbol
Rewrite Rules
Each rule in R has the form

  X → ⟨α, β, φ⟩ ∈ R

where X ∈ N, α ∈ (N ∪ Σ)* is the source-side string, β ∈ (N ∪ Δ)* is the target-side string, and φ is a one-to-one correspondence between the non-terminal occurrences in α and β. In practice φ is written as co-indexes on the non-terminals, as in X → ⟨α, β⟩ with matching subscripts (e.g. NP1 appearing on both sides).
Rules Example
Example of rewrite rules:
S → <NP1 が VP2, NP1 VP2>
VP → <NP1 を V2, V2 NP1>
VP → <PP1 V2, V2 PP1>
VP → <NP1 V2, V2 NP1>
PP → <NP1 の P2, P2 NP1>
NP → <NP1 の NP2, NP2 of NP1>
V → <開けた, opened> | <座った, sat>
P → <上に, on>
NP → <犬, the dog> | <ドア, the door> | <本, the book> | <上に, the upper>
Derivation Example
Example of a derivation:
<S1, S1> ⇒ <NP2 が VP3, NP2 VP3>
         ⇒ <犬 が VP3, the dog VP3>
         ⇒ <犬 が NP4 を V5, the dog V5 NP4>
         ⇒ <犬 が ドア を V5, the dog V5 the door>
         ⇒ <犬 が ドア を 開けた, the dog opened the door>
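The derivation above can be replayed mechanically: each step rewrites one co-indexed non-terminal on the source and target sides at once. The sketch below does exactly that; the data layout (terminals as strings, non-terminals as (label, index) pairs) and the helper names are ours, not from the text.

```python
# Replaying the derivation: terminals are plain strings, and a non-terminal
# is a pair (label, index) whose index links its source and target occurrences.

def expand(src, tgt, gid, rule, counter):
    """Rewrite the non-terminal with global index `gid` on both sides at
    once, giving the rule's local indices fresh global ones."""
    rule_src, rule_tgt = rule
    fresh = {}
    def rename(tokens):
        out = []
        for t in tokens:
            if isinstance(t, tuple):          # (label, local index)
                label, li = t
                if li not in fresh:
                    counter[0] += 1
                    fresh[li] = counter[0]
                out.append((label, fresh[li]))
            else:
                out.append(t)
        return out
    src_rhs, tgt_rhs = rename(rule_src), rename(rule_tgt)
    def subst(side, rhs):
        out = []
        for t in side:
            if isinstance(t, tuple) and t[1] == gid:
                out.extend(rhs)               # replace the co-indexed NT
            else:
                out.append(t)
        return out
    return subst(src, src_rhs), subst(tgt, tgt_rhs)

def show(side):
    return " ".join(t[0] + str(t[1]) if isinstance(t, tuple) else t
                    for t in side)

counter = [1]                                 # S carries global index 1
src, tgt = [("S", 1)], [("S", 1)]
steps = [
    (1, ([("NP", 1), "が", ("VP", 2)], [("NP", 1), ("VP", 2)])),  # expand S1
    (2, (["犬"], ["the", "dog"])),                                # expand NP2
    (3, ([("NP", 1), "を", ("V", 2)], [("V", 2), ("NP", 1)])),    # expand VP3
    (4, (["ドア"], ["the", "door"])),                             # expand NP4
    (5, (["開けた"], ["opened"])),                                # expand V5
]
for gid, rule in steps:
    src, tgt = expand(src, tgt, gid, rule, counter)

print(show(src))   # 犬 が ドア を 開けた
print(show(tgt))   # the dog opened the door
```

Each call to `expand` reproduces one `⇒` step of the derivation above, and the fresh indices come out exactly as on the slide (NP2, VP3, NP4, V5).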
Parse Tree Example
Example of derivation trees (figure): the source tree
  S1 → (NP2 犬) が (VP3 → (NP4 ドア) を (V5 開けた))
is paired with the target tree
  S1 → (NP2 the dog) (VP3 → (V5 opened) (NP4 the door))
Normal Form
• SCFG has almost the same characteristics as CFG, but has no normal form
Terminology:
• rank : the number of non-terminals on the right-hand side of a rule
• binarization : conversion of rules with rank ≥ 3 into rules with rank ≤ 2
Any CFG can be converted to Chomsky Normal Form, but an SCFG cannot in general.
Binarization of Rank-3 Rules
• Any rank-3 SCFG rule can be binarized:
  e.g. X → <A1 B2 C3, C3 B2 A1>
Introducing a new non-terminal X′:
  X → <X′1 C2, C2 X′1>
  X′ → <A1 B2, B2 A1>
Binarization of Rank-4 Rules
• Not all rank-4 SCFG rules can be binarized, e.g.:
  X → <A1 B2 C3 D4, C3 A1 D4 B2>
  X → <A1 B2 C3 D4, B2 D4 A1 C3>
(figure: the source and target trees for the two rules)
In these permutations no two source-adjacent non-terminals are also adjacent on the target side, so no rank-2 decomposition exists; such permutations are called “inside-out” alignments.
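Whether a rule can be binarized depends only on the target-side permutation of its non-terminals. Below is a sketch of the standard adjacency-merging test (function name and representation are ours): repeatedly merge a source-adjacent pair of non-terminals that is also target-adjacent; the rule is binarizable iff everything eventually merges into one block.

```python
def binarizable(perm):
    """Check whether a synchronous rule whose non-terminals appear on the
    target side in order `perm` (a permutation of 1..r, read off in source
    order) can be decomposed into rank-2 rules."""
    p = list(perm)
    changed = True
    while len(p) > 1 and changed:
        changed = False
        for i in range(len(p) - 1):
            if abs(p[i] - p[i + 1]) == 1:     # neighbours on both sides
                lo = min(p[i], p[i + 1])
                p = p[:i] + [lo] + p[i + 2:]  # merge the pair into one block
                p = [v - 1 if v > lo else v for v in p]  # re-rank the rest
                changed = True
                break
    return len(p) == 1
```

On the examples above: the rank-3 permutation (3, 2, 1) merges completely, while the two inside-out rank-4 permutations (2, 4, 1, 3) and (3, 1, 4, 2) get stuck immediately, since no adjacent pair has consecutive values.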
Relation of Grammar Ranks
• r-CFG is the set of languages produced by rank-r rules
• Any r-CFG can be converted to an equivalent 2-CFG
Ø 1-CFG ⊊ 2-CFG = 3-CFG = 4-CFG = … = r-CFG
• r-SCFG is the set of language pairs produced by rank-r rules
• Any 3-SCFG can be converted to an equivalent 2-SCFG
• r-SCFG (r ≥ 4) cannot be binarized in general
Ø 1-SCFG ⊊ 2-SCFG = 3-SCFG ⊊ 4-SCFG ⊊ … ⊊ r-SCFG
Training
Automatic training of synchronous rules:
(figure: the word-aligned sentence pair 「彼 は 近い うち に 国会 を 解散 する」 ↔ “he will dissolve the diet in the near future”; consistent phrase pairs are extracted from the alignment matrix, and embedded phrase pairs are replaced by non-terminals, yielding synchronous rules such as X → <X1 に X2 解散 する, dissolve X2 in the X1>)
Word Alignment → Phrase Extraction → Synchronous Rule Extraction
Rule Extraction
These rules are extracted hierarchically, and are therefore called “hierarchical phrases/rules” (Hiero).
Extraction algorithm (for each word-aligned sentence pair with phrase-pair set Φ):
1. R ← ∅
2. for each phrase pair ⟨f̄, ē⟩ ∈ Φ:
     R ← R ∪ { X → ⟨f̄, ē⟩ }
3. for each rule X → ⟨α, β⟩ ∈ R and each phrase pair ⟨f̄, ē⟩ ∈ Φ
   such that α = α′ f̄ α″ and β = β′ ē β″:
     R ← R ∪ { X → ⟨α′ Xk α″, β′ Xk β″⟩ }   (Xk a fresh co-indexed non-terminal)
   repeating until no new rule is added
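The extraction loop can be sketched in a few lines. This is a simplified version under our own representation (tuples of tokens, non-terminals as ("X", k) pairs); a real extractor (e.g. Chiang's Hiero) would also enforce alignment consistency and the restrictions discussed next.

```python
def find_sub(seq, sub):
    """Return the start index of contiguous subsequence `sub` in `seq`, or None."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return None

def extract_hiero(phrase_pairs, max_rank=2):
    """Seed the rule set with the plain phrase pairs, then repeatedly
    replace an embedded phrase pair with a fresh co-indexed non-terminal."""
    rules = {("X", tuple(f), tuple(e)) for f, e in phrase_pairs}
    agenda = list(rules)
    while agenda:
        lhs, alpha, beta = agenda.pop()
        rank = sum(1 for t in alpha if isinstance(t, tuple))
        if rank >= max_rank:
            continue
        for f, e in phrase_pairs:
            f, e = tuple(f), tuple(e)
            if len(f) == len(alpha) and len(e) == len(beta):
                continue                      # subtracting all would leave nothing
            i, j = find_sub(alpha, f), find_sub(beta, e)
            if i is None or j is None:
                continue
            k = rank + 1                      # next free co-index
            rule = (lhs,
                    alpha[:i] + (("X", k),) + alpha[i + len(f):],
                    beta[:j] + (("X", k),) + beta[j + len(e):])
            if rule not in rules:
                rules.add(rule)
                agenda.append(rule)
    return rules
```

For instance, given the phrase pairs ⟨犬 が 座った, the dog sat⟩ and ⟨犬, the dog⟩, the loop produces the hierarchical rule X → <X1 が 座った, X1 sat>.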
Rule Restriction
• The hierarchical rule extraction method is exhaustive, so the trained grammar becomes oversized and highly ambiguous!
Ø need to limit the rules:
• minimal phrase pairs for the same alignment
• span length limitation (e.g. 2 ≤ length ≤ 10)
• rule length limitation (e.g. length ≤ 5)
• rank of rules (rank ≤ 2)
• prohibition of contiguous non-terminals (X1 X2)
• each rule must include at least 1 word alignment
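Several of these restrictions are simple predicates over a rule's two sides. A sketch of such a filter (names are ours; the word-alignment requirement is approximated here by requiring at least one source terminal, since a purely non-terminal side can carry no alignment point):

```python
def keep_rule(alpha, beta, max_len=5, max_rank=2):
    """Illustrative filter for some of the restrictions above.
    Non-terminals are tuples ("X", k); terminals are plain strings."""
    is_nt = [isinstance(t, tuple) for t in alpha]
    rank = sum(is_nt)
    if rank > max_rank:
        return False                          # rank limitation
    if len(alpha) > max_len or len(beta) > max_len:
        return False                          # rule length limitation
    if any(a and b for a, b in zip(is_nt, is_nt[1:])):
        return False                          # contiguous X1 X2 on source side
    if rank == len(alpha):
        return False                          # no terminal => no aligned word
    return True
```

E.g. X → <X1 が X2, X1 X2> passes, while X → <X1 X2, X1 X2> is rejected for its contiguous non-terminals.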
Glue Rules
• Because of the span length limitation, the grammar may fail to cover long sentences.
Ø introduce heuristic initial synchronous rules called “glue rules”:
  S → <S1 X2, S1 X2>
  S → <X1, X1>
• for long-distance reordering (such as En↔Ja), we can also introduce:
  S → <S1 X2, X2 S1>
Syntactic Labels
• Standard Hiero rules use only 2 non-terminals: S and X
• still very ambiguous (decoding may be slow and inaccurate)
Ø introduce syntactic labels taken from a parse tree
(figure: the aligned training example again, now with constituent labels NP, PP, VP and extended labels IN+DT, VP/PP, VP\VB assigned to the extracted spans)
Features
• Decoding with SCFG also uses a log-linear model, and the features are almost the same as in PBMT
• If phrase pairs include non-terminals, the count of a phrase pair is not 1 per occurrence, but is normalized by the number of matched rules
• Additional penalties:
  • rule count penalty: h_rule(d) = −|d|
  • glue rule count penalty: h_glue(d) = −|{ r : r ∈ d ∧ r ∈ R_glue }|
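The two penalties are trivial to compute once a derivation is represented as a list of its rules. A toy illustration (the representation of derivations and the glue-rule set are ours):

```python
# A derivation is a list of rule identifiers; glue rules are marked by
# membership in the set GLUE.
GLUE = {"S -> <S1 X2, S1 X2>", "S -> <X1, X1>"}

def rule_count_penalty(derivation):
    """h_rule(d) = -|d|: one unit of penalty per rule used."""
    return -len(derivation)

def glue_rule_penalty(derivation):
    """h_glue(d) = -(number of glue rules used in d)."""
    return -sum(1 for r in derivation if r in GLUE)
```

A derivation using three rules, one of them a glue rule, thus receives h_rule = −3 and h_glue = −1; both features discourage long, glue-heavy derivations under positive weights.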
Decoding
• SCFG decoding searches for the best (Viterbi) derivation, scoring derivations by a linear combination of the features:

  ê = argmax_e P(e | f)
    ≈ e(d̂),  where  d̂ = argmax_{d ∈ D(G), f(d) = f} Σ_i ω_i φ_i(d)

Here D(G) is the set of derivations of grammar G, f(d) and e(d) are the source and target yields of derivation d, and the φ_i are the feature functions with weights ω_i.
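Operationally, decoding parses the source sentence bottom-up with the source sides of the rules while building the target sides in parallel. The toy decoder below does this for (a subset of) the example grammar from the earlier slide; scores are omitted, where a real decoder would keep the highest-scoring item per chart cell under the log-linear model. The rule encoding and function names are ours.

```python
# (lhs, source RHS, target side).  For non-lexical rules the target side
# lists source-RHS positions in target order, so [2, 0] on the VP rule
# encodes <NP1 を V2, V2 NP1>.
RULES = [
    ("S",  ["NP", "が", "VP"], [0, 2]),
    ("VP", ["NP", "を", "V"],  [2, 0]),
    ("NP", ["犬"],   "the dog"),
    ("NP", ["ドア"], "the door"),
    ("V",  ["開けた"], "opened"),
]

def translate(words, rules):
    nonterms = {lhs for lhs, _, _ in rules}
    chart = {}                                # (i, j, label) -> target string

    def best(i, j, label):
        """Return a translation of words[i:j] as `label`, or None."""
        key = (i, j, label)
        if key in chart:
            return chart[key]
        chart[key] = None                     # also blocks unary cycles
        for lhs, rhs, tgt in rules:
            if lhs != label:
                continue
            for parts in matches(rhs, i, j):
                if isinstance(tgt, str):      # lexical rule
                    chart[key] = tgt
                else:                         # reorder the sub-translations
                    chart[key] = " ".join(parts[k] for k in tgt)
                return chart[key]
        return chart[key]

    def matches(rhs, i, j):
        """Yield one list of per-symbol translations for each way of
        matching the source RHS against words[i:j]."""
        if not rhs:
            if i == j:
                yield []
            return
        sym = rhs[0]
        if sym not in nonterms:               # source terminal: must match
            if i < j and words[i] == sym:
                for rest in matches(rhs[1:], i + 1, j):
                    yield [sym] + rest
        else:                                 # non-terminal: try all splits
            for k in range(i + 1, j + 1):
                sub = best(i, k, sym)
                if sub is not None:
                    for rest in matches(rhs[1:], k, j):
                        yield [sub] + rest

    return best(0, len(words), "S")

print(translate("犬 が ドア を 開けた".split(), RULES))
# the dog opened the door
```

The source-side parse builds the items NP0,1, NP2,3, V4,5, VP2,5, S0,5, and the target sides of the same rules assemble the reordered translation, exactly the coupling shown on the next slide.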
Translation Forest
• Example of decoding:
(figure: the source sentence 「犬0,1 が1,2 本2,3 の3,4 上に4,5 座った5,6」 is parsed into chart items NP0,1, NP2,3, P4,5, NP4,5, PP2,5, NP2,5, V5,6, VP2,6, S0,6; applying the synchronous rules to this chart yields the target-side hypotheses built from “the dog”, “the book”, “on”, “the upper”, “of”, “sat”)
↑ Source-language-side syntax parsing / target-language-side translation forest ↑