Tree-Based Machine Translation: Synchronous Context-Free Grammar
Presented by Akiva Miura, AHC-Lab, 2015/06/18
15/06/18 ©2015 Akiva Miura, AHC-Lab, IS, NAIST
MT Study Group
Contents
6.2 Synchronous Context-Free Grammar
  6.2.1 Characteristics
  6.2.2 Training
  6.2.3 Syntactic Labels
  6.2.4 Features
  6.2.5 Decoding
SCFG
Synchronous Context-Free Grammar (SCFG):
• a bilingual extension of CFG
• can be applied to machine translation by parsing (transducing) the source-language side
Formalism
An SCFG is defined as a tuple G = ⟨N, Σ, Δ, R, S⟩, where:
• N : the set of non-terminal symbols
• Σ : the source-language terminal alphabet
• Δ : the target-language terminal alphabet
• R : the set of synchronous rewrite rules
• S ∈ N : the start symbol
Rewrite Rules
Each rule in R has the form

  X → ⟨α, β, φ⟩ ∈ R

where X ∈ N, α ∈ (N ∪ Σ)* is the source-side string, β ∈ (N ∪ Δ)* is the target-side string, and φ is a one-to-one correspondence between the non-terminal occurrences in α and β. In practice φ is written as co-indexes on the non-terminals, as in X → ⟨α, β⟩ with matching subscripts (e.g. NP1 appearing on both sides).
Rules Example
Example of rewrite rules:
S → <NP1 が VP2, NP1 VP2>
VP → <NP1 を V2, V2 NP1>
VP → <PP1 V2, V2 PP1>
VP → <NP1 V2, V2 NP1>
PP → <NP1 の P2, P2 NP1>
NP → <NP1 の NP2, NP2 of NP1>
V → <開けた, opened> | <座った, sat>
P → <上に, on>
NP → <犬, the dog> | <ドア, the door> | <本, the book> | <上に, the upper>
Derivation Example
Example of a derivation:
<S1, S1> ⇒ <NP2 が VP3, NP2 VP3>
         ⇒ <犬 が VP3, the dog VP3>
         ⇒ <犬 が NP4 を V5, the dog V5 NP4>
         ⇒ <犬 が ドア を V5, the dog V5 the door>
         ⇒ <犬 が ドア を 開けた, the dog opened the door>
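The derivation above can be replayed mechanically: each step rewrites one co-indexed non-terminal on the source and target sides at once. The sketch below does exactly that; the data layout (terminals as strings, non-terminals as (label, index) pairs) and the helper names are ours, not from the text.

```python
# Replaying the derivation: terminals are plain strings, and a non-terminal
# is a pair (label, index) whose index links its source and target occurrences.

def expand(src, tgt, gid, rule, counter):
    """Rewrite the non-terminal with global index `gid` on both sides at
    once, giving the rule's local indices fresh global ones."""
    rule_src, rule_tgt = rule
    fresh = {}
    def rename(tokens):
        out = []
        for t in tokens:
            if isinstance(t, tuple):          # (label, local index)
                label, li = t
                if li not in fresh:
                    counter[0] += 1
                    fresh[li] = counter[0]
                out.append((label, fresh[li]))
            else:
                out.append(t)
        return out
    src_rhs, tgt_rhs = rename(rule_src), rename(rule_tgt)
    def subst(side, rhs):
        out = []
        for t in side:
            if isinstance(t, tuple) and t[1] == gid:
                out.extend(rhs)               # replace the co-indexed NT
            else:
                out.append(t)
        return out
    return subst(src, src_rhs), subst(tgt, tgt_rhs)

def show(side):
    return " ".join(t[0] + str(t[1]) if isinstance(t, tuple) else t
                    for t in side)

counter = [1]                                 # S carries global index 1
src, tgt = [("S", 1)], [("S", 1)]
steps = [
    (1, ([("NP", 1), "が", ("VP", 2)], [("NP", 1), ("VP", 2)])),  # expand S1
    (2, (["犬"], ["the", "dog"])),                                # expand NP2
    (3, ([("NP", 1), "を", ("V", 2)], [("V", 2), ("NP", 1)])),    # expand VP3
    (4, (["ドア"], ["the", "door"])),                             # expand NP4
    (5, (["開けた"], ["opened"])),                                # expand V5
]
for gid, rule in steps:
    src, tgt = expand(src, tgt, gid, rule, counter)

print(show(src))   # 犬 が ドア を 開けた
print(show(tgt))   # the dog opened the door
```

Each call to `expand` reproduces one `⇒` step of the derivation above, and the fresh indices come out exactly as on the slide (NP2, VP3, NP4, V5).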
Parse Tree Example
Example of derivation trees (figure): the source tree
  S1 → (NP2 犬) が (VP3 → (NP4 ドア) を (V5 開けた))
is paired with the target tree
  S1 → (NP2 the dog) (VP3 → (V5 opened) (NP4 the door))
Normal Form
• SCFG has almost the same characteristics as CFG, but has no normal form
Terminology:
• rank : the number of non-terminals on the right-hand side of a rule
• binarization : conversion of rules with rank ≥ 3 into rules with rank ≤ 2
Any CFG can be converted to Chomsky Normal Form, but an SCFG cannot in general.
Binarization of Rank-3 Rules
• Any rank-3 SCFG rule can be binarized:
  e.g. X → <A1 B2 C3, C3 B2 A1>
Introducing a new non-terminal X′:
  X → <X′1 C2, C2 X′1>
  X′ → <A1 B2, B2 A1>
Binarization of Rank-4 Rules
• Not all rank-4 SCFG rules can be binarized, e.g.:
  X → <A1 B2 C3 D4, C3 A1 D4 B2>
  X → <A1 B2 C3 D4, B2 D4 A1 C3>
(figure: the source and target trees for the two rules)
In these permutations no two source-adjacent non-terminals are also adjacent on the target side, so no rank-2 decomposition exists; such permutations are called “inside-out” alignments.
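Whether a rule can be binarized depends only on the target-side permutation of its non-terminals. Below is a sketch of the standard adjacency-merging test (function name and representation are ours): repeatedly merge a source-adjacent pair of non-terminals that is also target-adjacent; the rule is binarizable iff everything eventually merges into one block.

```python
def binarizable(perm):
    """Check whether a synchronous rule whose non-terminals appear on the
    target side in order `perm` (a permutation of 1..r, read off in source
    order) can be decomposed into rank-2 rules."""
    p = list(perm)
    changed = True
    while len(p) > 1 and changed:
        changed = False
        for i in range(len(p) - 1):
            if abs(p[i] - p[i + 1]) == 1:     # neighbours on both sides
                lo = min(p[i], p[i + 1])
                p = p[:i] + [lo] + p[i + 2:]  # merge the pair into one block
                p = [v - 1 if v > lo else v for v in p]  # re-rank the rest
                changed = True
                break
    return len(p) == 1
```

On the examples above: the rank-3 permutation (3, 2, 1) merges completely, while the two inside-out rank-4 permutations (2, 4, 1, 3) and (3, 1, 4, 2) get stuck immediately, since no adjacent pair has consecutive values.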
Relation of Grammar Ranks
• r-CFG is the set of languages produced by rank-r rules
• Any r-CFG can be converted to an equivalent 2-CFG
Ø 1-CFG ⊊ 2-CFG = 3-CFG = 4-CFG = … = r-CFG
• r-SCFG is the set of language pairs produced by rank-r rules
• Any 3-SCFG can be converted to an equivalent 2-SCFG
• r-SCFG (r ≥ 4) cannot be binarized in general
Ø 1-SCFG ⊊ 2-SCFG = 3-SCFG ⊊ 4-SCFG ⊊ … ⊊ r-SCFG
Training
Automatic training of synchronous rules:
(figure: the word-aligned sentence pair 「彼 は 近い うち に 国会 を 解散 する」 ↔ “he will dissolve the diet in the near future”; consistent phrase pairs are extracted from the alignment matrix, and embedded phrase pairs are replaced by non-terminals, yielding synchronous rules such as X → <X1 に X2 解散 する, dissolve X2 in the X1>)
Word Alignment → Phrase Extraction → Synchronous Rule Extraction
Rule Extraction
These rules are extracted hierarchically, and are therefore called “hierarchical phrases/rules” (Hiero).
Extraction algorithm (for each word-aligned sentence pair with phrase-pair set Φ):
1. R ← ∅
2. for each phrase pair ⟨f̄, ē⟩ ∈ Φ:
     R ← R ∪ { X → ⟨f̄, ē⟩ }
3. for each rule X → ⟨α, β⟩ ∈ R and each phrase pair ⟨f̄, ē⟩ ∈ Φ
   such that α = α′ f̄ α″ and β = β′ ē β″:
     R ← R ∪ { X → ⟨α′ Xk α″, β′ Xk β″⟩ }   (Xk a fresh co-indexed non-terminal)
   repeating until no new rule is added
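The extraction loop can be sketched in a few lines. This is a simplified version under our own representation (tuples of tokens, non-terminals as ("X", k) pairs); a real extractor (e.g. Chiang's Hiero) would also enforce alignment consistency and the restrictions discussed next.

```python
def find_sub(seq, sub):
    """Return the start index of contiguous subsequence `sub` in `seq`, or None."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return None

def extract_hiero(phrase_pairs, max_rank=2):
    """Seed the rule set with the plain phrase pairs, then repeatedly
    replace an embedded phrase pair with a fresh co-indexed non-terminal."""
    rules = {("X", tuple(f), tuple(e)) for f, e in phrase_pairs}
    agenda = list(rules)
    while agenda:
        lhs, alpha, beta = agenda.pop()
        rank = sum(1 for t in alpha if isinstance(t, tuple))
        if rank >= max_rank:
            continue
        for f, e in phrase_pairs:
            f, e = tuple(f), tuple(e)
            if len(f) == len(alpha) and len(e) == len(beta):
                continue                      # subtracting all would leave nothing
            i, j = find_sub(alpha, f), find_sub(beta, e)
            if i is None or j is None:
                continue
            k = rank + 1                      # next free co-index
            rule = (lhs,
                    alpha[:i] + (("X", k),) + alpha[i + len(f):],
                    beta[:j] + (("X", k),) + beta[j + len(e):])
            if rule not in rules:
                rules.add(rule)
                agenda.append(rule)
    return rules
```

For instance, given the phrase pairs ⟨犬 が 座った, the dog sat⟩ and ⟨犬, the dog⟩, the loop produces the hierarchical rule X → <X1 が 座った, X1 sat>.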
Rule Restriction
• The hierarchical rule extraction method is exhaustive, so the trained grammar becomes oversized and highly ambiguous!
Ø need to limit the rules:
• minimal phrase pairs for the same alignment
• span length limitation (e.g. 2 ≤ length ≤ 10)
• rule length limitation (e.g. length ≤ 5)
• rank of rules (rank ≤ 2)
• prohibition of contiguous non-terminals (X1 X2)
• each rule must include at least 1 word alignment
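Several of these restrictions are simple predicates over a rule's two sides. A sketch of such a filter (names are ours; the word-alignment requirement is approximated here by requiring at least one source terminal, since a purely non-terminal side can carry no alignment point):

```python
def keep_rule(alpha, beta, max_len=5, max_rank=2):
    """Illustrative filter for some of the restrictions above.
    Non-terminals are tuples ("X", k); terminals are plain strings."""
    is_nt = [isinstance(t, tuple) for t in alpha]
    rank = sum(is_nt)
    if rank > max_rank:
        return False                          # rank limitation
    if len(alpha) > max_len or len(beta) > max_len:
        return False                          # rule length limitation
    if any(a and b for a, b in zip(is_nt, is_nt[1:])):
        return False                          # contiguous X1 X2 on source side
    if rank == len(alpha):
        return False                          # no terminal => no aligned word
    return True
```

E.g. X → <X1 が X2, X1 X2> passes, while X → <X1 X2, X1 X2> is rejected for its contiguous non-terminals.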
Glue Rules
• Because of the span length limitation, the grammar may fail to cover long sentences.
Ø introduce heuristic initial synchronous rules called “glue rules”:
  S → <S1 X2, S1 X2>
  S → <X1, X1>
• for long-distance reordering (such as En↔Ja), we can also introduce:
  S → <S1 X2, X2 S1>
Syntactic Labels
• Standard Hiero rules use only 2 non-terminals: S and X
• still very ambiguous (decoding may be slow and inaccurate)
Ø introduce syntactic labels taken from a parse tree
(figure: the aligned training example again, now with constituent labels NP, PP, VP and extended labels IN+DT, VP/PP, VP\VB assigned to the extracted spans)
Features
• Decoding with SCFG also uses a log-linear model, and the features are almost the same as in PBMT
• If phrase pairs include non-terminals, the count of a phrase pair is not 1 per occurrence, but is normalized by the number of matched rules
• Additional penalties:
  • rule count penalty: h_rule(d) = −|d|
  • glue rule count penalty: h_glue(d) = −|{ r : r ∈ d ∧ r ∈ R_glue }|
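The two penalties are trivial to compute once a derivation is represented as a list of its rules. A toy illustration (the representation of derivations and the glue-rule set are ours):

```python
# A derivation is a list of rule identifiers; glue rules are marked by
# membership in the set GLUE.
GLUE = {"S -> <S1 X2, S1 X2>", "S -> <X1, X1>"}

def rule_count_penalty(derivation):
    """h_rule(d) = -|d|: one unit of penalty per rule used."""
    return -len(derivation)

def glue_rule_penalty(derivation):
    """h_glue(d) = -(number of glue rules used in d)."""
    return -sum(1 for r in derivation if r in GLUE)
```

A derivation using three rules, one of them a glue rule, thus receives h_rule = −3 and h_glue = −1; both features discourage long, glue-heavy derivations under positive weights.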
Decoding
• SCFG decoding searches for the best (Viterbi) derivation, scoring derivations by a linear combination of the features:

  ê = argmax_e P(e | f)
    ≈ e(d̂),  where  d̂ = argmax_{d ∈ D(G), f(d) = f} Σ_i ω_i φ_i(d)

Here D(G) is the set of derivations of grammar G, f(d) and e(d) are the source and target yields of derivation d, and the φ_i are the feature functions with weights ω_i.
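Operationally, decoding parses the source sentence bottom-up with the source sides of the rules while building the target sides in parallel. The toy decoder below does this for (a subset of) the example grammar from the earlier slide; scores are omitted, where a real decoder would keep the highest-scoring item per chart cell under the log-linear model. The rule encoding and function names are ours.

```python
# (lhs, source RHS, target side).  For non-lexical rules the target side
# lists source-RHS positions in target order, so [2, 0] on the VP rule
# encodes <NP1 を V2, V2 NP1>.
RULES = [
    ("S",  ["NP", "が", "VP"], [0, 2]),
    ("VP", ["NP", "を", "V"],  [2, 0]),
    ("NP", ["犬"],   "the dog"),
    ("NP", ["ドア"], "the door"),
    ("V",  ["開けた"], "opened"),
]

def translate(words, rules):
    nonterms = {lhs for lhs, _, _ in rules}
    chart = {}                                # (i, j, label) -> target string

    def best(i, j, label):
        """Return a translation of words[i:j] as `label`, or None."""
        key = (i, j, label)
        if key in chart:
            return chart[key]
        chart[key] = None                     # also blocks unary cycles
        for lhs, rhs, tgt in rules:
            if lhs != label:
                continue
            for parts in matches(rhs, i, j):
                if isinstance(tgt, str):      # lexical rule
                    chart[key] = tgt
                else:                         # reorder the sub-translations
                    chart[key] = " ".join(parts[k] for k in tgt)
                return chart[key]
        return chart[key]

    def matches(rhs, i, j):
        """Yield one list of per-symbol translations for each way of
        matching the source RHS against words[i:j]."""
        if not rhs:
            if i == j:
                yield []
            return
        sym = rhs[0]
        if sym not in nonterms:               # source terminal: must match
            if i < j and words[i] == sym:
                for rest in matches(rhs[1:], i + 1, j):
                    yield [sym] + rest
        else:                                 # non-terminal: try all splits
            for k in range(i + 1, j + 1):
                sub = best(i, k, sym)
                if sub is not None:
                    for rest in matches(rhs[1:], k, j):
                        yield [sub] + rest

    return best(0, len(words), "S")

print(translate("犬 が ドア を 開けた".split(), RULES))
# the dog opened the door
```

The source-side parse builds the items NP0,1, NP2,3, V4,5, VP2,5, S0,5, and the target sides of the same rules assemble the reordered translation, exactly the coupling shown on the next slide.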
Translation Forest
• Example of decoding:
(figure: the source sentence 「犬0,1 が1,2 本2,3 の3,4 上に4,5 座った5,6」 is parsed into chart items NP0,1, NP2,3, P4,5, NP4,5, PP2,5, NP2,5, V5,6, VP2,6, S0,6; applying the synchronous rules to this chart yields the target-side hypotheses built from “the dog”, “the book”, “on”, “the upper”, “of”, “sat”)
↑ Source-language-side syntax parsing / target-language-side translation forest ↑