Upload
others
View
12
Download
0
Embed Size (px)
Citation preview
Computing Lattice BLEU Oracle Scoresfor Machine Translation
Artem Sokolov & Guillaume Wisniewski & Francois [email protected]
LIMSI, Orsay, France
1 Introduction2 Oracle Decoding Task3 Proposed Oracle Decoders4 Experiments5 Conclusion & Future Work
Model & Inference in SMT
Usual translating of sentences f:
n best translation: e∗ = arg maxe∈E(f) score(e|f)
n e.g.: score(e|f) = λ · h(e, f) h(e, f) – features, λ – params
n E (f) – search space áinterested in this
Example
n source: Venus est la jumelle infernale de la Terre
n search space options:
á list of n-best hypothesesVenus - the twin of hell of the Earth
Venus is the infernal binocular of the Earth
Venus is the hellish twin of the Earth
á word lattice
02Vénus : Venus
1Vénus_est : Venus 3
est : is
NULL : -
4
la : the 5
la_jumelle : twin
jumelle : twin
7infernale : hellish
6
infernale : infernal8
infernale : of_hell
jumelle : twinjumelle : binocular
9de_la_Terre : of_the_Earth
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 1 / 15
Motivation
n in reality SMT systems perform poorly
å knowing why is difficult because of system complexity
n useful to know the best possible hypotheses
å can help failure analysis
Usefulness of (oracle) translation
n Finding bottlenecks
á if oracle is good – why the system couldn’t return it?
á if oracle is bad – why doesn’t E contain better ones?
n Learning
á update towards the best hypothesis
In this work
n interested in finding best hypotheses contained in lattices
n two new methods, comparison with 2 existing methods
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 2 / 15
Oracle Decoding Task
n by best hypothesis in E we mean highest BLEU translation
n-BLEU(ei, ri) = BrevityPenalty ·( n∏
m=1
pm
)1/n
á ei – hypotheses, ri – references, pm – clipped precisions
n need some sentence-level approximation BLEU’
Exact oracle finding with sent-BLEU’
n in n-best list – straight-forward
n in word lattice – NP-hard even for 2-BLEU [Leusch et al., 2008]
å so, approximations are needed áfocus of this presentation
Oracle decoding on lattice L for reference rf
π∗(f) = arg maxπ∈paths on L
BLEU ′(eπ, rf).
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 3 / 15
Earlier Approaches
Language Model Oracle [Li & Khudanpur, 2009] LM
n n-gram language model ALM built on reference r
n find most probable hypothesis
π∗LM(f) = ShortestPath(L ALM(rf))
Partial BLEU Oracle [Dreyer et al., 2007] PB/PB`
n for partial path π let its weight be:
− logBLEU ′(π) = −1
4
∑m=1..4
log pm
n hypothesis h1 is prefered to h2 if logBLEU(h1) > logBLEU(h2)
á and same 3-word history or PBá same 3-word history and same length PB`
n find shortest path
π∗PB(f) = ShortestPath(L)
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 4 / 15
Problems
Problems with existing approaches
1 LM
á distant relation to BLEUá low quality
2 PB
á approximate search (recombination heuristics)á can be resource-greedy (keeps many hypotheses until the final state)
3 Both LM & PB ignore:
á precision clippingá brevity penalty (except PB`)á corpus nature of BLEU
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 5 / 15
Our Contribution
New oracles
n partially take clipping, brevity and corpus nature into account
n produce better hypotheses and often faster
Take-away message
n although true BLEU oracles are NP-hard...
n we can find approximate oracles:
n very easy to calculaten very high BLEU
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 6 / 15
Linear approximation of corpus-BLEU LB
Linear approximation of BLEU [Tromble et al., 2008]
linBLEU(π) = θ0 |eπ|+4∑
n=1
θn
∑u∈Σn
cu(eπ)δu(r)
n cu(e) – number of occurences of u in e,
n δu(r) – indicator of presence of u in r.
n parameters: θ0 = 1 (∼brevity), θn = − 14p·rn−1 (n-gram discounts)
‘de Bruijn transducers’ ∆n
n each ∆n contains all n-grams as output symbols
n arcs weighted by δu(r)
q₀
0:0/01:1/0
q₀
00:ε/0
1
1:ε/0
0:00/0
1:01/0
0:10/0
1:11/0
q₀
0
0:ε/0
1
1:ε/0
011:ε/0
00
0:ε/0
100:ε/0
11
1:ε/0
1:011/0
0:010/0
1:101/0
0:100/0
0:110/0
1:111/0
1:001/0
0:000/0
∆n automata for alphabet 0, 1 and n = 1 . . . 3
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 7 / 15
Linear approximation of corpus-BLEU LB
Composition effect of ∆s
n L ∆n produces n-gram lattice
n each n-gram arc weighted by θn, if the n-gram was in reference
Find LB oracle translation for 4-BLEU
n weight lattice L with θ0
n weight ∆i with θi
n find shortest path
π∗LB = ShortestPath(L ∆1 ∆2 ∆3 ∆4).
n so far all oracles:á transformed and/or reweighted lattice Lá called ShortestPath
n difficult to account for clipping (it depends on the resulting path)
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 8 / 15
Integer Linear Programming Oracles ILP
n how to take clipping into account?
n need to fine-tune path weights
n reformulate 1-BLEU shortest path problem as an ILP problem:
arg maxξ
∑i∈arcs
ξi · (Θ1 · 1i∈r −Θ2 · 1i 6∈r) maximize 1-BLEU
∑ξ∈in(qfin)
ξ = 1,∑
ξ∈out(qinit )
ξ = 1 solution should be path
∑ξ∈out(q)
ξ −∑
ξ∈in(q)
ξ = 0, q ∈ nodes \ qinit , qfin
n arc indicator ξi =
1 if arc i is active in solution,
0 otherwise,
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 9 / 15
Taking account of clipping in ILP
New variable: Clipped word counter
γw = min∑
ξ∈edges with w
ξ, # of w in r
ILP problem with clipping
arg maxξ,γw
Θ1 ·∑
w∈words
γw −Θ2 ·
∑i∈arcs
ξi −∑
w∈words
γw
γw ≥ 0, γw ≤ cw (r), γw ≤
∑ξ∈edges carring w
ξ
∑ξ∈in(qfin)
ξ = 1,∑
ξ∈out(qinit )
ξ = 1
∑ξ∈out(q)
ξ −∑ξ∈in(q)
ξ = 0, q ∈ nodes \ qinit , qfin
n many ILP solvers available n we used Gurobi
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 10 / 15
Langrangian Relaxation-based Oracle Decoder RLX
n can be done without 3rd-party solvers
n Lagrangian relaxation to deal with clipping
Primal Dual
arg minξ∈paths
−∑
i∈arcs
ξi · θi∑ξ with w
ξ ≤ cw (r),w ∈ r
arg maxλ
L(λ)
λw 0,w ∈ r
Where
L(λ) = minξ
− ∑i∈arcs
ξi · θi +∑w∈r
λw ·
∑ξ with w
ξ − cw (r)
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 11 / 15
Langrangian Relaxation-based Oracle Decoder RLX
L(λ) = minξ
− ∑i∈arcs
ξiθi +∑w∈r
λw
∑ξ with w
ξ − cw (r)
ξ∗ = arg min
ξL(λ) = arg min
ξ
∑i ∈ arcs
ξi
(λw(ξi ) − θi
)ájust find shortest path
∂L(λ)
∂λw=
∑ξ∈Ω(w)∩ξ∗
ξ − cw (r) áupdate λ towards maxL(λ)
Incremental Constraint Enforcing cf. [Rush et al., 2010]
∀w, λ(0)w ← 0
for t = 1→ T doξ∗(t) = arg minξ
∑i ξi ·
(λw(ξi) − θi
)if all clipping constraints are enforcedthen optimal solution foundelse for w ∈ r do
nw ← n. of occurrences of w in ξ∗(t)
λ(t)w ← λ
(t)w + α(t) · (nw − cw(r))
λ(t)w ← max(0, λ(t)
w )
n this was for 1-BLEU n for n-BLEU run on L ∆n
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 12 / 15
Experiments: Performance for two decoders
N-code
35
40
45
50
RLX ILP LB4 LB2 PB PBl LM4 LM2 0
1
2
3
4
5
6
BLE
U
avg.
tim
e, s
BLEU
47.8
2
48.1
2
48.2
2
47.7
1
46.7
6
46.4
8
38.9
1
38.7
5
avg. timen-best
(a) fr→en (test: 27.88)
30
35
RLX ILP LB4 LB2 PB PBl LM4LM2 0
0.5
1
1.5
BLE
U
avg.
tim
e, s
BLEU
34.7
9
34.7
0
35.4
9
35.0
9
34.8
5
34.7
6
29.5
3
29.5
3
avg. timen-best
(b) de→en (test: 22.05)
20
25
30
RLX ILP LB4 LB2 PB PBl LM4LM2 0
0.5
1
BLE
U
avg.
tim
e, s
BLEU
24.7
5
24.6
6
25.3
4
24.8
5
24.7
8
24.7
3
20.7
8
20.7
4
avg. timen-best
(c) en→de (test: 15.83)
Moses
35
40
45
50
RLX ILP LB4 LB2 PB PBl LM4 LM2 0
1
2
3
BLE
U
avg.
tim
e, s
BLEU
43.8
2
44.0
8
44.4
4
43.8
2
43.4
2
43.2
0
36.3
4
36.2
5
avg. timen-best
(d) fr→en (test: 27.68)
30
35
RLX ILP LB4 LB2 PB PBl LM4 LM2 0
1
2
3
4
BLE
U
avg.
tim
e, s
36.4
3
36.9
1
37.7
3
36.5
2
36.7
5
36.6
2
29.5
1
29.4
5
(e) de→en (test: 21.85)
20
25
30
RLX ILP LB4 LB2 PB PBl LM4 LM2 0
1
2
3
4
5
6
7
8
9
BLE
U
avg.
tim
e, s
28.6
8
28.6
4 29.9
4
28.9
4
28.7
6
28.6
5
21.2
9
21.2
3
(f) en→de (test: 15.89)
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 13 / 15
Conclusion & Future Work
Results
n new oracle decoders find better hypotheses (clipping/brevity/corpus)
n ...and do it faster
n optimization for 2-BLEU is sufficient to obtain good 4-BLEU
Main Conclusion
á lattices contain excellent translationsn +5-10 points to n-best list oraclesn +20 points above results of state-of-the-art models
á SMT has serious scoring problems Future Workn linear models not flexible enough? á non-linear score func.n more features needed? á rich featuresn training flawed? á discriminative training
with lattice oracles
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 14 / 15
Thank you for your attention!
Questions?
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 15 / 15
Bibliography
Dreyer, M., Hall, K. B., & Khudanpur, S. P. (2007).
Comparing reordering constraints for SMT using efficient BLEU oracle computation.In Proc. of the Workshop on Syntax and Structure in Statistical Translation (pp. 103–110). Morristown, NJ, USA.
Leusch, G., Matusov, E., & Ney, H. (2008).
Complexity of finding the BLEU-optimal hypothesis in a confusion network.In Proc. of the 2008 Conf. on EMNLP (pp. 839–847). Honolulu, Hawaii.
Li, Z. & Khudanpur, S. (2009).
Efficient extraction of oracle-best translations from hypergraphs.In Proc. of Human Language Technologies: The 2009 Annual Conf. of the North American Chapter of the ACL,Companion Volume: Short Papers (pp. 9–12). Morristown, NJ, USA.
Rush, A. M., Sontag, D., Collins, M., & Jaakkola, T. (2010).
On dual decomposition and linear programming relaxations for natural language processing.In Proc. of the 2010 Conf. on EMNLP (pp. 1–11). Stroudsburg, PA, USA.
Tromble, R. W., Kumar, S., Och, F., & Macherey, W. (2008).
Lattice minimum bayes-risk decoding for statistical machine translation.In Proc. of the Conf. on EMNLP (pp. 620–629). Stroudsburg, PA, USA.
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 1 / 2
Implementation
n OpenFST toolbox
á works with any semiring (PB required semiring of sets)á proven and well optimized ShortestPath algorithms
n Gurobi ILP solver
Artem Sokolov, G. Wisniewski, F. Yvon | LIMSI-CNRS | Lattice BLEU Oracles for SMT 2 / 2