Improving Alignments for Better Confusion Networks for Combining Machine Translation Systems
Necip Fazil Ayan and Jing Zheng and Wen Wang
SRI International
Speech Technology and Research Laboratory (STAR)
333 Ravenswood Avenue
Menlo Park, CA 94025
The state-of-the-art system combination method for machine translation (MT) is word-based combination using confusion networks. One of the crucial steps in confusion network decoding is the alignment of different hypotheses to each other when building a network. In this paper, we present new methods to improve alignment of hypotheses using word synonyms and a two-pass alignment strategy. We demonstrate that combination with the new alignment technique yields an improvement of up to 2.9 BLEU points over the best input system and up to 1.3 BLEU points over a state-of-the-art combination method on two different language pairs.
Combining outputs of multiple systems performing the same task has been widely explored in various fields such as speech recognition, word sense disambiguation, and word alignments, and it has been shown that the combination approaches yield significantly better outputs than the individual systems. System combination has also been explored in the MT field, especially with the emergence of various structurally different MT systems. Various techniques include hypothesis selection from different systems using sentence-level scores, re-decoding source sentences using phrases that are used by individual systems (Rosti et al., 2007a; Huang and Papineni, 2007), and word-based combination techniques using confusion networks (Matusov et al., 2006; Sim et al., 2007; Rosti et al., 2007b). Among these, confusion network decoding of the system outputs has been shown to be more effective than the others in terms of the overall translation quality.
© 2008. Licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported license (http://creativecommons.org/licenses/by-nc-sa/3.0/). Some rights reserved.
One of the crucial steps in confusion network decoding is the alignment of hypotheses to each other because the same meaning can be expressed with synonymous words and/or with a different word ordering in different hypotheses. Unfortunately, all the alignment algorithms used in confusion network decoding are insensitive to synonyms of words when aligning two hypotheses to each other. This paper extends the previous alignment approaches to handle word synonyms more effectively to improve alignment of different hypotheses. We also present a two-pass alignment strategy for a better alignment of hypotheses with similar words but with a different word ordering.
We evaluate our system combination approach using variants of an in-house hierarchical MT system as input systems on two different language pairs: Arabic-English and Chinese-English. Even with very similar MT systems as inputs, we show that the improved alignments yield up to an absolute 2.9 BLEU point improvement over the best input system and up to an absolute 1.3 BLEU point improvement over the old alignments in a confusion-network-based combination.
The rest of this paper is organized as follows. Section 2 presents an overview of previous system combination techniques for MT. Section 3 discusses the confusion-network-based system combination. In Section 4, we present the new hypothesis alignment techniques. Finally, Section 5 presents our experiments and results on two language pairs.
2 Related Work
System combination for machine translation can be done at three levels: sentence-level, phrase-level, or word-level.
Sentence-level combination is done by choosing one hypothesis among multiple MT system outputs (and possibly among n-best lists). The selection criterion can be a combination of translation model and language model scores with multiple comparison tests (Akiba et al., 2002), or statistical confidence models (Nomoto, 2004).
Phrase-level combination systems assume that the input systems provide some internal information about the system, such as phrases used by the system, and the task is to re-decode the source sentence using this additional information. The first example of this approach was the multi-engine MT system (Frederking and Nirenburg, 1994), which builds a chart using the translation units inside each input system and then uses a chart walk algorithm to find the best cover of the source sentence. Rosti et al. (2007a) collect source-to-target correspondences from the input systems, create a new translation option table using only these phrases, and re-decode the source sentence to generate better translations. In a similar work, it has been demonstrated that pruning the original phrase table according to reliable MT hypotheses and forcing the decoder to obey the word orderings in the original system outputs improves the performance of the phrase-based combination systems (Huang and Papineni, 2007). In the absence of source-to-target phrase alignments, the sentences can be split into simple chunks using a recursive decomposition as input to MT systems (Mellebeek et al., 2006). With this approach, the final output is a combination of the best chunk translations that are selected by majority voting, system confidence scores, and language model scores.
The word-level combination chooses the best translation units from different translations and combines them. The most popular method for word-based combination follows the idea behind the ROVER approach for combining speech recognition outputs (Fiscus, 1997). After reordering hypotheses and aligning them to each other, the combination system builds a confusion network and chooses the path with the highest score. The following section describes confusion-network-based system combination in detail.
Figure 1: Alignment of three hypotheses to each other using different hypotheses as skeletons.
3 System Combination with Confusion Networks
The general architecture of a confusion-network-based system combination is as follows:
1. Extract n-best lists from MT systems.
2. Pick a skeleton translation for each segment.
3. Reorder all the other hypotheses by aligning them to the skeleton translation.
4. Build a confusion network from the reordered translations for each segment.
5. Decode the confusion network using various arc features and sentence-level scores such as LM score and word penalty.
6. Optimize feature weights on a held-out test set and re-decode.
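Steps 4 and 5 above can be illustrated with a minimal sketch. It assumes the hypotheses have already been aligned to the skeleton and padded to equal length, and it replaces the arc features, LM score, and word penalty with plain majority voting. The `EPS` symbol and both function names are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

EPS = "<eps>"  # hypothetical placeholder for an empty arc


def build_confusion_network(aligned_hyps):
    """Turn equal-length, already-aligned hypotheses into a list of slots.

    Each slot maps a word (or EPS) to the number of hypotheses voting for it.
    """
    length = len(aligned_hyps[0])
    assert all(len(h) == length for h in aligned_hyps), "hypotheses must be aligned"
    return [Counter(h[i] for h in aligned_hyps) for i in range(length)]


def decode(network):
    """Pick the highest-voted word in each slot and drop empty arcs.

    Simple voting stands in for the weighted feature scores of a real decoder.
    """
    best = [slot.most_common(1)[0][0] for slot in network]
    return [w for w in best if w != EPS]


aligned = [
    ["the", "cat", "sat", EPS],
    ["the", "cat", "sat", "down"],
    ["a", "cat", "sits", "down"],
]
network = build_confusion_network(aligned)
print(" ".join(decode(network)))  # prints "the cat sat down"
```

A real decoder would score each arc with tuned feature weights (step 6) rather than raw counts, so this sketch only shows the data structure and the path selection idea.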
In this framework, the success of confusion network decoding for system combination depends on two important choices: selection of the skeleton hypothesis and alignment of other hypotheses to the skeleton.
For selecting the best skeleton, two common methods are choosing the hypothesis with the Minimum Bayes Risk with translation error rate (TER) (Snover et al., 2006) (i.e., the hypothesis with the minimum TER score when it is used as the reference against the other hypotheses) (Sim et al., 2007) or choosing the best hypotheses from each system and using each of those as a skeleton in multiple confusion networks (Rosti et al., 2007b). In this paper, we use the latter since it performs slightly better than the first method in our experiments. An example confusion network on three translations is presented in Figure 1.¹
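The first selection method described above can be sketched as follows. This is a simplified illustration, not the procedure used in the paper: it uses word-level Levenshtein distance as a stand-in for TER (so block shifts are ignored) and weights all hypotheses equally. The function names are assumptions.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def select_skeleton(hyps):
    """Return the index of the hypothesis with the lowest summed error rate
    when it serves as the reference for all other hypotheses."""
    def risk(i):
        ref = hyps[i]
        return sum(edit_distance(ref, h) / max(len(ref), 1)
                   for k, h in enumerate(hyps) if k != i)
    return min(range(len(hyps)), key=risk)


hyps = [
    "the cat sat on the mat".split(),
    "the cat sat on a mat".split(),
    "a cat is sitting on the mat".split(),
]
print(select_skeleton(hyps))  # prints 0
```

The second method, which the paper adopts, skips this selection step and instead builds one network per system, each using that system's best hypothesis as the skeleton.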
¹ In this paper, we use multiple confusion networks that are attached to the same start and end node. Throughout the rest of the paper, the term confusion network refers to one network among multiple networks used for system combination.
The major difficulty when using confusion networks for system combination for MT is aligning different hypotheses to the skeleton since the word order might be different in different hypotheses and it is hard to align words that are shifted from one hypothesis to another. Four popular methods to align hypotheses to each other are as follows:
1. Multiple string-matching algorithm based on Levenshtein edit distance (Bangalore et al., 2001)
2. A heuristic-based matching algorithm (Jayaraman and Lavie, 2005)
3. Using GIZA++ (Och and Ney, 2000) with possibly additional training data (Matusov et al., 2006)
4. Using TER (Snover et al., 2006) between the skeleton and a given hypothesis (Sim et al., 2007; Rosti et al., 2007b)
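To make the edit-distance family of methods concrete, the sketch below aligns one hypothesis to a skeleton with plain word-level Levenshtein distance and a backtrace. It is an illustration under simplifying assumptions: it handles a single hypothesis pair, performs no block shifts (which TER adds), and uses a hypothetical `EPS` symbol for unaligned positions.

```python
EPS = "<eps>"  # hypothetical marker for an unaligned position


def align(skeleton, hyp):
    """Align hyp to skeleton with word-level Levenshtein distance.

    Returns (skeleton_word, hyp_word) pairs; EPS marks an insertion or deletion.
    """
    n, m = len(skeleton), len(hyp)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if skeleton[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitute
                             dist[i - 1][j] + 1,         # skeleton word unaligned
                             dist[i][j - 1] + 1)         # hypothesis word unaligned

    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if skeleton[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((skeleton[i - 1], hyp[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((skeleton[i - 1], EPS))
            i -= 1
        else:
            pairs.append((EPS, hyp[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


for pair in align("the cat sat down".split(), "the cat sat".split()):
    print(pair)
```

Because the match test is exact string equality, a word that was moved or replaced by a synonym is treated as an unrelated substitution, insertion, or deletion.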
None of these methods takes word synonyms into account during alignment of hypotheses.² In this work, we extend the TER-based alignment to use word stems and synonyms using the publicly available WordNet resource (Fellbaum, 1998) when aligning hypotheses to each other and show that this additional information improves the alignment and the overall translation significantly.
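The idea of relaxing the match test to stems and synonyms can be sketched as a predicate that an aligner would call in place of exact string equality. This is only a toy illustration: the synonym sets are hand-written stand-ins for a WordNet lookup, `crude_stem` is a deliberately naive suffix stripper rather than a real stemmer, and none of these names come from the paper.

```python
# Hypothetical toy synonym sets; the paper draws synonyms from WordNet instead.
SYNONYM_SETS = [
    {"set", "established"},
    {"big", "large"},
]


def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def words_match(a, b):
    """Return True if two words are identical, share a stem, or are listed as synonyms."""
    a, b = a.lower(), b.lower()
    if a == b:
        return True
    if crude_stem(a) == crude_stem(b):
        return True
    return any(a in s and b in s for s in SYNONYM_SETS)


print(words_match("set", "established"))     # True (listed synonyms)
print(words_match("deadlines", "deadline"))  # True (shared stem)
print(words_match("israel", "time"))         # False
```

Single-word matching of this kind does not cover multi-word correspondences, which would need additional handling.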
4 Confusion Networks with Word Synonyms and Two-pass Alignment
When building a confusion network, the goal is to put the same words on the same arcs as much as possible. Matching similar words between two hypotheses is necessary to achieve this goal.
When we align two different hypotheses using TER, two words must have identical spelling to be considered a match. However, in natural languages, it is possible to represent the same meaning using synonyms of words in possibly different positions. For example, in the following sentences, "at the same time" and "in the meantime", "waiting for" and "expect", and "set" and "established" correspond to each other, respectively:
Skeleton: at the same time expect israel to abide by the deadlines set by .
Hypothesis: in the meantime , we are waiting for israel to abide by the established deadlines .
² Note that the approach by Matusov et al. (2006) attempts to align synonyms and different morphological forms of words to each other, but this is done implicitly, relying on the parallel text to learn word alignments.
Using TER, synonymous words mi