Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents


Slide 1: Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents
Yusuke Oda, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
ACL, July 27, 2015 (slides dated 15/07/29)
Slide 2: Two Features of This Study
- Syntax-based machine translation: a state-of-the-art SMT method for distant language pairs (parse the input, e.g. "This is (NP)", then translate from the tree).
- Simultaneous translation: prevent translation delay when translating continuous speech by splitting the input into segments (e.g. "In the next 18 minutes | I'm going to take you | on a journey") and translating each segment as it arrives.
Slide 3: Speech Translation (Standard Setting)
- Pipeline: speech recognition, then machine translation, then speech synthesis.
- The delay depends on the input length: e.g. "in the next 18 minutes I 'm going to take you on a journey" must be recognized in full before translation starts.
- Problem: long delay when there are few explicit sentence boundaries.
Slide 4: Simultaneous Translation with Segmentation
- Separate the input at good positions: "in the next 18 minutes | I 'm going to take you | on a journey".
- The system can generate output without waiting for the end of speech, giving a shorter delay.
- Trade-off: higher segmentation frequency reduces translation quality.
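The non-optimized fixed-length segmentation used as a baseline later in the talk can be sketched as a simple incremental chunker. The function name `segment_every_n` is illustrative, not from the talk:

```python
def segment_every_n(tokens, n):
    """Split an incoming token stream into translation units of at most n words.

    This sketches the non-optimized N-word segmentation baseline; real
    systems choose boundaries to balance delay against translation quality.
    """
    unit = []
    for token in tokens:
        unit.append(token)
        if len(unit) == n:
            yield unit
            unit = []
    if unit:  # flush the final, possibly shorter unit at end of speech
        yield unit

words = "in the next 18 minutes I 'm going to take you on a journey".split()
units = list(segment_every_n(words, 5))
# e.g. units[0] == ['in', 'the', 'next', '18', 'minutes']
```

Smaller `n` means the translator starts sooner but sees less context per unit, which is exactly the quality/delay trade-off the slide describes.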
Slide 5: Syntactic Problems in Segmentation
- Segmentation allows us to translate each part separately, but it often breaks the syntax.
- Example: after the boundary predicted at "in the next 18 minutes I", the parser sees only a PP and an NP; the correct analysis is an S whose VP ("'m going to ...") is still unseen.
- This has a bad effect on syntax-based machine translation.
Slide 6: Motivation of This Study
- Predict unseen syntactic constituents: given the partial input "in the next 18 minutes I", predict the missing (VP) and attach it to the parse, turning the incorrect PP + NP analysis into the correct S over PP, NP, and (VP).
- Translate from the corrected tree, so the (VP) placeholder can be reordered correctly in the output.
Slide 7: Summary of Proposed Methods
- Proposed 1: predicting unseen constituents and using them in translation (e.g. parse "this is" and predict an upcoming NP).
- Proposed 2: waiting for translation (e.g. hold "this is" until "a pen" arrives, then translate "this is a pen" as one unit).
- Pipeline: ASR, segmentation, parsing, syntax prediction, translation, output.
Slide 8: What is Required?
To use predicted constituents in translation, we need:
1. Making training data
2. Deciding a prediction strategy
3. Using the results for translation
Slide 9: Making Training Data for Syntax Prediction
Decompose gold trees in the treebank, e.g. (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN pen)))):
1. Select any leaf span in the tree (e.g. "is a").
2. Find the path between the leftmost and rightmost leaves of the span.
3. Delete the subtrees outside the path.
4. Replace the remaining subtrees inside the path, but outside the span, with their topmost phrase label.
5. Finally we obtain a training example with three parts: left syntax = nil, leaf span = "is a", right syntax = NN.
Slide 10: Syntax Prediction from Incorrect Trees
1. Parse the input translation unit as-is (e.g. "in the next 18 minutes I" parses as a PP plus an NP).
2. Extract features from the (possibly incorrect) tree, e.g. Word:R1=I, POS:R1=NN, Word:R1-2=I,minutes, POS:R1-2=NN,NNS, ..., ROOT=PP, ROOT-L=IN, ROOT-R=NP, ...
3. Predict a distribution over unseen constituents, e.g. VP 0.65, NP 0.28, ..., nil 0.04.
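The feature extraction step can be sketched as follows, using the feature names shown on the slide (Word:R1 and POS:R1 for the rightmost word, Word:R1-2 for the rightmost bigram in right-to-left order, ROOT/ROOT-L/ROOT-R for root subtree labels). The function signature and the exact feature inventory are assumptions for illustration; the paper's full feature set may differ:

```python
def extract_features(words, tags, root_label, child_labels):
    """Build classifier features from a (possibly incorrect) partial parse.

    words, tags: tokens of the translation unit and their POS tags.
    root_label: label of the root of the partial parse (e.g. 'PP').
    child_labels: labels of the root's children, left to right.
    Feature names follow the slide; the set here is a simplified sketch.
    """
    feats = {}
    # rightmost word/POS unigram and bigram, listed right-to-left as on the slide
    for n, name in ((1, 'R1'), (2, 'R1-2')):
        if len(words) >= n:
            feats['Word:' + name] = ','.join(reversed(words[-n:]))
            feats['POS:' + name] = ','.join(reversed(tags[-n:]))
    # labels of the root and its leftmost/rightmost children
    feats['ROOT'] = root_label
    feats['ROOT-L'] = child_labels[0]
    feats['ROOT-R'] = child_labels[-1]
    return feats

unit = extract_features(
    'in the next 18 minutes I'.split(),
    ['IN', 'DT', 'JJ', 'CD', 'NNS', 'NN'],
    'PP', ['IN', 'NP'])
# unit['Word:R1'] == 'I', unit['Word:R1-2'] == 'I,minutes', unit['ROOT'] == 'PP'
```

A multiclass classifier over such feature dictionaries then yields the distribution over unseen constituent labels (VP, NP, ..., nil) shown in step 3.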
Slide 11: Syntax-based MT with Additional Constituents
- We use the tree-to-string (T2S) MT framework: parse the input (e.g. "This is (NP)") into a tree and translate from the tree.
- T2S obtains state-of-the-art results on syntactically distant language pairs (e.g. English-Japanese).
- The framework makes it possible to use additional syntactic constituents explicitly, as placeholder nodes in the input tree.
Slide 12: Translation Waiting (1)
- Reordering problem: the predicted right syntax sometimes moves to the left in the translation. E.g. the (VP) predicted after "in the next 18 minutes I", or the (NP) predicted after "'m going to take", is reordered before the current words in the Japanese output.
- Considering the output language, we would have to output future inputs before the current input.
Slide 13: Waiting for Translation
- Heuristic: when the predicted constituent would be reordered in front of the current input, wait for the next input segment (e.g. hold "'m going to take" until "you on a journey" arrives).
- We expect this to avoid syntactically strange segmentations.
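The waiting heuristic can be sketched as a small buffer in front of the translator: if the predicted right-hand constituent is one the target language would reorder before the current words, hold the unit and merge it with the next input. The names (`translate_with_waiting`, `REORDERED_LABELS`) and the label set are illustrative assumptions, not the paper's exact rule:

```python
# Labels assumed to be reordered leftward in the target language
# (an illustrative assumption, not taken from the paper).
REORDERED_LABELS = {'NP', 'VP'}

def translate_with_waiting(units, predict, translate):
    """units: iterable of token lists; predict(tokens) -> right-syntax label
    (or 'nil'); translate(tokens) -> target string.

    Holds a unit whenever its predicted right constituent would be
    reordered in front of it, merging it with the next input unit.
    """
    buffered = []
    for unit in units:
        buffered += unit
        if predict(buffered) in REORDERED_LABELS:
            continue          # wait: keep buffering until the constituent arrives
        yield translate(buffered)
        buffered = []
    if buffered:              # flush whatever remains at end of speech
        yield translate(buffered)

outs = list(translate_with_waiting(
    [['this', 'is'], ['a', 'pen']],
    predict=lambda u: 'NP' if u[-1] == 'is' else 'nil',
    translate=lambda u: ' '.join(u).upper()))
# "this is" is held until "a pen" arrives, then translated as one unit
```

The cost of waiting is extra delay on exactly the units that predict a reordered constituent, which is why the trade-off curves in the results compare delay against accuracy.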
Slide 14: Experimental Settings
- Dataset: Penn Treebank (prediction), TED [WIT3] (MT)
- Languages: English-Japanese
- Tokenization: Stanford Tokenizer, KyTea
- Parsing: Ckylark (Berkeley PCFG-LA)
- MT decoders: Moses (PBMT), Travatar (T2S)
- Evaluation: BLEU, RIBES
Compared methods:
- PBMT (baseline): Moses, conventional setting
- T2S (baseline): T2S-MT with Travatar, without constituent prediction
- Proposed: T2S-MT with Travatar, with constituent prediction and waiting
Slide 15: Results: Prediction Accuracies
- Actual performance: precision = 52.77%, recall = 34.87%.
- Precision of about one half shows this is not a trivial problem.
- The low recall is caused by redundant constituents in the gold syntax. E.g. for the input "I 'm a", our predictor outputs NN while the gold syntax contains JJ, NN, PP, giving precision = 1/1 and recall = 1/3.
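The slide's figures are per-constituent precision and recall; on its worked example they can be checked directly. I assume multiset counting over constituent labels here, which reproduces the 1/1 and 1/3 on the slide:

```python
from collections import Counter

def constituent_pr(predicted, gold):
    """Per-constituent precision/recall, counting labels as multisets.

    predicted, gold: lists of unseen-constituent labels for one input;
    'nil' entries carry no constituent and are dropped before counting.
    """
    pred = Counter(label for label in predicted if label != 'nil')
    ref = Counter(label for label in gold if label != 'nil')
    matched = sum((pred & ref).values())  # multiset intersection
    precision = matched / max(sum(pred.values()), 1)
    recall = matched / max(sum(ref.values()), 1)
    return precision, recall

# Slide example for the input "I 'm a": predictor says NN; gold has JJ, NN, PP.
p, r = constituent_pr(['NN', 'nil'], ['JJ', 'NN', 'PP', 'nil'])
# p == 1.0 (1/1), r == 1/3
```

Over a whole test set the same counts would be accumulated before dividing, but the per-example version suffices to illustrate why redundant gold constituents depress recall without hurting precision.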
Slide 16: Results: Translation Trade-off (1)
[Plots: BLEU and RIBES vs. mean number of words per input (delay, short to long) for PBMT, using non-optimized N-word segmentation.]
- Short inputs reduce translation accuracy.
Slide 17: Results: Translation Trade-off (2)
[Plots: BLEU and RIBES vs. mean number of words per input for PBMT and T2S.]
- Long inputs: T2S > PBMT. Short inputs: T2S < PBMT.
Slide 18: Results: Translation Trade-off (3)
[Plots: BLEU and RIBES vs. mean number of words per input for PBMT, T2S, and Proposed.]
- The proposed method prevents the accuracy decrease on short inputs and is more robust to reordering.
Slide 19: Results: Using Another Segmentation
[Plots: BLEU and RIBES vs. mean number of words per input for PBMT, T2S, and Proposed, using an optimized segmentation [Oda+ 2014].]
- The optimized segmentation overfits, but the proposed method still reorders better than the others.
Slide 20: Summary
- Combined two frameworks: syntax-based machine translation and simultaneous translation.
- Methods: unseen syntax prediction, and waiting for translation.
- Experimental results: the proposed method prevents the accuracy decrease on short inputs and is more robust to reordering.
- Future work: improving prediction accuracy; using other context features.