On Adaptive COMPUTER-ASSISTED TRANSLATION 拋磚引玉

Future work on adaptive computer-assisted translation (拋磚引玉: casting a brick to attract jade) for TAUS Tokyo 2014



1

On Adaptive COMPUTER-ASSISTED TRANSLATION

今後の課題 (Future Work): 拋磚引玉

1

八楽 (Yaraku)

八: 8 million spirits

楽: joy

2

Outline

Full of trivial (embarrassing?) points

4

“A plague of statistics has descended on our houses.”

– Ed Hovy

5

e.g. 11,001 New Features for Statistical Machine Translation……

“Essentially, all models are wrong, but some are useful.”

— George E. P. Box

6

What went wrong?

7

First Brick in the Wall

• Via Negativa

• False positive/negative

• Error propagation

• Unknown unknown

9

Funny Autocomplete

“autocomplete is not a function” is the current top-1 Google autocomplete of “autocomplete is”.

10

Autocomplete is NOT a function

• Neither is auto-suggestion

• They are many-to-many relations with scores.

• Recognize this?

11

Many-to-many Scoring

• Map by prefix, rank by popularity

• Google search box autocomplete

• Map by occurrence, rank by similarity

• Search (information retrieval)

• Map by information, rank by knowledge

• Translation

12
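A minimal sketch of “map by prefix, rank by popularity” in Python; the queries and popularity scores below are made up for illustration:

```python
from collections import defaultdict

# Toy (query, popularity) pairs -- invented for illustration.
queries = [
    ("autocomplete is not a function", 9000),
    ("autocomplete is annoying", 1200),
    ("autocomplete is on", 300),
]

# Map by prefix: every prefix points to all queries it can complete to.
index = defaultdict(list)
for query, score in queries:
    for i in range(1, len(query) + 1):
        index[query[:i]].append((score, query))

def suggest(prefix, k=3):
    # Rank by popularity. One prefix maps to many queries, and one query
    # is reachable from many prefixes: a scored many-to-many relation,
    # not a function.
    return [q for s, q in sorted(index[prefix], reverse=True)[:k]]

print(suggest("autocomplete is"))
# ['autocomplete is not a function', 'autocomplete is annoying', 'autocomplete is on']
```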

Information?

• Surface patterns and……

• Imaginations

• Quantum information theory

• Tensor (Network Algorithm)

• Quantum Physics and Linguistics

• Frobenius (diagrammatic) algebras (for semantics)

13

Knowledge?

Black swan……

OK, too philosophical now.

14

Popularity & Similarity

• Popularity: famous or infamous?

• Consensus: social choice?

• Similarity

• Distance: rational choice?

15

Prefix, Occurrence

• Surface pattern

• Regular

• Context-free

• Context-sensitive

• Recursively blahblah……

16

Map & Rank

• Regular expression

• Edit distance

17

Regular expression

• [a-z]+

• Colours of cats and dogs.

• [^o]{2}

• Colours of cats and dogs.

• cat|dog

• Colours of cats and dogs.

• Colou?rs?

• Colours of cats and dogs.

• Colors of cats and dogs.

• Color of a cat.

• <[A-Za-z][A-Za-z]*>

• <html>Colours of cats and dogs.</html>

18
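The patterns above, run through Python’s re module to show exactly what each one matches:

```python
import re

text = "Colours of cats and dogs."

print(re.findall(r"[a-z]+", text))   # ['olours', 'of', 'cats', 'and', 'dogs']
print(re.findall(r"[^o]{2}", text))  # consecutive character pairs containing no 'o'
print(re.findall(r"cat|dog", text))  # ['cat', 'dog']

for s in ["Colours of cats and dogs.", "Colors of cats and dogs.", "Color of a cat."]:
    print(re.findall(r"Colou?rs?", s))  # ['Colours'] / ['Colors'] / ['Color']

print(re.findall(r"<[A-Za-z][A-Za-z]*>", "<html>Colours of cats and dogs.</html>"))
# ['<html>'] -- the closing tag fails because '/' is not a letter
```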

Edit Distance

• Colors

• Delete s

• Color

• Insert u

• Colour

• Replace C with c

• colour

• Distance from Colors to colour: 3 (or 4 if the cost of replacing is 2)

19
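The trace above is ordinary Levenshtein distance; a small dynamic-programming implementation with the substitution cost as a parameter reproduces both answers (3, and 4 when a replacement costs 2):

```python
def edit_distance(a: str, b: str, sub_cost: int = 1) -> int:
    """Levenshtein distance with unit insert/delete costs and a
    configurable substitution cost."""
    m, n = len(a), len(b)
    # dp[i][j] = cost of turning a[:i] into b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else sub_cost
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete
                           dp[i][j - 1] + 1,         # insert
                           dp[i - 1][j - 1] + cost)  # substitute / match
    return dp[m][n]

print(edit_distance("Colors", "colour"))              # 3
print(edit_distance("Colors", "colour", sub_cost=2))  # 4
```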

– One may ask

“What if I wanted to map 1, 1, one, and ONE?”

20

Normalization

• time flies like an arrow. fruit flies like bananas.

• Case restoration

• Time flies like an arrow. Fruit flies like bananas.

• Sentence segmentation

• time flies like an arrow.

• fruit flies like bananas.

• Word normalization: stemming or lemmatization?

21
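A deliberately naive sketch of these two steps (real segmenters and truecasers must handle abbreviations, quotes, and proper nouns):

```python
import re

text = "time flies like an arrow. fruit flies like bananas."

# Sentence segmentation: split after sentence-final punctuation.
sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

# Case restoration: naively capitalize the first letter of each sentence.
restored = [s[0].upper() + s[1:] for s in sentences]

print(restored)
# ['Time flies like an arrow.', 'Fruit flies like bananas.']
```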

Stemming

• Porter Stemmer (mainly suffix stripping)

• flies → fli

• bananas → banana

• How about “flies → fly”?

• Lemmatization

22
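The same behaviour, reproduced with NLTK’s Porter stemmer (assuming NLTK is installed):

```python
# pip install nltk
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["flies", "bananas"]:
    print(word, "->", stemmer.stem(word))
# flies -> fli
# bananas -> banana
```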

Lemmatization

• flies → fly

• better → good

• meeting

• meet?

• axes

• axe?

• axis?

23
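And the lemmatization examples through NLTK’s WordNet lemmatizer (requires the WordNet data); note how the part-of-speech argument decides meet vs. meeting, and how WordNet picks one reading of axes:

```python
# pip install nltk; then: python -m nltk.downloader wordnet omw-1.4
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("flies", pos="v"))    # fly
print(lemmatizer.lemmatize("better", pos="a"))   # good
print(lemmatizer.lemmatize("meeting", pos="v"))  # meet
print(lemmatizer.lemmatize("meeting", pos="n"))  # meeting
print(lemmatizer.lemmatize("axes", pos="n"))     # ax ('axis' is the other valid reading)
```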

Stemming or lemmatization, which is better?

“Battlestar Galactica is frakking wierd.”

24

Are we doing good?

Evaluate it!

25

Confidence Score

• Confidence interval? Confidence level?

• Not really

• But it can be

• Just a buzzword from speech recognition

• Shannon’s game

• Hidden Markov models

• Generative

• The Italian who went to Malta

• Can be any reasonable score

• Mostly probability

26

Calculate Sentence Similarity

[Flow diagram: Trusted on [exact match], Confident on [partial match], Doubted on [no match].]

Sentence: “w1 w2 w3 w4.”

Doubted when a / b < threshold (b is the higher of the two), where:

• a = prob. of #2(w1 w2 w3 w4) #1(w1 w2 w3) #1(w2 w3 w4) #1(w1 w2) #1(w2 w3) #1(w3 w4) #2(w1 w3) #2(w2 w4) #3(w1 w4)

• b = avg. prob. of all known exact matches

• #n(…): the listed words in order with any other (n − 1) words in between

27
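A toy sketch of the a / b test, under some assumptions: #n(…) is read as “the listed words in order with at most (n − 1) other words between neighbours”, probabilities are relative frequencies over a translation memory, and the memory, the stand-in for b, and the threshold are all invented:

```python
memory = [  # toy translation-memory sources, invented for illustration
    "w1 w2 w3 w4",
    "w1 x w2 w3 y w4",
    "w2 w3 w4 z",
]

def window_match(words, n, toks):
    """True if `words` occur in order in `toks` with at most (n - 1)
    other tokens between consecutive words (a greedy check)."""
    last = -1
    for w in words:
        try:
            i = toks.index(w, last + 1)
        except ValueError:
            return False
        if last >= 0 and i - last - 1 > n - 1:
            return False
        last = i
    return True

def prob(words, n):
    return sum(window_match(words, n, s.split()) for s in memory) / len(memory)

w = "w1 w2 w3 w4".split()
windows = [(2, w), (1, w[:3]), (1, w[1:]), (1, w[:2]), (1, w[1:3]),
           (1, w[2:]), (2, [w[0], w[2]]), (2, [w[1], w[3]]), (3, [w[0], w[3]])]

a = sum(prob(ws, n) for n, ws in windows) / len(windows)
b = prob(w, 1)  # stand-in for the avg. prob. of all known exact matches
print("doubted" if a / b < 0.5 else "ok")  # 0.5 is a made-up threshold
```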

Evaluate Pair: {Source, Target} Confidence

[Flow diagram: a {Source, Target} pair is rated Confident, Trusted, or Doubted via the [Trusted Source] / [Confident Source] / [Doubted Source] branches; for the triple {Source, Target, Back}, a [Trusted Target] or [Not Doubted Target] leads to evaluating Back confidence, and a [Doubted Back] sends the pair to Doubted.]

28
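A minimal sketch of that pair/triple flow; the three-way classify() thresholds and the scores are assumptions for illustration, not the production logic:

```python
def classify(score, trusted=0.8, doubted=0.3):
    # Made-up thresholds mapping a confidence score to the three states.
    if score >= trusted:
        return "trusted"
    if score <= doubted:
        return "doubted"
    return "confident"

def evaluate_pair(source_score, target_score):
    src, tgt = classify(source_score), classify(target_score)
    if "doubted" in (src, tgt):
        return "doubted"
    return "trusted" if src == tgt == "trusted" else "confident"

def evaluate_triple(source_score, target_score, back_score):
    # Back-translate the target; a doubted back-translation demotes
    # an otherwise acceptable pair.
    verdict = evaluate_pair(source_score, target_score)
    if verdict != "doubted" and classify(back_score) == "doubted":
        return "doubted"
    return verdict

print(evaluate_triple(0.9, 0.85, 0.2))  # doubted, despite a good-looking pair
```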

What went wrong?

29

Summarization

• Extraction

• Classification

• Discriminative

• Abstraction

• Aggregation

• Generative

30

The name of the rose

Sounds depressing? Let’s try it anyway……

31

How about voting?

Consensus and prediction: non-linear programming

32

Sentiment Analysis

• Classification

• Polarity

• やばい (yabai: “awful”, or slang for “awesome”)

• Subjectivity

• In my opinion……

• Emotion

33

Semantics?

• Classification vs.

• Ranking (as we’ve seen so far)

• Clustering

• Regression

• ……

34

Even Intractable

• Minimum Feedback Arc Set

• NP-complete, APX-hard

• Bipartite Tournament

• Hypergraph Grammar

• Synchronous Grammar

• Arrow’s Impossibility Theorem

• Social Choice

• Voting System

35

“Prediction is very difficult, especially about the future.”

– disputed

36

There are two kinds of…

PAIN. The sort of pain that makes you strong, or useless pain. The sort of pain that's only suffering. I have no patience for useless things.

37

What might make me stronger……

(See also http://www.no-free-lunch.org)

38

Website Translation

~250 S&B sites / 3 months: ~50% are compatible, 2 have paid

39

Different Story

NY-based, IT capable (see also https://dakwak.com)

40

HTML Side-effect

<span class="notranslate">Hello, WorldJumper!</span>

<!-- Are you talking to me? -->

41

I want more info

Less is more.

42

[[[坂西優]]]です。

[[[坂西優=Suguru Sakanishi]]]です。 (“I am Suguru Sakanishi.”)

43

More Anomalies

• 【米】 (headline shorthand for “U.S.”)

• 飛来物 (flying debris)

• 菜の花 (rape blossoms)

• 桃白白 (Tao Pai Pai)

• 白立斌 (a Chinese personal name)

• Oh, I also want [[[this part to be a partially matched TM]]] pre-edited for MT, please?

44

Read my lips

It’s not only about sound

45

Transliteration is not……

• Romanization

• Transcription

46

Transliteration

• Alignment

• Alignment

• Alignment

• (And it had better be more than bilingual)

47

……system using M2M-aligner, CRF models, and AV features in this work is explained in Section III. Section IV describes experimental results, and discussion is provided in Section V. Finally, Section VI draws a conclusion.

II. RELATED WORKS

A. CRF-based Transliteration

A phrase-based transliteration system that groups characters into substrings mapping to target names was presented in [16], demonstrating how substring representation can be incorporated into a CRF model with local context and phonemic information. Shishtla et al. [17] adopted a statistical transliteration technique consisting of the alignment model of GIZA++ [19] and CRF models. Instead of GIZA++, M2M-aligner is used here, and source-grapheme AV is applied for CRF-based transliteration [13].

A two-stage CRF method for transliteration was first designed to pipeline two independent processes [7][10]. The first stage predicts syllable boundaries of source names, and the second stage uses those boundaries to obtain the corresponding characters of target names. The advantage of the two-stage CRF method is that it considerably decreases the cost of training with complex features compared to the one-stage, character-based labeling method. The downside, compared with the one-stage method, is that features of the target language are not directly applied in the first stage. To recover from error propagation in the pipeline, a joint optimization of the two-stage CRF method was then proposed to utilize the n-best candidates of source-name segmentations [11]. Another approach to reducing local errors in boundary segmentation is to pool CRF models for second-stage model training [8][14].

B. Accessor Variety

Accessor variety (AV) is a criterion for deciding whether consecutive Chinese characters in a sentence form a meaningful Chinese word [12]. A similar criterion for measuring English and Chinese words, called boundary entropy or branching entropy (BE), has been used in several works. The basic idea behind these measures is closely related to one perspective on n-grams and the information-theoretic notions of cross entropy and perplexity. Zhao and Kit [20] observed that AV and BE both assume that the border of a potential word is located where the uncertainty of successive characters increases, regarded AV and BE as the discrete and continuous versions, respectively, of the fundamental work of Harris [18], and then chose to adopt AV as an additional feature for CRF-based Chinese Word Segmentation (CWS). The AV of a string s is defined as:

AV(s) = min{Lav(s), Rav(s)} . (1)

In Eq. (1), Lav(s) and Rav(s) are the numbers of distinct preceding and succeeding characters of s, respectively; when the adjacent character is absent because of a sentence boundary, the pseudo-character for the beginning or end of a sentence is counted instead. More heuristic rules were also developed to remove strings that contain known words or adhesive characters [12]. For the strict meaning of unsupervised features, and for simplicity, this work does not include those additional rules.

The necessity of AV rests primarily on the demand for semi-supervised learning. Since AV can be extracted from large corpora without any manual segmentation or annotation, hidden variables underlying frequent surface patterns of languages may be captured via an inexpensive, unsupervised method such as a suffix array. Unsupervised selection of AV or similar features has generally improved the effectiveness of supervised CWS on cross-domain and unlabeled data, and this work consequently considers that the AV of unsegmented English names from the training, development, and test data might help enhance E2C transliteration.

III. METHODOLOGY

A. Basic CRF Theory

Conditional random fields (CRF) are undirected graphical models trained to maximize a conditional probability, a concept well established for sequence labeling problems [9]. Given an input sequence X = x1 … xT and a label sequence Y = y1 … yT, the conditional probability of a linear-chain CRF with parameters Λ = {λ1, …, λn} can be defined as:

P(Y|X) = (1/ZX) exp( Σt=1..T Σk λk fk(yt−1, yt, X, t) )  (2)

where ZX is the normalization constant that makes the probabilities of all label sequences sum to one, fk(yt−1, yt, X, t) is a feature function, which is often binary-valued but can be real-valued, and λk is a learned weight associated with feature fk.

Given such a model as defined in Eq. (2), the most probable label sequence for an input X is as follows:

Y* = argmaxY PΛ(Y|X)  (3)

Eq. (3) can be computed efficiently by dynamic programming using the Viterbi algorithm.
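A compact Viterbi sketch for Eq. (3), where score() stands in for Σk λk fk(yt−1, yt, X, t) and the toy scoring function is invented:

```python
def viterbi(X, labels, score):
    """Find argmax_Y sum_t score(y_{t-1}, y_t, X, t) by dynamic programming."""
    V = [{y: score(None, y, X, 0) for y in labels}]  # best score ending in y
    back = []                                        # best predecessor of y
    for t in range(1, len(X)):
        V.append({})
        back.append({})
        for y in labels:
            best = max(labels, key=lambda p: V[t - 1][p] + score(p, y, X, t))
            V[t][y] = V[t - 1][best] + score(best, y, X, t)
            back[t - 1][y] = best
    y = max(labels, key=lambda q: V[-1][q])
    path = [y]
    for t in range(len(X) - 2, -1, -1):  # walk the back-pointers
        y = back[t][y]
        path.append(y)
    return path[::-1]

# Toy example: label each letter of a name as Begin/Inside a syllable.
X = list("ABERT")
print(viterbi(X, ["B", "I"],
              lambda p, y, X, t: 1.0 if (t == 0) == (y == "B") else 0.5))
```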

B. EM for Initial Alignments

In [15], the authors argued that previous work has generally assumed one-to-one alignment for simplicity, but letter strings and phoneme strings are typically not of the same length, so null phonemes or null letters must be introduced to make one-to-one alignments possible. Furthermore, two letters frequently combine to produce a single phoneme (double letters), and a single letter can sometimes produce two phonemes (double phonemes). For example, the English word “ABERT” with its Chinese transliteration “���”, which can be regarded as its “phonemes”, is aligned as [15]:

A  BE  RT
|   |   |
�  �  �

<!-- 白立斌 --> Hey! How about my privacy?

48

Overwriting

• <!-- John Doe #1 -->

49

Overwriting Side-effect

八楽の◯◯と申します。 (“I am ◯◯ of Yaraku.”)

50

Slot Machine

Email Template? Rule-based Machine Translation?

51

Multi-armed Bandit

Reinforcement Learning

52
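A minimal ε-greedy sketch of the explore/exploit loop; each arm might be an MT engine, a TM suggestion, or an email template, and the reward probabilities are made up:

```python
import random

def reward(arm):
    # Stand-in for user feedback (e.g., suggestion accepted = 1).
    return random.random() < [0.2, 0.5, 0.8][arm]

counts = [0, 0, 0]
values = [0.0, 0.0, 0.0]
epsilon = 0.1

for _ in range(1000):
    if random.random() < epsilon:           # explore
        arm = random.randrange(3)
    else:                                   # exploit
        arm = values.index(max(values))
    r = reward(arm)
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]  # running mean

print(values)  # estimates should approach [0.2, 0.5, 0.8]
```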

Reinforcement

• Explore vs. Exploit

• Interactive

• Online

• Free Lunches

• Second moments and higher of algorithms' generalisation error

• Coevolution

• Confidence intervals can give a priori distinctions between algorithms

• People respond to incentives

53

Translate X for Y

• {restaurant AD, coupon}

• {game, credit}

• {subtitle, DRM-free video}

• {Heart Sūtra, inner peace}

• {inside news, outside support}

• Taiwanese protesters

• {anything, incentives}

• See also: Unbabel, Duolingo

54

New Types of Assistance for Translators

by Philipp Koehn (http://www.mastar.jp/wfdtr/shiryou2013/Philipp%20Koehn.pdf

via http://www.mastar.jp/wfdtr/index-e.html)

55

Paraphrasing

Monolingual translation

56

Wrap up

• Where’s my pony: semantics?

• Adaptation

• Chinese restaurant process

• Indian buffet process

• 信 (adequate)、達 (fluent)

• 雅 (elegant)? 貼 (pertinent)?

• Bilingual might be insufficient: 全日空 → ANA

• Pony: you can’t always get what you want

• Extrinsic evaluation

• Embrace and enjoy changes

57
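For the adaptation bullets above, a tiny Chinese restaurant process simulation: a new item joins an existing cluster with probability proportional to the cluster’s size, or opens a new one with weight proportional to α, so the number of clusters grows with the data instead of being fixed in advance:

```python
import random

def crp(n_items, alpha=1.0):
    tables = []  # tables[i] = number of customers at table i
    for _ in range(n_items):
        weights = tables + [alpha]  # existing tables, or a new one
        r = random.uniform(0, sum(weights))
        for i, w in enumerate(weights):
            r -= w
            if r <= 0:
                break
        if i == len(tables):
            tables.append(1)        # open a new table
        else:
            tables[i] += 1          # join an existing table
    return tables

print(crp(100))  # typically a few large tables and a long tail of small ones
```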

<(_ _)> (translate me)

58