Supporting Multilingual Discussion for Collaborative Translation

Noriyuki Ishida, Donghui Lin, Toshiyuki Takasaki, and Toru Ishida Department of Social Informatics, Kyoto University

Yoshida-Honmachi, Sakyo-Ku, Kyoto, 606-8501, Japan [email protected], [email protected], [email protected], [email protected]

Abstract— In recent years, collaborative translation has become increasingly important as a way for translation volunteers to share knowledge across languages; Wikipedia translation is a typical example. During collaborative translation, users with different mother tongues frequently discuss particular words or expressions in order to understand the content of the original article and to decide on the correct translation. To support this kind of multilingual discussion, we propose an approach that embeds a service-oriented multilingual infrastructure into the discussion functions of collaborative translation systems, so that discussions can be automatically translated into different languages using machine translators, dictionaries, and so on. Moreover, we propose a Meta Translation Algorithm adapted to the features of discussions in collaborative translation, where discussion articles typically contain expressions in different languages. Further, we implement the proposed approach on LiquidThreads, a BBS on Wikipedia, and apply it to multilingual discussion for Wikipedia translation to verify the effectiveness of this research.

Keywords- collaborative translation; language grid; machine translation; multilingual discussion

I. INTRODUCTION

With the rapid expansion of the modern Internet, knowledge has become easier and easier for people to share. For example, Wikipedia is the largest encyclopedia on the Web; it is available in many languages around the world, and anyone can freely edit its articles. However, there are huge differences in the number of articles from language to language. Currently, the English Wikipedia has approximately 3.2 million articles and the German one has 990 thousand, yet the Wikipedias in some minor languages have only a few articles. In order to balance the knowledge among different languages, volunteer groups are conducting collaborative translation on the Internet, of which Wikipedia translation is the most typical example. To translate the articles correctly, discussions frequently occur among translation volunteers. Usually, such activities are conducted by bilinguals. However, bilinguals are not always available, especially when it comes to minor languages. Therefore, it is important to create an environment in which monolinguals can participate in translation and discussion, and in which people with different languages can use their mother tongues. There are some tools for multilingual communication, such as multilingual chat and multilingual BBS. For example, Flournoy et al. assess AmiChat [1], a multilingual chat system developed by AmiKai Inc., in which users’ inputs are automatically translated into other languages so that users can participate regardless of their language.

To support multilingual discussion in collaborative translation activities, we propose an approach to combining a service-oriented multilingual infrastructure, the Language Grid [2], with the discussion functions in collaborative translation systems. The Language Grid is one of the most important multilingual infrastructures for sharing language services (machine translation services, dictionary services, parallel text services, etc.) and combining these services for different requirements, which has been applied in various intercultural collaboration activities [3].

However, we should also consider the special features of multilingual discussion in collaborative translation activities when developing the above supporting environments. First, an article to be translated often contains terms or expressions specific to the source language, and therefore translators conduct discussions to identify the meanings of such terms and to decide on the appropriate translation (hereinafter we call these discussions about translation). Second, two problems occur if we combine services like machine translators to translate discussions: one is that passages which should not be translated are translated, and the other is that passages which have already been translated are translated again when they are quoted in replies. Some systems for multilingual discussions have been developed, but no system for discussions about translation has been developed with the above features in mind.

To solve these problems, we propose a Meta Translation Algorithm, which selects the passages that should not be translated, and replaces their translation with the original passage concatenated with the translation. Further, we implement the proposed approach on a Wikipedia BBS to verify our research, and apply it to multilingual discussion for Wikipedia translation activities.

The rest of this paper is organized as follows. In Section II, we explain the design concept of the multilingual discussion support system for collaborative translation by showing a design example for multilingual discussion in Wikipedia. Section III introduces the features of discussions about translation. In Section IV, we discuss the difficulties of translation for “discussions about translation” and further describe our proposed Meta Translation Algorithm in detail. In Section V, we introduce the implementation of the multilingual discussion support system using the proposed Meta Translation Algorithm, which is evaluated in Section VI, followed by the conclusion in the last section.

978-1-4673-1382-7/12/$31.00 ©2012 IEEE

II. DESIGN OF MULTILINGUAL DISCUSSION SUPPORT SYSTEM FOR COLLABORATIVE TRANSLATION

In the collaborative translation activities, discussions about translation are generally multilingual, and hence the language barrier is a key problem. We can avoid this problem by defining one official language for the discussion, which results in another problem: those who cannot understand the language are unable to take part in the discussion. Obviously, participation and progress will be greatly enhanced if each user can use his/her own language. Therefore, it is necessary to have some support systems for multilingual discussion. In system design, not only the social needs of the collaborative translation community but also the technical requirements of the existing collaborative translation system should be considered.

In this section, we describe how we design the multilingual discussion support system for collaborative translation. To illustrate the design concept more clearly, we use an example of multilingual discussion for Wikipedia translation activity, which is built for Wikipedia communities in conjunction with a team from Wikimedia Foundation. We combine the Language Grid [2], a service-oriented multilingual infrastructure, with the discussion functions for Wikipedia translation.

A. Technical Requirements

The goal of developing the multilingual discussion support system for collaborative translation is that it should be well accepted by the actual collaborative translation community, which has a great number of international users. Wikipedia is implemented on MediaWiki, open-source wiki software written in PHP. MediaWiki Extensions are available to add new features or enhance the functionality of MediaWiki. From the technical point of view, as in any system development project, there are some technical requirements raised by the open-source community.

The first one is coding conventions. All code should follow the conventions defined by the community. The conventions define indenting and alignment, logical structure, naming rules, and so on.

The second requirement is performance. Since a collaborative translation system is viewed by a great number of users every day, good performance is one of the critical elements of system design.

The third is usability. For example, MediaWiki has its own look and feel, which should be consistent throughout the multilingual Wikipedia. Since Wikipedia is viewed by a variety of people of different age and computer skill, usability is one of the key elements to attract users to the site.

The fourth is security. As the Wikipedia site is one of the most famous Web sites in the world, it is the target of many cyber-attacks. In general, wiki-style Web sites are more likely to be attacked than non-wiki sites, because the former allow users to access and change the contents without login requirements. Intense discussions about security are found in the MediaWiki open-source community.

Last, neutrality and independence are important for the collaborative translation community. The community does not depend too much on specific vendors, services, or the influence of third parties, but employs open-source software and services.

B. System Design and Prototyping

From the software point of view, the architecture consists of MediaWiki, the Language Grid, the Language Grid Extension, and the Multilingual LiquidThreads Extension. Most of the system parts have been prototyped. In developing our multilingual support system for Wikipedia discussion, in addition to meeting the needs of the community and the technical requirements, we adopt the page configuration (a pair consisting of an article page and a discussion page). Depending on the domain of the article, users may want to use different language resources. Since the Language Grid is a multilingual infrastructure service, its services should allow access by MediaWiki extensions for general purposes.

Figure 1. Translation path setting between English and Japanese (A user can choose a translator service from five different translators in this case. “Advanced options” enables the user to choose global dictionaries to combine with a translator and a morphological analyzer service. A user can configure one Setting per page.)

C. Language Grid Extension

The Language Grid Extension, which is open-source software, is an extension to MediaWiki that enables users to select and combine machine translators and multilingual dictionaries registered with the Language Grid. For example, it enables users to improve the translation quality by combining multilingual dictionaries, such as Wikipedia dictionaries or user dictionaries. Other MediaWiki Extensions use the API provided by the Language Grid Extension. After the Language Grid Extension is installed, two new tabs appear in each article: “Setting” and “Page Dictionary.”

“Setting”, as shown in Fig. 1, provides the user interface for configuring the machine translators and multilingual dictionaries offered through the Language Grid Extension. It is possible to set up multilingual dictionaries categorized by machine translator and also to select machine translators from alternative candidates. As it is possible to combine two or more dictionaries with a machine translator, users can combine the Page Dictionary and other dictionaries with any machine translator. Because the Setting feature allows users to customize language paths from any of the language services available, it meets the policy of the Wikipedia community that heavy reliance on one specific proprietary service is to be avoided.

“Page Dictionary”, as shown in Fig. 2, enables users to add technical terms or community-specific words that appear in MediaWiki articles to the multilingual dictionaries.

The dictionaries created with the Language Grid Extension are categorized by article. Page Dictionary can be accessed by every user. When translation is performed through an API provided by the Language Grid Extension, it is possible to combine the Page Dictionary with machine translators. The machine translation quality can be improved with the help of words registered in the Page Dictionary.

Figure 2. Page Dictionary (Users can edit multilingual dictionaries collaboratively in all languages. User can configure one Setting per page.)

D. Multilingual LiquidThreads Extension

Hautasaari et al. introduce Multilingual LiquidThreads Extension as a multilingual version of LiquidThreads Extension, which was developed to enhance the usability of the discussion pages in MediaWiki. Messages from users in MediaWiki's discussion pages are automatically translated by the Language Grid Extension and displayed in the Multilingual LiquidThreads Extension [4]. In order to enhance the performance of the system, a posted message is translated into a given language when the first user who has selected that language views the message (not when the message is posted). The system sequence for displaying a message is shown in Fig. 3.

There are three features under development in the Multilingual LiquidThreads Extension. The first feature is revision control and history. The revision history is accessible as described in the MediaWiki conventions. The second one is human correction of machine translation results. Human correction of machine-translated articles and a history page for translated articles will be added to the Multilingual LiquidThreads Extension. The last one is the meta-translation (or translation of “discussions about translation”) feature. Although we make automatic translation possible by combining the Language Grid and collaborative translation systems, it is also important to consider the features of discussions about translation, which are described in detail in the next section.

Figure 3. A system sequence diagram of invoking translation (‘msg’ is a ‘message.’ When a user selects a language that was never selected by any other users before, translation in the language is invoked and the translated message saved.)
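The translate-on-first-view behavior described for Fig. 3 can be sketched as a small cache keyed by message and language. This is only an illustrative sketch, not the extension's actual code: the class name LazyMessageStore, the translator signature, and the stub fake_translate are assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


class LazyMessageStore:
    """Translate a message only when a language is requested for the first time."""

    def __init__(self, translate: Translator) -> None:
        self._translate = translate
        self._originals: Dict[int, Tuple[str, str]] = {}  # msg_id -> (text, lang)
        self._cache: Dict[Tuple[int, str], str] = {}      # (msg_id, lang) -> text

    def post(self, msg_id: int, text: str, lang: str) -> None:
        # Posting stores the original only; no translation is triggered here.
        self._originals[msg_id] = (text, lang)

    def view(self, msg_id: int, lang: str) -> str:
        text, source_lang = self._originals[msg_id]
        if lang == source_lang:
            return text
        key = (msg_id, lang)
        if key not in self._cache:
            # The first viewer in this language triggers translation; the result is saved.
            self._cache[key] = self._translate(text, source_lang, lang)
        return self._cache[key]


calls: List[Tuple[str, str]] = []


def fake_translate(text: str, src: str, dst: str) -> str:
    """Stub standing in for a machine translation service; records each call."""
    calls.append((src, dst))
    return f"[{src}->{dst}] {text}"


store = LazyMessageStore(fake_translate)
store.post(1, "Hello", "en")
first = store.view(1, "ja")   # invokes the translator
second = store.view(1, "ja")  # served from the saved translation
```

With this design the posting path stays cheap, and each message is translated at most once per language, which matches the performance motivation given above.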

III. DISCUSSIONS ABOUT TRANSLATION

A. Features

Wikipedia is a huge multilingual knowledge base, and enables much potential research in different areas [5-7]. However, pages for the same topic in different languages vary in size, scope, and quality [8], which makes discussions about translation necessary. Such discussions have several features that are different from general conversations. The first one is that utterances in one language can contain words or phrases written in another language. We show an example below:

(Example 1) Literal translation of “千と千尋の神隠し”¹ is “Sen and Chihiro’s Mysterious Disappearance.”

General conversations rarely contain words in multiple languages. However, in discussions about translation, words in the other language are often discussed, as in Example 1. Moreover, prior discussions are often referred to in the utterances, and citations are often used in order to advance the discussion.

¹ The Japanese title of the famous animated fantasy-adventure film Spirited Away (or Sen to Chihiro no Kamikakushi) directed by Hayao Miyazaki.

In general, it is rare for all participants in discussions about translation to share the same language, and hence it is necessary to hold the discussions in a multilingual environment. For example, when translating a Japanese article about Kyoto into English, an English speaker is necessary but not sufficient. In order to understand terms or expressions about the cultural heritage of Kyoto, those who are familiar with Japanese culture must take part. They are usually Japanese speakers, and in general are not very good at English. Thus, discussions among them need some form of language support such as machine translators.

However, these features make using machine translators difficult. In many cases, the machine translation result is meaningless. Hereinafter, we concretely describe the problems occurring because of these features.

B. Translation of Sentences That Contain Words in Other Languages

Discussions about translation have the feature that utterances in one language can contain words or phrases written in another language. When we use machine translators, we must specify the source language and the target language. However, in the case of Example 1, it is not clear whether English is the source language or not, because the English sentence contains Japanese words.

Moreover, even if we translate all the words in the sentence, we face the problem that the intention of the original discussion is not preserved. For example, the translation of the sentence, “Literal translation of ‘千と千尋の神隠し’ is ‘Sen to Chihiro no Kamikakushi,’” does not have the same meaning as the original. This is because the reference data used in the discussion is replaced.

C. Translation of Sentences That Contain Citations

As described above, discussions about translation suffer from various problems. One is that passages which should not be translated are translated, and hence the intention of the discussion cannot be understood. Another problem is that, because of citations, sentences that have already been translated are re-translated and their quality declines.

In discussions about translation, citations are used very often, owing to the characteristic of this kind of discussion. This becomes a problem when we employ machine translation in multilingual environments. When messages are translated, citations in the messages are also translated. When the translated citations are transmitted as a message, they are retranslated. As retranslation may decrease the quality, the result can be meaningless.

IV. TRANSLATION OF “DISCUSSIONS ABOUT TRANSLATION”

A. Formalizing the Problems

In Section III, we considered the problems that occur in discussions about translation. Hereinafter, we model “discussions about translation” and their translation, and then propose an algorithm to solve the problems.

To clarify the proposed model, we restrict “discussions about translation” to “sentences with word-to-word correspondence.” An example is shown below:

Can “센과 치히로의 행방불명” be translated into “Унесённые призраками?”

In particular, we focus on two matters: one is that translation is hard if the sentence contains words in another language, and the other is that intention of discussion can change when passages that should not be translated are translated.

First, we formalize discussions about translation. We represent information x in language L as xL. We also create a description in language L3, to ask whether a representation in L1 (pL1) corresponds to a representation in L2 (qL2) as des(pL1, qL2, L3). An example of des(pL1, qL2, L3) is shown as Example 2.

(Example 2) “센과 치히로의 행방불명” is translated into “Унесённые призраками.”

(L1: Korean, L2: Russian, L3: English)

Second, we consider the machine translation of des(pL1, qL2, L3) from language L3 into language L4. We represent a translation of description d from language L1 into language L2 as trans(d, L1, L2). The machine translation result of a discussion about translation, trans(des(pL1, qL2, L3), L3, L4), should be either of the following:

1) des(pL1, qL2, L4),

2) des(trans(pL1, L1, L4), trans(qL2, L2, L4), L4).

If the result is 1), participants whose mother language is L4 cannot understand p and q, because they are not translated into L4. On the other hand, in case of 2), p and q, which are the original targets for translation, are translated into L4 to be the other data, and hence the intention of the discussion changes.

When we translate Example 2 into Japanese, the result of 1) is “‘센과 치히로의 행방불명’は‘Унесённые призраками’に翻訳される” (in English: ‘센과 치히로의 행방불명’ is translated into ‘Унесённые призраками’), and Japanese speakers cannot understand what the quoted words mean. On the other hand, the result of 2) is “‘千と千尋の神隠し’は‘千と千尋の神隠し’に翻訳される” (in English: ‘Sen to Chihiro no Kamikakushi’ is translated into ‘Sen to Chihiro no Kamikakushi’), which does not have the same intention as the original.
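To make the two candidate outcomes concrete, the following sketch reproduces them for Example 2 with a toy lookup table in place of a real machine translator. The table TO_JA, the templates, and the helper des are illustrative assumptions, not part of the Language Grid.

```python
# Toy lookup table standing in for a machine translator (illustrative only).
TITLE_JA = "千と千尋の神隠し"
TO_JA = {
    "센과 치히로의 행방불명": TITLE_JA,   # Korean title -> Japanese
    "Унесённые призраками": TITLE_JA,     # Russian title -> Japanese
}

# des(p, q, L): the same description rendered in each discussion language L.
TEMPLATES = {
    "en": "'{p}' is translated into '{q}'",
    "ja": "'{p}'は'{q}'に翻訳される",
}


def des(p: str, q: str, lang: str) -> str:
    """Build the description asking whether p corresponds to q, in language lang."""
    return TEMPLATES[lang].format(p=p, q=q)


p, q = "센과 치히로의 행방불명", "Унесённые призраками"

# Outcome 1): only the surrounding description is translated; p and q stay opaque.
outcome_1 = des(p, q, "ja")

# Outcome 2): p and q are translated too, so both collapse to the same title
# and the question being discussed is lost.
outcome_2 = des(TO_JA[p], TO_JA[q], "ja")

print(outcome_1)
print(outcome_2)
```

Neither outcome is satisfactory on its own, which is the motivation for keeping the original passage and attaching its translation as a description.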

Thus, the Meta Translation Algorithm should translate the two representations in the discussion while preserving the intention of the discussion.

B. Preparation

To develop the solution, we should first know what the source language is. Here we use language tags enclosing each expression. In the case of Example 2, language tags are used as follows:

“<ko>센과 치히로의 행방불명</ko>” is translated into “<ru>Унесённые призраками</ru>.”

(“ko” represents Korean, and “ru” represents Russian)

Second, we should preserve the meaning of the discussion. However, as described above, when the discussion is translated, the meaning changes. Therefore, we consider an operation which preserves the original representation and adds the meaning as a description. Here, we define erep(xL1, L2) as an operation that returns xL1 + “(” + xL2 + “)” (this operation explains xL1 by xL2, which is enclosed in parentheses). By using this operation, we can refer to the original when we re-translate it, which preserves the original.
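The two preparation steps, reading the language tags and applying erep, might look like the sketch below. It assumes two-letter lowercase language codes as in the tagged example; the regular expression, the function names, and the toy translator are assumptions made for illustration.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]

# Matches <xx>...</xx>, where xx is a two-letter language code (an assumption).
TAG_RE = re.compile(r"<([a-z]{2})>(.*?)</\1>", re.DOTALL)


def tagged_passages(text: str) -> List[Tuple[str, str]]:
    """Return (language_code, passage) pairs for every tagged passage."""
    return [(m.group(1), m.group(2)) for m in TAG_RE.finditer(text)]


def erep(passage: str, source_lang: str, target_lang: str,
         translate: Translator) -> str:
    """Keep the original passage and append its translation in parentheses."""
    return passage + "(" + translate(passage, source_lang, target_lang) + ")"


# Toy translator: a lookup table instead of a real machine translation service.
TOY_TITLES = {("센과 치히로의 행방불명", "ko", "en"): "Spirited Away"}


def toy_translate(text: str, src: str, dst: str) -> str:
    return TOY_TITLES.get((text, src, dst), text)


sentence = ('"<ko>센과 치히로의 행방불명</ko>" is translated into '
            '"<ru>Унесённые призраками</ru>."')
pairs = tagged_passages(sentence)
explained = erep("센과 치히로의 행방불명", "ko", "en", toy_translate)
```

Because the original passage is kept verbatim in front of the parenthesized description, a later translation step can recover it instead of translating a translation.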

C. Meta Translation Algorithm

We start by specifying the requirements of the Meta Translation Algorithm using the expressions introduced in the previous section.

1. We take the language tags that enclose a passage as indicators of the source language.

2. We attach the translation of a passage as a description, instead of replacing the passage, when it is enclosed in quotation marks or equivalent tags.

Moreover, when we consider the case where the Meta Translation Algorithm is applied repeatedly, the following requirement becomes necessary:

3. When we meta-translate passages that have been previously meta-translated, we first delete the description attached by the prior Meta Translation.

Now, we introduce a policy that ensures that the algorithm fulfills these requirements:

1. Input is description des(pL1, qL2, L3), source language L3, and target language L4.

2. Our algorithm “meta-trans” translates des(pL1, qL2, L3) from language L3 into language L4.

3. First, our algorithm replaces pL1 with x and qL2 with y. (Here we suppose x and y do not change by translation, so-called intermediate codes.)

4. Our algorithm translates des(x, y, L3).

des(x, y, L3) → des(x, y, L4)

5. If pL1 already contains a description, our algorithm removes it. Our algorithm does the same to qL2.

6. Our algorithm applies pL1 and qL2 to erep.

pL1 → pL1 + “(” + pL4 + “)”

qL2 → qL2 + “(” + qL4 + “)”

7. Our algorithm replaces x with pL1 and y with qL2.

The detailed Meta Translation Algorithm is shown in Fig. 4. We use the following example:

“千と千尋の神隠し” の文字通りの翻訳は “Sen and Chihiro’s Mysterious Disappearance” である.

(“Sen to Chihiro no Kamikakushi” is literally translated into “Sen and Chihiro's Mysterious Disappearance.”)

Meta-translating this sentence from Japanese into English yields:

“千と千尋の神隠し(Spirited Away)” is literally translated into “Sen and Chihiro’s Mysterious Disappearance.”

This algorithm replaces all passages that should not be translated with intermediate codes, and restores them after the machine translation. Each restored string is the concatenation of the original string and its translation, attached as the description.

Algorithm 1 meta-trans(a, L3, L4)

a : description des(pL1, qL2, L3)
L3 : discussion language
L4 : target language
xL : representation of information x in language L
rep(a, x, y) : function that replaces representation x in discussion a with representation y
trans(xL1, L1, L2) : function that translates representation xL1 from language L1 into language L2
erep(xL1, L2) : function that adds a description in language L2 to representation xL1; defined as xL1 + “(” + trans(xL1, L1, L2) + “)”
med(pL) : function that returns an intermediate code corresponding to representation pL
+ : operator of string concatenation

a ← rep(a, pL1, med(pL1))
a ← rep(a, qL2, med(qL2))
a ← des(med(pL1), med(qL2), L4)
a ← rep(a, med(pL1), erep(pL1, L4))
a ← rep(a, med(qL2), erep(qL2, L4))
return a

Figure 4. Meta Translation Algorithm
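The steps of Fig. 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the intermediate-code format `__MTn__`, the rule that a trailing parenthesized group is a prior description, the choice to keep language tags in the output, the skipping of erep when a passage is already in the target language, and the table-based toy translator are all assumptions introduced here.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]

TAG_RE = re.compile(r"<([a-z]{2})>(.*?)</\1>", re.DOTALL)
# Assumption: a trailing "(...)" is a description left by an earlier meta-translation.
DESC_RE = re.compile(r"\s*\([^()]*\)\s*$")


def meta_trans(text: str, src: str, dst: str, translate: Translator) -> str:
    passages: List[Tuple[str, str]] = []

    def to_code(match: re.Match) -> str:
        passages.append((match.group(1), match.group(2)))
        return f"__MT{len(passages) - 1}__"  # med(p): intermediate code

    # Hide tagged passages, then translate only the surrounding description.
    result = translate(TAG_RE.sub(to_code, text), src, dst)

    for i, (lang, passage) in enumerate(passages):
        original = DESC_RE.sub("", passage)  # drop a description from a prior pass
        if lang == dst:
            restored = original              # already readable by the target reader
        else:
            restored = f"{original}({translate(original, lang, dst)})"  # erep
        # Tags are kept so that the passage can be meta-translated again later.
        result = result.replace(f"__MT{i}__", f"<{lang}>{restored}</{lang}>")
    return result


# Toy translator: exact-match lookup tables; unknown text is returned unchanged.
TOY_MT: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "ja"): {'"__MT0__" is translated into "__MT1__".':
                   '"__MT0__"は"__MT1__"に翻訳される。'},
    ("ko", "ja"): {"센과 치히로의 행방불명": "千と千尋の神隠し"},
    ("ru", "ja"): {"Унесённые призраками": "千と千尋の神隠し"},
}


def toy_translate(text: str, src: str, dst: str) -> str:
    return TOY_MT.get((src, dst), {}).get(text, text)


tagged = ('"<ko>센과 치히로의 행방불명</ko>" is translated into '
          '"<ru>Унесённые призраками</ru>".')
out = meta_trans(tagged, "en", "ja", toy_translate)
again = meta_trans(out, "ja", "ja", toy_translate)  # descriptions are not stacked
```

Running the sketch a second time on its own output leaves the text unchanged, because the earlier description is removed before a new one is attached. A real deployment would need an intermediate code that the chosen machine translator is known to leave untouched.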

D. Application of Algorithm

In the previous section, we mention the problem that “discussions with citations” can be repeatedly translated. Here we show that we can solve this problem by extending our algorithm. We take the following discussion for example (language tags are omitted for brevity):

(Example 3) “‘센과 치히로의 행방불명’ is translated into ‘Унесённые призраками’” の情報源を提示してください.(Please show me the source for “‘센과 치히로의 행방불명’ is translated into ‘Унесённые призраками.’”)

When we translate this sentence from Japanese into English by the Meta Translation Algorithm, the passage that should not be translated is the following sentence:

‘센과 치히로의 행방불명’ is translated into ‘Унесённые призраками.’

When our algorithm translates this sentence, the result of machine translation, from English into English, is used. (Actually, the translation is not executed because the source language is the same as the target language.) However, English speakers cannot understand it because it contains words that are not in English. When we apply the Meta Translation Algorithm to this passage, instead of machine translation, we get the following description:

‘센과 치히로의 행방불명(Spirited Away)’ is translated into ‘Унесённые призраками(Spirited Away).’

We can handle Example 3 by applying Meta Translation Algorithm recursively. Moreover, because Meta Translation preserves the original sentence, we can prevent the sentence from being translated repeatedly.
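One way to picture the recursive application to Example 3 is sketched below. It assumes the citation is wrapped in its own language tag and that tags may nest. The tokenizer, the `__MTn__` codes, the decision to drop tags in the output, and the toy translation tables are assumptions made for this illustration.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]
TOKEN_RE = re.compile(r"<(/?)([a-z]{2})>")


def split_outermost(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each outermost tagged span by a code; return template and spans."""
    parts: List[str] = []
    spans: List[Tuple[str, str]] = []
    stack: List[str] = []
    last = open_start = body_start = 0
    for m in TOKEN_RE.finditer(text):
        closing, lang = m.group(1) == "/", m.group(2)
        if not closing:
            if not stack:
                open_start, body_start = m.start(), m.end()
            stack.append(lang)
        elif stack and stack[-1] == lang:
            stack.pop()
            if not stack:  # an outermost span has just been closed
                parts.append(text[last:open_start])
                parts.append(f"__MT{len(spans)}__")
                spans.append((lang, text[body_start:m.start()]))
                last = m.end()
    parts.append(text[last:])
    return "".join(parts), spans


def meta_trans_nested(text: str, src: str, dst: str, translate: Translator) -> str:
    template, spans = split_outermost(text)
    result = translate(template, src, dst)
    for i, (lang, body) in enumerate(spans):
        if TOKEN_RE.search(body):
            # A citation with its own tagged passages: recurse in its language.
            restored = meta_trans_nested(body, lang, dst, translate)
        elif lang == dst:
            restored = body
        else:
            restored = f"{body}({translate(body, lang, dst)})"
        result = result.replace(f"__MT{i}__", restored)
    return result


# Toy translator: exact-match lookup tables; unknown text is returned unchanged.
TOY_MT: Dict[Tuple[str, str], Dict[str, str]] = {
    ("ja", "en"): {'"__MT0__" の情報源を提示してください。':
                   'Please show me the source for "__MT0__".'},
    ("ko", "en"): {"센과 치히로의 행방불명": "Spirited Away"},
    ("ru", "en"): {"Унесённые призраками": "Spirited Away"},
}


def toy_translate(text: str, src: str, dst: str) -> str:
    return TOY_MT.get((src, dst), {}).get(text, text)


example_3 = ("\"<en>'<ko>센과 치히로의 행방불명</ko>' is translated into "
             "'<ru>Унесённые призраками</ru>'</en>\" の情報源を提示してください。")
result = meta_trans_nested(example_3, "ja", "en", toy_translate)
```

The cited English sentence is never sent through the Japanese-to-English translator; only its Korean and Russian passages receive descriptions, so the citation is not re-translated.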

V. IMPLEMENTATION OF DISCUSSION SUPPORT SYSTEM USING THE META TRANSLATION ALGORITHM

In Section II, we describe how we design the discussion support system with the example of Wikipedia. Here we implement the multilingual discussion support system using the Meta Translation Algorithm considering the features of discussion about translation.

A. Invocation of Meta Translation Service

We implement the Meta Translation Algorithm on the Language Grid as a language service, so that the algorithm can be used through the Language Grid Extension described in Section II. In the following, we explain the Meta Translation Service in detail.

As described above, we preserve passages that should not be translated with the following procedure: the algorithm extracts the phrases that must not be translated, translates the remainder, and then reinserts the phrases. It uses a placeholder that is not translated. The Meta Translation Service offers a relaxed mode and a strict mode. The former applies our algorithm to passages enclosed in quotation marks, while the latter applies it to passages enclosed in <tag> and </tag> and can handle nesting. In many cases, passages that should be meta-translated are already enclosed in quotation marks to distinguish them from other passages, so relaxed mode does not require modifying the original sentence. On the other hand, quotation marks make nesting hard to handle; such cases should be handled in strict mode. A further function lets the user specify the language of a passage by enclosing it in <language_code> and </language_code>.
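The extract, translate, and reinsert steps of relaxed mode can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's actual implementation: the placeholder format `XPH<n>X`, the function names, and the toy lookup tables that stand in for a machine translator and a glossary are all hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

# Relaxed mode: protect passages in straight or curly double quotation marks.
QUOTED = re.compile(r'"[^"]*"|“[^”]*”')
PLACEHOLDER = re.compile(r"XPH(\d+)X")  # hypothetical placeholder format


def preprocess(text: str) -> Tuple[str, List[str]]:
    """Replace each quoted passage with a numbered placeholder."""
    passages: List[str] = []

    def stash(match):
        passages.append(match.group(0))
        return f"XPH{len(passages) - 1}X"

    return QUOTED.sub(stash, text), passages


def postprocess(translated: str, passages: List[str],
                gloss: Callable[[str], Optional[str]]) -> str:
    """Put the original passages back, adding a gloss in parentheses if available."""
    def restore(match):
        passage = passages[int(match.group(1))]
        inner = passage[1:-1]  # text between the quotation marks
        note = gloss(inner)
        if note is None:
            return passage
        return f"{passage[0]}{inner}({note}){passage[-1]}"

    return PLACEHOLDER.sub(restore, translated)


def meta_translate(text: str, translate: Callable[[str], str],
                   gloss: Callable[[str], Optional[str]]) -> str:
    """Preprocessing, invocation of translation, and post-processing."""
    masked, passages = preprocess(text)
    return postprocess(translate(masked), passages, gloss)


# Toy stand-ins for a machine translator and a glossary (assumptions).
TOY_MT = {"XPH0X の情報源を提示してください.": "Please show me the source for XPH0X."}
TOY_GLOSS = {"센과 치히로의 행방불명": "Spirited Away"}

if __name__ == "__main__":
    sentence = "“센과 치히로의 행방불명” の情報源を提示してください."
    print(meta_translate(sentence, lambda s: TOY_MT.get(s, s), TOY_GLOSS.get))
```

The sketch relies on the translator passing the placeholder through unchanged; a real machine translator may alter or drop such tokens, so the placeholder format would need to be chosen with the specific translator in mind.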

B. Creation of Meta Translation Service

The Meta Translation Service should be embedded in tools such as a multilingual BBS to support discussions in multilingual environments. Because it is invoked through the Language Grid, it is necessary to create a Web service that implements the Meta Translation Algorithm. We implement this Web service as the “Meta Translation Service.” The interface specification is shown in Table I. The Meta Translation Service can be invoked via SOAP. When invoking the service, it is necessary to specify a machine translator available to the service.

TABLE I. INTERFACE SPECIFICATION OF META TRANSLATION SERVICE

Description   Execute Meta Translation with given parameters
Parameters    sourceLang : source language code
              targetLang : target language code
              source     : input sentence
              mode       : relaxed / strict
Result        The result of Meta Translation

C. Embedding in Multilingual LiquidThreads

The Meta Translation Service is invoked from Multilingual LiquidThreads. The service invokes the specified translation service via the Language Grid and executes the algorithm. The current Multilingual LiquidThreads invokes a translation service registered with the Language Grid. We can apply the Meta Translation Algorithm by invoking the Meta Translation Service instead of the currently used translation service.

Meta Translation Service’s processing has three phases: preprocessing, invocation of translation, and post-processing.

Figure 5. Interlanguage Links: (a) Wikipedia article; (b) interlanguage links in the article in (a)

D. Dictionary

We can support discussions about translation by using the Meta Translation Service on Multilingual LiquidThreads. However, because we rely on machine translation to build the multilingual environment, we cannot avoid the problem of its limited accuracy. To improve accuracy, it is useful to employ dictionaries that raise translation quality for the specific domain of the discussion. Here, we create a dictionary from Wikipedia for use in discussions about Wikipedia translation.

Wikipedia articles have inter-language links. Each link connects an article with the corresponding article in another language (Fig. 5).

We can build a dictionary from these inter-language links. The number of entries totals 1.5 million if the dictionary is extracted from English Wikipedia. A dictionary of this size is hard to use without editing, but restricting the field of the articles yields useful dictionaries that can raise the accuracy of machine translation.
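Building a field-restricted dictionary from inter-language links can be sketched as below. The `Article` record, the use of category names to represent the "field," and the sample data are assumptions made for illustration; the paper does not specify how the field restriction is implemented.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Article(NamedTuple):
    title: str                 # title in the source-language Wikipedia
    categories: Set[str]       # hypothetical field labels for the article
    langlinks: Dict[str, str]  # language code -> title of the linked article


def build_dictionary(articles: Iterable[Article], target_lang: str,
                     field: Optional[str] = None) -> Dict[str, str]:
    """Collect source-title -> target-title pairs, optionally for one field only."""
    entries: Dict[str, str] = {}
    for article in articles:
        if field is not None and field not in article.categories:
            continue  # skip articles outside the requested field
        linked_title = article.langlinks.get(target_lang)
        if linked_title:
            entries[article.title] = linked_title
    return entries


# Small hand-made sample (assumption, for illustration only).
ARTICLES = [
    Article("Spirited Away", {"Film"},
            {"ja": "千と千尋の神隠し", "ru": "Унесённые призраками"}),
    Article("Kyoto", {"Geography"}, {"ja": "京都市"}),
]

if __name__ == "__main__":
    print(build_dictionary(ARTICLES, "ja", field="Film"))
```

Restricting by field keeps the dictionary small and reduces the chance that an unrelated article title overrides the machine translator's output for a common word.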

VI. EVALUATION

A. Outline of Evaluation

We evaluate our algorithm in terms of three aspects: the effect of the algorithm, its dependence on the performance of machine translators, and the burden placed on users. For the evaluation, we use dump data of English Wikipedia (Oct. 2010). Table II shows the number of articles, discussions, and utterances. We can see that 88% of the articles were discussed. We approximate the number of discussions about translation by the number of multilingual discussions, which were extracted automatically.

TABLE II. NUMBER OF DATA ENTRIES

Articles                         3,724,963
Discussions                      3,284,070
Utterances                       8,894,919
Discussions about translation       64,986

B. Effect of Algorithm

We randomly extract 100 utterances from discussions about translation and estimate the effect of the algorithm. First, we classify the extracted data as shown in Table III. According to the interviews held in this experiment, more than 90% of the following utterances are understandable after applying the Meta Translation Algorithm: “Utterances that ask correspondence of the meaning,” “Utterances that cite other utterances,” and “Other multilingual utterances.” This effect matches the anticipated result.

On the other hand, “Utterances that define words,” “Utterances related to spelling,” and “Utterances related to pronunciation” cannot be said to be understandable even after applying our algorithm. The reasons are thought to be as follows. In the case of “Utterances that define words,” it becomes hard to define the word strictly because the definition itself is translated. In the case of “Utterances related to spelling” and “Utterances related to pronunciation,” participants could understand the logic of the utterances, but knowledge of the source language was ultimately necessary to understand the content properly.

TABLE III. UTTERANCES IN DISCUSSIONS ABOUT TRANSLATION

Type of utterance                                     Number
Utterances that ask correspondence of the meaning         45
Utterances that cite other utterances                     10
Utterances that define words                               2
Utterances related to spelling                            11
Utterances related to pronunciation                       10
Other multilingual utterances                             22

We further estimate the effect in the experiment. As a result, we find that the algorithm solves 70% of the problems that occur when we translate “discussions about translation” with machine translation. However, some problems remain unsolved. To solve them, we need other techniques such as passage comprehension.

C. Dependence on the Performance of Machine Translators

In the Meta Translation Algorithm, machine translation is used to translate both the whole discussion and the words in the discussion. In this evaluation, however, we restrict the target to discussions about translation. Moreover, we use dictionaries to translate words in discussions; hence, the dependence on the performance of machine translators is considered to be low. To evaluate this, we examine the degree of understanding of the results of the Meta Translation Algorithm, where each result uses one of four machine translators registered with the Language Grid. We measure participants’ understanding on the following 5-point scale: 1 (None), 2 (Little), 3 (Middle), 4 (Almost), 5 (All).

As a result, the average is more than 4 regardless of the translator. Therefore, we can argue that the dependence on the performance of machine translators is low, at least for the translation of discussions about translation.

D. Burden Placed on Users

To apply our algorithm, the user must specify “passages that should not be translated” and “passages whose language is not the same as that of the discussion.” However, the characteristics of the algorithm keep this burden small. First, in many cases such passages are already enclosed in quotation marks. In the experiment, 85% of “passages that should not be translated” were originally enclosed in quotation marks. Thus, in relaxed mode, our algorithm imposes no additional burden for these passages. On the other hand, to detect the latter passages, users must specify the language. To decrease this burden, we can use language identification programs. In the experiment, we succeeded in generating language tags automatically for 93% of “passages whose language differed from that of the discussion.” As the experiment shows, the current burden placed on users is not large. In the future, we can decrease it further by applying passage comprehension techniques.
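Automatic generation of language tags can be illustrated with a deliberately crude sketch. It guesses a language from the Unicode script of the letters in a passage, which is only a toy heuristic (for example, Cyrillic script does not imply Russian); it is not the language identification program used in the experiment. The script-to-language table and the function names are assumptions.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Crude script -> language guess (assumption; many languages share a script).
SCRIPT_TO_LANG = {
    "HANGUL": "ko",
    "CYRILLIC": "ru",
    "HIRAGANA": "ja",
    "KATAKANA": "ja",
    "LATIN": "en",
}


def guess_language(passage: str) -> Optional[str]:
    """Return the most frequent guessed language among the letters, or None."""
    counts: Counter = Counter()
    for ch in passage:
        if not ch.isalpha():
            continue
        # The first word of a Unicode character name is usually its script.
        script = unicodedata.name(ch, "").split(" ")[0]
        lang = SCRIPT_TO_LANG.get(script)
        if lang:
            counts[lang] += 1
    return counts.most_common(1)[0][0] if counts else None


def tag_passage(passage: str, discussion_lang: str) -> str:
    """Wrap the passage in <language_code> tags if it differs from the discussion."""
    lang = guess_language(passage)
    if lang is None or lang == discussion_lang:
        return passage
    return f"<{lang}>{passage}</{lang}>"


if __name__ == "__main__":
    print(tag_passage("Унесённые призраками", "en"))
    print(tag_passage("센과 치히로의 행방불명", "en"))
```

A production system would replace `guess_language` with a proper language identifier; the tagging step around it would stay the same.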

VII. CONCLUSION

When we hold multilingual discussions about translation in collaborative translation, several problems are encountered: one is that passages which should not be translated are translated, and the other is that passages which have already been translated are retranslated. In this paper, we first propose an approach to embedding a service-oriented multilingual infrastructure with discussion functions in collaborative translation systems, where discussions can be automatically translated into different languages. Then, we propose the Meta Translation Algorithm to solve the problems of multilingual discussion for collaborative translation; it preserves the original passages and adds their translations as descriptions. Because it preserves the original passages, it also solves the latter problem. We further implement the algorithm in the multilingual discussion support system and apply it to Wikipedia translation through LiquidThreads, a BBS on MediaWiki.

The proposed Meta Translation Algorithm has been implemented as a Web service on the service-oriented multilingual platform, so it can be combined with various other language services and embedded in different types of multilingual applications.

ACKNOWLEDGMENT

This work was partially supported by the Service Science, Solutions and Foundation Integrated Research Program from JST RISTEX.

REFERENCES

[1] Flournoy, S. R., and Callison-Burch, C. Secondary Benefits of Feedback and User Interaction in Machine Translation Tools, Workshop paper for “MT2010: Towards a Roadmap for MT” of the MT Summit VIII, pp. 2-3, 2001.

[2] Ishida, T. Language Grid: An Infrastructure for Intercultural Collaboration, IEEE/IPSJ Symposium on Applications and the Internet, pp. 96-100, 2006.

[3] Ishida, T. Ed., The Language Grid: Service-Oriented Collective Intelligence for Language Resource Interoperability. Springer, 2011.

[4] Hautasaari, A., Takasaki, T., Nakaguchi, T., Koyama, J., Murakami, Y., and Ishida, T. Multi-Language Discussion Platform for Wikipedia Translation, The Language Grid: Service-Oriented Collective Intelligence for Language Resource Interoperability, 2011, pp.231-246.

[5] Potthast, M., Stein, B., and Anderka, M. A Wikipedia-Based Multilingual Retrieval Model, Proceedings of the IR research, 30th European conference on Advances in information retrieval, pp. 522–530, 2008.

[6] Mihalcea, R. Using Wikipedia for Automatic Word Sense Disambiguation, Proceedings of NAACL HLT, pp. 196-203, 2007.

[7] Zesch, T., Müller, C., and Gurevych, I. Extracting Lexical Semantic Knowledge from Wikipedia and Wiktionary, Proceedings of the Conference on Language Resources and Evaluation (LREC), pp. 1646-1652, 2008.

[8] Adar, E., Skinner, M., and Weld, D. S. Information Arbitrage across Multilingual Wikipedia, 2nd ACM International Conference on Web Search and Data Mining, p. 96, 2009.
