Master Thesis Analysis of the Effects of Japanese-Chinese Machine Translation with Kanji/Simplified Chinese Conversion Supervisor Professor Toru Ishida Department of Social Informatics Graduate School of Informatics Kyoto University Nan Jin February 7, 2013


Master Thesis

Analysis of the Effects of

Japanese-Chinese Machine Translation

with Kanji/Simplified Chinese

Conversion

Supervisor Professor Toru Ishida

Department of Social Informatics Graduate School of Informatics

Kyoto University

Nan Jin

February 7, 2013


Analysis of the Effects of Japanese-Chinese Machine Translation with Kanji/Simplified Chinese Conversion

Nan JIN

Abstract

Currently, most Japanese-Chinese machine translation uses English as an intermediary language. Because Japanese-Chinese bilingual dictionaries are scarce, Japanese-Chinese machine translation is less precise than Japanese-English or Chinese-English machine translation. To make translations fluent and adequate, it is necessary and efficient to have native speakers modify them.

On the other hand, machine translation assisted communication is now widely used. Because it is mediated by English, and because intercultural communication is inherently difficult to translate, machine translation assisted Japanese-Chinese communication does not work well. Within the Sinosphere, also known as the Chinese character cultural sphere, Japanese and Chinese people can understand each other simply by writing Chinese characters, which preserve meaning. In some situations, writing Chinese characters is more effective.

Based on these observations, I propose combining Japanese-Chinese machine translation with Kanji/Simplified Chinese conversion to support intercultural communication and human-assisted translation. Two issues need to be solved within this approach.

1. Incomplete comprehension and misunderstanding of Japanese-Chinese machine translations

Machine translation assisted communication and human-assisted translation are difficult for Chinese monolinguals because current machine translation cannot provide adequate and fluent results; its output is often wordy and unnatural, especially in communication. Poor translations are a barrier to comprehension and can even cause misunderstanding when machine-translated sentences are modified. Monolinguals cannot confirm the meaning of the original text. Thus, without a complete understanding of the machine-translated sentences, we cannot provide better support for modifying them.

2. Unclear categories of converted Kanji vocabulary that can be applied to modification

Although Japanese and modern Chinese are closely related, not all Kanji vocabulary helps readers understand Japanese-to-Chinese translations better, since many words have different or even opposite meanings. We must therefore identify the categories of vocabulary that are useful for supporting modification.

The main contributions of this research are as follows.

1. Building a working process to classify categories of Kanji vocabulary in human-assisted translation

Compared with the working processes and models of human-assisted translation in related research, we design and build a new working process for classifying the categories of Kanji vocabulary that help improve comprehension in human-assisted translation.

2. Clarifying the effects of introducing converted Kanji/Simplified Chinese into communication and translation modification

The effects of introducing converted Kanji/Simplified Chinese into the modification of Japanese-Chinese machine translation are studied by analyzing how Chinese monolinguals use converted Kanji/Simplified Chinese vocabulary in modification and communication. Our experiment with a Kanji/Simplified Chinese character conversion tool showed that converted Kanji not only make machine translations easier to understand, but also help correct some wrong translations of proper nouns. Based on the statistical results, we classify frequently used converted Kanji vocabulary into several categories. However, the conversion is not effective in oral communication.

These findings suggest that Chinese speakers without sufficient Japanese skills can contribute effectively to improving Japanese-Chinese translations with the assistance of machine translation and converted Kanji vocabulary.

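Analysis of the Effects of Japanese-Chinese Machine Translation with Kanji/Simplified Chinese Conversion

Nan JIN (金 楠)

Abstract (Japanese version)

At present, Japanese-Chinese machine translation is mainly performed via English. However, because Japanese-Chinese bilingual dictionaries are scarce compared with those for Japanese-English and Chinese-English machine translation, the low precision of Japanese-Chinese machine translation is a problem. To raise translation quality efficiently, native speakers need to modify the machine-translated sentences.

Meanwhile, because the machine translation widely used in recent years is mainly mediated by English, and because intercultural communication is itself difficult, machine translation mediated communication between Japanese and Chinese speakers does not work well. In the Chinese character cultural sphere, however, Chinese and Japanese people can understand each other through written conversation using Chinese characters. In other words, in some cases writing Chinese characters is more effective.

Based on this situation, this research proposed combining Japanese-Chinese machine translation with Kanji/Simplified Chinese conversion to support intercultural communication and human-assisted translation. The research addressed the following two issues.

1. Incomplete comprehension and misunderstanding of Japanese-Chinese machine translation results

Current machine translators have difficulty providing accurate and fluent translations to Chinese monolinguals, which makes machine translation mediated communication and human-assisted translation difficult. In addition, when translation results are modified, unnatural translations create a barrier to comprehension that cannot be overcome and even cause misunderstanding. Because monolinguals cannot check the meaning of the source text, they need to understand the machine translation result completely in order to provide better modifications.

2. Unclear categories to which converted Kanji vocabulary can be applied

Although Japanese and modern Chinese are closely related, Chinese speakers cannot understand every result of converting Kanji vocabulary into Simplified Chinese. The reason is that the meaning of a Japanese word can differ greatly from its meaning in Chinese. Therefore, to support the modification of translation results effectively, the categories to which converted vocabulary can be applied must be identified.

The contributions of this research are as follows.

1. Building and applying a process for classifying usable Kanji words

A new process was designed and built to support Japanese-Chinese translation, in comparison with the processes and models of human-assisted translation in related research. In this process, machine translation results were modified using Kanji/Simplified Chinese conversion. An experiment was also conducted to show the effectiveness of the process.

2. Clarifying the effects of using Kanji/Simplified Chinese conversion in communication and translation modification

This research analyzed how Chinese participants use Kanji/Simplified Chinese conversion in communication and in modifying translation results. On that basis, the effects of combining Japanese-Chinese machine translation with Kanji/Simplified Chinese conversion were evaluated and summarized. An experiment using a Kanji/Simplified Chinese conversion tool showed that converted Kanji vocabulary not only makes machine translation results easier to understand but also supports the correction of inappropriate translations of proper nouns. Furthermore, modification with Kanji conversion was shown to have higher adequacy than modification of the machine translation alone. Kanji conversion can also be applied and classified in other domains, for example news articles and specific Wikipedia articles. In oral communication, however, it did not work as well as in modifying articles.

These results suggest that, by combining Japanese-Chinese machine translation with Kanji/Simplified Chinese conversion, Chinese speakers are more likely to be able to contribute to improving Japanese-Chinese translation results regardless of their Japanese ability.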


Contents

Chapter 1 Introduction

Chapter 2 Background
  2.1 Chinese Character
    2.1.1 Simplified Chinese and Kanji
    2.1.2 Convert Kanji/Simplified Chinese in Machine Translation
  2.2 Sino-Japanese Vocabulary
  2.3 Knowledge Transfer
  2.4 Translating-Transliterating Japanese-Chinese Terms

Chapter 3 Related Literature

Chapter 4 Communication Experiment
  4.1 Experiment Setting
  4.2 Evaluation
    4.2.1 Comprehension of Chinese characters
    4.2.2 Convenience of using Chinese characters
    4.2.3 Efficiency of knowledge transfer
  4.3 Discussion

Chapter 5 Translation Experiment
  5.1 Experiment Objective
  5.2 Experiment Design
  5.3 Results and Analysis
    5.3.1 Adequacy
    5.3.2 Comprehension
    5.3.3 Classification
  5.4 Discussion

Chapter 6 Conclusion

Acknowledgments

References

Chapter 1 Introduction

English is a global language through which people from many countries and cultures can communicate with each other. However, language barriers remain a significant problem when people do not have a good command of English, especially in East Asia, where the language systems differ from English.

Machine translation (MT) can be a powerful tool for supporting multilingual communication, allowing monolinguals with different native languages to understand each other. Although MT enables some level of communication between monolinguals despite wrong or poor translations, monolinguals still find most translations hard to understand because many translated sentences are neither adequate nor fluent [4]. Human-assisted machine translation, in which native speakers modify the translations, can improve the fluency of translated sentences, but it is still hard for the modifiers to recover the meaning of bad translations because they lack the hints needed to support comprehension.

On the other hand, the lack of suitable dictionaries is another important reason for the low precision of machine translation. In most cases, the bilingual dictionaries used for machine translation must be created in advance, and translations of new terms must be added regularly. Moreover, most bilingual dictionaries are currently based on English, so most MT between other languages has to be mediated by English. This causes not only low precision in translating specific terms, proper nouns, and so on, but also poorly structured translations that are not fluent. Take Japanese-Chinese machine translation as an example. The Chinese terms ‘饺子’ (Jiaozi) and ‘小笼包’ (Xiaolongbao), well known in Japan as traditional Chinese foods, have corresponding Japanese translations obtained simply by converting the characters: ‘餃子’ (Gyoza) and ‘小籠包’ (Shoronpo). However, because ‘Dumplings’ is the English translation of both ‘Jiaozi’ and ‘Xiaolongbao’, the Japanese translation of ‘Xiaolongbao’ becomes ‘Gyoza’, which is not precise.
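The loss of precision in the dumpling example can be reproduced with a small sketch. The dictionaries and the character mapping below are toy data invented for illustration (they are not taken from any real MT system or from the experiments in this thesis), and the function names are hypothetical.

```python
# Toy pivot dictionaries (illustrative assumptions, not real MT data).
ZH_TO_EN = {"饺子": "dumplings", "小笼包": "dumplings"}
EN_TO_JA = {"dumplings": "餃子"}

# Hypothetical character-level mapping from Simplified Chinese to Kanji.
ZH_CHAR_TO_KANJI = {"饺": "餃", "笼": "籠"}


def pivot_translate(zh_term: str) -> str:
    """Translate Chinese -> English -> Japanese with the toy dictionaries."""
    return EN_TO_JA[ZH_TO_EN[zh_term]]


def convert_characters(zh_term: str) -> str:
    """Convert character by character; unmapped characters stay unchanged."""
    return "".join(ZH_CHAR_TO_KANJI.get(ch, ch) for ch in zh_term)


for term in ("饺子", "小笼包"):
    print(term, "| pivot:", pivot_translate(term),
          "| conversion:", convert_characters(term))
```

Both Chinese terms collapse onto the same English word, so the pivot path returns ‘餃子’ for each of them, while the character conversion path keeps the two foods distinct.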

Chinese characters are called Hanzi in Chinese and Kanji in Japanese. A Chinese character is an ideogram, and Chinese characters were historically in wide use in East Asian countries such as China, Japan, Korea and Vietnam. Owing to this wide use, the Sinosphere, also known as the Chinese character cultural sphere, emerged in those countries. Although the pronunciation of the same character differs from country to country, and people in the Sinosphere have different linguistic knowledge and unique cultures, they may still understand the meaning of a Hanzi term from the characters alone. In the past, people in the Sinosphere could communicate with each other solely by writing Chinese characters.¹

Modern Japanese and Korean consist not only of Chinese characters but also of their own unique scripts: Hiragana and Katakana in Japanese, and Hangul in Korean. Although modern Chinese, Japanese and Korean differ greatly in word usage and character forms, the relationship among the three languages remains. Despite the variant forms of Chinese characters, a large number of Chinese terms can be translated into Japanese by character conversion alone.

Because Chinese characters carry semantic weight, and because the two languages in many situations share the same vocabulary despite different pronunciations and variant forms, I propose an approach that uses conversion between Kanji and Simplified Chinese (since all Chinese participants could write only Simplified Chinese) in the modification of Japanese-Chinese machine translation. To explore the feasibility of this approach, we ran a series of experiments in which Chinese speakers modified machine translations to improve their fluency, and in which Chinese and Japanese participants engaged in multilingual communication.

¹ http://ja.wikipedia.org/wiki/筆談

In this thesis, I present findings from analyzing the effects of using Kanji/Simplified Chinese conversion in the modification experiments and in multilingual communication assisted by machine translation. These findings are important for understanding the role that converted Kanji vocabulary plays in improving monolinguals' comprehension of machine translations in communication, and for considering further support for their translation modification activities. The rest of the thesis is organized as follows.

In Chapter 2, we introduce the background on Chinese characters, including Kanji and Simplified Chinese, Kanji/Simplified Chinese conversion, and related topics.

In Chapter 3, we review the literature related to this thesis, especially existing work on multilingual communication. We also introduce research on human-assisted translation, which is important to this thesis.

In Chapter 4, we describe communication experiments designed to explore the possibility of using Kanji/Simplified Chinese conversion in communication mediated by Japanese-Chinese machine translation.

In Chapter 5, we give an overview of the experiment on modifying translations of Wikipedia and news articles, and propose a new working process that uses Kanji/Simplified Chinese conversion to support human-assisted translation. The task, participants, apparatus and procedure used in the experiment are explained in detail. We also present the analysis of the experimental results. We first evaluate the adequacy of modification based on machine translation alone and of modification mediated by converted Kanji. We then show that referring to converted Kanji vocabulary is effective for understanding bad machine translations and for transferring knowledge when modifying badly translated sentences, and clarify how converted Kanji influence comprehension and modification.

Finally, we conclude the thesis in Chapter 6.


Chapter 2 Background

In this chapter, we introduce the background of collaborative translation and give an overview of Chinese characters and their current usage in Japan and China. As a product of collective intelligence accumulated over time, Chinese characters not only carry semantic weight but are also easy for people in the Sinosphere to recognize, remember and understand.

2.1 Chinese Character

Chinese characters are called Hanzi (汉字 in Simplified Chinese, 漢字 in Traditional Chinese) in Chinese and Kanji (漢字) in Japanese. Although Korea and Vietnam now use their own writing systems, Chinese characters are still used in some formal situations, such as street signs. In Korean, Chinese characters are called Hanja, and most Koreans can recognize them; in Vietnamese, they are called hán tự.

Throughout Chinese history, Hanzi was the main official script, and in modern times it is designated as the standard writing system in China. In ancient times, Chinese characters developed into a highly comprehensive standard that was used not only in China but also widely across East Asia, where for a long period it served as the only international script. Table 1 shows that Chinese characters were an important part of Chinese, Japanese and Korean, since the dictionaries of each language contain large numbers of them. Before the 20th century, Chinese characters were the official standard script in Japan, the Korean Peninsula, Vietnam and other countries.

Table 1 Number of Characters in China, Japan and South Korea¹

| Country | Name of Dictionary | Number of Characters |
|---|---|---|
| China | Yitizi Zidian | 106230 |
| Japan | Dai Kan-Wa jiten | Over 50000 |
| South Korea | Han-Han Dae Sajeon | 53667 |

¹ http://en.wikipedia.org/wiki/Chinese_character


Each of these countries has a large number of characters. The total number of Chinese characters from past to present remains unknowable because new ones are developed all the time.

A Chinese character is an ideogram composed mostly of straight lines or ‘poly-line’ strokes [1]. Chinese characters are classified into six categories: ideograms, pictograms, transformed cognates, ideogrammic compounds, phono-semantic compounds and rebus. Table 2 shows the approximate percentage of Chinese characters in each of these six categories.¹

Table 2 Chinese Characters Classification

| Category | Percentage of characters (approximate) |
|---|---|
| Phono-semantic compounds | 82% |
| Ideogrammic compounds | 13% |
| Pictograms | 4% |
| Ideograms | Few (less than 1%) |
| Transformed cognates | Few |
| Rebus | Few |

Ideograms either modify existing pictographs iconically or are direct iconic illustrations. Pictograms, which derive from pictures, make up a small portion of Chinese characters. Transformed cognates are characters that originally represented the same meaning but have bifurcated through orthographic and semantic drift; this category is often omitted in modern classifications. Ideogrammic compounds combine pictograms or ideograms to create new characters and account for 13% of characters. As the table shows, phono-semantic compounds include the great majority of characters. Each consists of two parts: an existing character, and one of a limited set of characters called ‘radicals’, often graphically simplified, which suggests the meaning. Rebus characters are borrowed to represent unrelated words with similar or identical pronunciation.

¹ http://www.chinaknowledge.org

Because Chinese characters are formed in these ways, they carry considerable semantic weight, and so the Sinosphere, known as the Chinese character cultural sphere, emerged in the countries where people could read and write Chinese characters. Although the pronunciation of the same character differs from country to country, and people in the Sinosphere have different linguistic knowledge and unique cultures, they may still understand the meaning of a Hanzi term from the characters alone. For example, ‘水’ is pronounced shui in Chinese and mizu in Japanese, but it means water in both. People can therefore converse with people from other countries simply by writing Chinese characters, a practice called conversation by writing. This approach was used especially hundreds of years ago, when conversation by writing alone allowed people to understand each other well.

2.1.1 Simplified Chinese and Kanji

Modern Chinese characters are roughly divided into two systems: Traditional Chinese and Simplified Chinese.¹ The former is mainly used in Hong Kong, Macao and Taiwan, while the latter is used in mainland China, Singapore, Malaysia and other countries that have adopted it.

Simplified Chinese characters are the standardized Chinese characters used in mainland China. Simplified forms were created by decreasing the number of strokes and simplifying the forms of a sizable proportion of traditional Chinese characters.² Some simplifications were based on popular cursive forms embodying graphic or phonetic simplifications of the traditional forms. Some characters were simplified by applying regular rules, for example by replacing all occurrences of a certain component with a simplified version of that component. Variant characters with the same pronunciation and identical meaning were reduced to a single standardized character, usually the simplest in form among the variants. Finally, many characters were left untouched by simplification and are thus identical in the traditional and simplified Chinese orthographies.

¹ http://www.gov.cn/xwfb/2006-03/22/content_233556.htm
² http://en.wikipedia.org/wiki/Simplified_Chinese_characters

Modern Japanese consists not only of Chinese characters but also of its own unique scripts, Hiragana and Katakana. Although modern Chinese and Japanese differ greatly in word usage and character forms, the relationship between the two languages remains. For example, more than 70% of modern Chinese terms, mainly terminology in chemistry, politics and other fields, originated from modern Japanese terms. In addition, many regions and countries, such as Japan and Vietnam, have used Chinese characters or integrated them into their own languages. Despite the variant forms of Chinese characters, a large number of Chinese terms can be translated into Japanese by character conversion alone.

At the character level, Simplified Chinese and Kanji share many similar characters, but some differences remain.

Table 3 Differences among Simplified Chinese, Kanji and Traditional Chinese

| Simplified Chinese | Kanji | Traditional Chinese | Meaning | Comparison |
|---|---|---|---|---|
| 电 | 電 | 電 | electricity | Simplified in mainland China, not Japan (some radicals were simplified) |
| 冰 | 氷 | 冰 | ice | Simplified in Japan, not mainland China |
| 龙 | 竜 | 龍 | dragon | Simplified in mainland China and Japan, but in different ways |
| 国 | 国 | 國 | country | Simplified in mainland China and Japan, in the same way |

Table 3 shows the differences between Simplified Chinese and Japanese Kanji, taking the Traditional Chinese character as the original form. The first two rows describe cases in which only one of Simplified Chinese and Kanji simplified the Traditional Chinese character while the other retains the original. Simplified Chinese did not simplify every Traditional Chinese character, and in some cases the Kanji is the more simplified form, as the second row shows. The third and fourth rows show cases in which both Simplified Chinese and Kanji were simplified from the Traditional Chinese character; the results may still be different variants, as in the third row.

2.1.2 Convert Kanji/Simplified Chinese in Machine Translation

Most existing related work [2,3] on Simplified Chinese/Kanji conversion is mediated by English and pairs Simplified Chinese and Kanji terms that are equivalent not only orthographically but also semantically. The purpose of English-mediated Simplified Chinese/Kanji conversion is to expand existing Chinese-Japanese dictionaries by adding candidates of better quality. Because most existing Chinese-Japanese dictionaries are created by way of Chinese-English and Japanese-English dictionaries, each word has many candidate translations. Applying Kanji/Chinese mapping reduces the number of candidates and thereby improves their quality. This shows that Simplified Chinese/Kanji conversion is an effective and useful method for creating a Chinese-Japanese bilingual lexicon.
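The idea of narrowing English-pivoted candidates with a Kanji/Chinese mapping can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of [2,3]: the character mapping, the candidate lists and the function name `filter_candidates` are all made up for the example, and only exact orthographic matches are considered.

```python
# Hypothetical Kanji -> Simplified Chinese character mapping (illustrative only).
KANJI_TO_SIMPLIFIED = str.maketrans({"電": "电", "話": "话", "車": "车"})


def filter_candidates(ja_word: str, zh_candidates: list[str]) -> list[str]:
    """Prefer candidates equal to the character-converted Japanese word.

    Falls back to the full candidate list when nothing matches, because
    many Japanese words have no orthographically equivalent Chinese term.
    """
    converted = ja_word.translate(KANJI_TO_SIMPLIFIED)
    matches = [c for c in zh_candidates if c == converted]
    return matches or zh_candidates


# Candidate lists as an English-pivoted dictionary might produce them (made up).
print(filter_candidates("電話", ["电话", "电话机", "打电话"]))
print(filter_candidates("手紙", ["信", "书信"]))
```

In the first call the converted form matches one candidate, so the list shrinks to that candidate; in the second call no candidate matches and the original list is kept.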

The other Simplified Chinese/Kanji conversion method is based on the similarity between Kanji and Simplified Chinese, using a probabilistic model that links Kanji and Simplified Chinese words through a Statistical Machine Translation (SMT) model [4]. This method was shown to improve the quality and accuracy of Japanese-Chinese translations of technical terms. In addition, semantically equivalent Kanji/Simplified Chinese terms can be extracted from Wikipedia. Unlike machine translation mediated by English, the translations in Wikipedia articles are usually created by a large number of users who may have good language skills or rich cross-language knowledge.

However, all of these methods mainly contribute to Chinese-Japanese machine translation dictionaries and cannot change the fact that most Chinese-Japanese translations are still mediated by English, especially for terms such as place names.

Referring to the CJK (Chinese, Japanese and Korean) Common Character Table [5, 6], I found 1632 characters common to CJK and 85 characters unique to Kanji. Because a large number of the common characters have different Unicode code points in each language, we can convert between Simplified Chinese and Kanji characters simply by Unicode conversion. This method translates between Chinese and Japanese without mediation by English, so no Chinese-Japanese bilingual lexicon needs to be built in advance. However, Unicode-based conversion cannot guarantee semantic equivalence between Kanji and Simplified Chinese.
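A minimal sketch of such a code-point based conversion is given below, using Python's built-in `str.translate`. The mapping contains only the character pairs from Table 3 plus one extra pair (紙/纸) added for illustration; it is not the conversion table or the tool used in the experiments, and the function names are hypothetical.

```python
# Kanji -> Simplified Chinese pairs from Table 3, plus 紙/纸 as an extra
# illustrative pair. A real conversion table would be far larger.
KANJI_TO_SIMPLIFIED = {
    ord("電"): "电",
    ord("氷"): "冰",
    ord("竜"): "龙",
    ord("紙"): "纸",
}
# Reverse table; valid here only because the toy mapping is one-to-one.
SIMPLIFIED_TO_KANJI = {ord(zh): chr(ja) for ja, zh in KANJI_TO_SIMPLIFIED.items()}


def kanji_to_simplified(text: str) -> str:
    """Replace each mapped Kanji; shared characters such as 国 pass through."""
    return text.translate(KANJI_TO_SIMPLIFIED)


def simplified_to_kanji(text: str) -> str:
    """Replace each mapped Simplified Chinese character with its Kanji form."""
    return text.translate(SIMPLIFIED_TO_KANJI)


for word in ("電", "国", "手紙"):
    converted = kanji_to_simplified(word)
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in converted)
    print(word, "->", converted, code_points)
```

The last word illustrates the limitation on semantic equivalence: ‘手紙’ means ‘letter’ in Japanese, but its converted form ‘手纸’ means ‘toilet paper’ in Chinese, so the conversion is orthographically correct yet semantically misleading.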

Because Japanese and Chinese share the same character cultural sphere, and because much related work has enlarged Japanese-Chinese bilingual lexicons through Kanji/Chinese conversion, it is plausible that Kanji/Chinese conversion can help Japanese and Chinese speakers communicate with each other and comprehend each other's language.

2.2 Sino-Japanese Vocabulary

Japanese vocabulary has four lexical classes¹: wago (和語, native Japanese words), kango (漢語, Sino-Japanese vocabulary), gairaigo (外来語, Japanese for ‘borrowed word’) and hybrid words (混種語). Wago are native Japanese words that did not originate from Chinese. After Hiragana and Katakana appeared, this type of vocabulary was created and used by Japanese people without semantic characters or Chinese characters. Sino-Japanese vocabulary originated from Chinese and developed within Japanese; details are given below. Gairaigo also originated from other languages, such as English, German and French, but are mostly written in Katakana, whereas Sino-Japanese vocabulary is written in Chinese characters, called Kanji in Japanese. Unlike the other three lexical classes, hybrid words consist of a Japanese morpheme combined with a morpheme from another language. For example, ‘自動車’ (automobile) is built on two etymologies: English ‘auto’, which means ‘自分で’ (by oneself) in Japanese, and Latin ‘mobilis’, which means ‘移動できる’ (movable).

¹ http://web.mit.edu/jpnet/articles/JapaneseLanguage.html

Sino-Japanese vocabulary, one category of Japanese vocabulary, is the portion that originated in the Chinese language or was created from elements borrowed from Chinese. Some grammatical and sentence patterns can also be identified as Sino-Japanese. It is one of three broad categories into which Japanese vocabulary is divided; the others are native Japanese vocabulary and borrowings, mainly from Western languages. Approximately 60% of the words in a modern Japanese dictionary are estimated to be Sino-Japanese, and they account for about 18% of the words used in speech [7].

According to a survey by the National Institute for Japanese Language and Linguistics, over 70% of the vocabulary in news articles is Sino-Japanese, while wago account for over 70% of the vocabulary in oral communication [8]. Other surveys have also examined the proportions of the four lexical classes.

These facts about Sino-Japanese vocabulary help us classify the uses of Kanji vocabulary that Chinese speakers understand well.

2.3 Knowledge Transfer

Communication among people who speak different languages is a form of intercultural communication [9]. Intercultural communication is usually very difficult because of language and cultural barriers. Since communication is an important way to transfer knowledge and information, the efficiency of knowledge transfer needs to be evaluated in order to confirm whether intercultural communication has succeeded. Knowledge transfer is the practical problem of transferring knowledge from one part of an organization to another.

Owing to the Sinosphere, writing Chinese characters often supports intercultural communication between Chinese and Japanese people in daily life. Although the pronunciation of the same Chinese character differs from country to country, people can still communicate with each other by writing Chinese characters, which is called conversation by writing. This method is widely used in the countries of the Sinosphere. In conversation by writing, how well people comprehend words written by someone from a different culture is one way to evaluate the efficiency of multilingual knowledge transfer, which this thesis examines through three-part conversation experiments.

2.4 Translating-Transliterating Japanese-Chinese Terms

Transformation between named entities in different languages is not only translation or transliteration. A Japanese city name, for example, is first transliterated into the Roman alphabet, but its final translated name is in Kanji. Capturing the English-Kanji relationships is difficult except by dictionary look-up, so Japanese location names are hardly considered. The country field is used to select the translation-transliteration pairs that will be addressed in this work [10]. Because Chinese-Japanese machine translation is mediated by English, many Hanzi/Kanji terms are first transliterated into English and then translated into the other language based on those English renderings. This method is effective for building or extending Chinese-English and Japanese-English bilingual lexicons, especially for proper nouns such as location names and person names.

However, written Chinese and Japanese are closer to each other in their characters than either is to English, and they share many features of writing and usage. It should be better and more precise to render their proper nouns into each other's language by character conversion rather than by the translation or transliteration that English-mediated translation often relies on. Take the Japanese location name ‘四条’ as an example: when translation is mediated by English, it is translated inaccurately into ‘市场’ in Chinese. The reason is that ‘四条’, whose pronunciation ‘Shijyo’ is the same as that of ‘市場’ (market) in Japanese, is first transliterated into English and then translated into Chinese incorrectly.

Because of the way new proper nouns are formed between Japanese and Chinese when English mediates, translating a Japanese proper noun into Chinese gives different results via machine translation and via character conversion.
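The contrast between the two routes can be made concrete with a minimal Python sketch. The three lookup tables are toy assumptions that hold only the ‘四条’ example from this section (they are not real lexicons), and `pivot_translate` and `convert_characters` are hypothetical names introduced here for illustration.

```python
# Toy data: assumptions for illustration only, not real lexicons.
JA_TO_ROMAN = {"四条": "Shijyo", "市場": "Shijyo"}  # two Japanese words, one romanization
ROMAN_TO_ZH = {"Shijyo": "市场"}                    # the pivot keeps only one Chinese target
KANJI_TO_SIMPLIFIED = {"場": "场"}                  # character-level variant table


def pivot_translate(term: str) -> str:
    """Translate via the romanized form, as an English-mediated system might."""
    return ROMAN_TO_ZH[JA_TO_ROMAN[term]]


def convert_characters(term: str) -> str:
    """Convert each Kanji to its Simplified Chinese variant; keep other characters."""
    return "".join(KANJI_TO_SIMPLIFIED.get(ch, ch) for ch in term)


print(pivot_translate("四条"))     # the homophone collapses to the word for "market"
print(convert_characters("四条"))  # the place name survives character by character
```

Under these assumptions the pivot route cannot distinguish the two homophones, while character conversion never passes through the pronunciation at all.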


Chapter 3 Related Literature

Two kinds of work are related to this thesis: multilingual communication and human-assisted translation. In this research, to verify and evaluate the combined effectiveness of Japanese-Chinese machine translation and Kanji/Simplified Chinese conversion, we conducted two experiments based on this related work.

Multilingual communication is communication beyond the borders and barriers of different languages. In multilingual groups whose members' native languages differ, communication takes place in one language, requiring members to communicate in a non-native language [11]. Because English is a global language, people from a variety of countries and cultures can communicate with each other in English. However, language barriers remain a significant problem when people do not have excellent English skills, especially in East Asia. For these groups, machine translation is a promising tool because it would enable all members to read and write in their native language.

Based on those facts, in Chapter 4 we conducted English-mediated multilingual (intercultural) communication between Japanese and Chinese speakers to examine the combined effectiveness of machine translation and Kanji/Simplified Chinese conversion.

On the other hand, human-assisted translation¹ is a translation style in which a computer system does most of the translation and appeals to a (mono- or bilingual) human for help in case of difficulty.

The model in [12] provides an approach that lets monolingual speakers contribute to a single translation task. In this model, two monolingual participants use machine translation to translate a document collaboratively: the one who handles the target language modifies the machine translations to make them fluent, while the one who handles the source language evaluates the adequacy of the modifications. They use only machine translation systems to carry out the translation.

For example, suppose a Japanese speaker and an English speaker are assigned to translate an English document into Japanese. The Japanese speaker first modifies the machine-translated sentences to make them fluent, and the English speaker then judges the adequacy of the sentences corrected by the Japanese participant.

¹ http://www.cs.cmu.edu/~ref/mlim/chapter4.html

This protocol provides evidence for a work procedure in which a speaker of one language can contribute to translating sentences of another language by modifying machine translations. In this research, we conducted a translation-modification experiment with the assistance of Kanji/Simplified Chinese conversion, based on this approach, in Chapter 5.
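The division of labor in this protocol can be sketched in Python as a small loop. This is only a sketch under stated assumptions: the function name `collaborative_translate`, its parameters, the round limit, and in particular the use of a back-translation for the adequacy check are hypothetical choices made here; the description above states only that both participants rely on machine translation.

```python
from typing import Callable, Tuple


def collaborative_translate(
    source: str,
    translate: Callable[[str], str],          # MT: source language -> target language
    back_translate: Callable[[str], str],     # MT: target language -> source language (assumed)
    make_fluent: Callable[[str], str],        # edit by the target-language speaker
    is_adequate: Callable[[str, str], bool],  # judgment by the source-language speaker
    max_rounds: int = 3,                      # hypothetical stopping rule
) -> Tuple[str, bool]:
    """Alternate fluency edits and adequacy checks until accepted or rounds run out."""
    draft = translate(source)
    for _ in range(max_rounds):
        draft = make_fluent(draft)
        if is_adequate(source, back_translate(draft)):
            return draft, True
    return draft, False


# Toy stand-ins for the MT systems and the two participants (illustration only).
mt = {"Good morning": "良い朝"}
edits = {"良い朝": "おはようございます"}
back = {"おはようございます": "Good morning"}

result = collaborative_translate(
    "Good morning",
    translate=lambda s: mt.get(s, s),
    back_translate=lambda t: back.get(t, "?"),
    make_fluent=lambda t: edits.get(t, t),
    is_adequate=lambda src, bt: src == bt,
)
print(result)
```

Passing the participants in as callables keeps the sketch independent of any particular machine translation service.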


Chapter 4 Communication Experiment

In order to explore the feasibility of using Chinese characters, mainly Simplified Chinese and Kanji, as a medium to support multilingual communication alongside Japanese-Chinese machine translation, we conducted conversation experiments in which participants communicated with each other in English, which is not their native language.

In this chapter, we propose an approach that uses Kanji/Simplified Chinese conversion and machine translation to support daily conversation in English. The proposal is based on the fact that Chinese characters are widely used in daily life in both China and Japan, although Japanese also uses Hiragana and Katakana, which may confuse people without any Japanese skills. Owing to the Sinosphere, speakers can use Chinese characters to support daily conversation in spite of different pronunciations. The difficulty of this approach lies in two simple facts: variants of Chinese characters are hard for monolingual speakers to recognize, and not all terms written in Chinese characters can be understood. Since all Chinese terms are written in Simplified Chinese, character variants and different usage in Japanese may cause misunderstanding in some conditions. Finally, we present findings from analyzing conversation data from Chinese-character-mediated communication combined with machine-translation-mediated conversation, which are important for evaluating the efficiency of knowledge transfer in Chinese-character-mediated communication and the usefulness of Simplified Chinese/Kanji conversion.

4.1 Experiment Setting

Objective

This conversation experiment observes how Chinese and Japanese participants communicate when they talk with each other in English. Because the Chinese and Japanese participants know Simplified Chinese and Kanji characters, respectively, they can use those characters in their conversations when they want to describe a word but do not know how to say it in English, or as a supplement to an English description.

In this experiment, we therefore survey how participants use Simplified Chinese and Kanji characters to make their communication smooth and well understood, and we evaluate their comprehension of Simplified Chinese and Kanji. In order to evaluate the different methods of using Chinese characters, the conversation experiment consists of three parts. The purposes of the whole experiment can be outlined as follows.

• Comprehension of Chinese characters
The main purpose is to evaluate the comprehension barriers caused by different Chinese character variants, including whether Japanese participants can recognize the Simplified Chinese terms written by Chinese participants, and vice versa. In addition, comprehension of the meaning of converted Simplified Chinese or Kanji characters is examined.

• Convenience of using Chinese characters
The convenience of Simplified Chinese/Kanji conversion in daily English conversation is estimated from the conversation experiment with the Simplified Chinese/Kanji conversion tool and from interviews with the participants.

• Efficiency of knowledge transfer
To explore whether communication mediated by Kanji/Simplified Chinese conversion helps make communication smooth and well understood, the detailed condition of knowledge transfer is investigated and compared with machine-translation-mediated conversation.

Participants

Two Japanese and two Chinese speakers participated in our experiments. Since the Japanese participants cannot speak Chinese and the Chinese participants know very little Japanese, they communicated with each other in English, which is not the mother tongue of any participant. The four participants were divided into two groups, each with one Chinese and one Japanese participant.

The language skills of each participant are summarized in Table 4. In detail, Japanese participant A has low-level English skills, with a TOEIC score lower than 750, and is not fluent in spoken English, while C has better English skills but is still weak in speaking and listening.

On the other hand, Chinese participant B has higher-level English knowledge, with a TOEFL score higher than 90, and is able to speak English fluently and clearly, while participant D can speak English fluently but not very clearly and is not good at writing Chinese characters, though he is able to recognize most of them.

Table 4 Participants Description in Communication Experiment

| Experiment Group | Participant No. | Nationality | English Skill |
|---|---|---|---|
| I | A | Japan | Low |
| I | B | China | High |
| II | C | Japan | Medium |
| II | D | China | High |

Process

The whole communication experiment consists of three parts:

Conversation with Hand Writing

[Fig 1: the Japanese participant writes Japanese and the Chinese participant writes Simplified Chinese on a shared sheet of paper.]

Fig 1 Process of Hand Writing in Conversation

In this part of the experiment, when participants do not know how to say a word in English, they can write it on paper, mainly in Simplified Chinese or Kanji, as shown in Fig 1. When communicating, each participant writes in his or her native language and the other looks at the characters; we then evaluate the comprehension of the Chinese characters written by the other participant.

Conversation with Kanji/Simplified Chinese Conversion

[Fig 2: the Japanese participant inputs Japanese and the Chinese participant inputs Simplified Chinese into the conversion tool, which outputs Simplified Chinese and Kanji, respectively.]

Fig 2 Process of Using Conversion Tool in Conversation

In this part of the experiment, shown in Fig 2, we use a Simplified Chinese/Kanji character conversion tool named Pinconv¹ instead of writing on paper when explaining proper nouns. Pinconv is a tool that converts Kanji to Simplified Chinese and vice versa. We then examine the comprehension of both converted and unconverted Simplified Chinese/Kanji characters to evaluate the convenience of using Chinese characters.

[Fig 3: Japanese vocabulary is input together with its meaning (Expo, Ibaraki, Osaka Jyo, Nanba, Okonomiyaki, Octopus, Snapper); clicking the Kanji-Chinese conversion button converts only the Kanji part and outputs the Chinese result.]

Fig 3 Function of Pinconv Kanji/Simplified Chinese Conversion

¹ http://www.karak.jp/chinese/pinconv-4-00.html
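The "convert Kanji part only" behavior described for Fig 3 can be sketched in Python as follows. This is not Pinconv's implementation: the variant table is a hypothetical three-character excerpt, the Japanese spellings 難波, 鯛 and お好み焼き are assumed for Nanba, Snapper and Okonomiyaki, and `convert_kanji_part` is a name introduced here for illustration.

```python
# Hypothetical variant table covering only the characters used below.
KANJI_TO_SIMPLIFIED = {"難": "难", "焼": "烧", "鯛": "鲷"}


def is_kana(ch: str) -> bool:
    """True for characters in the Hiragana or Katakana Unicode blocks."""
    return "\u3040" <= ch <= "\u30ff"


def convert_kanji_part(term: str) -> str:
    """Convert known Kanji to Simplified Chinese and pass kana through unchanged."""
    return "".join(
        ch if is_kana(ch) else KANJI_TO_SIMPLIFIED.get(ch, ch) for ch in term
    )


for term in ["難波", "鯛", "お好み焼き"]:
    print(term, "->", convert_kanji_part(term))
```

In the last example the kana remain in the output, which illustrates why terms that mix Kanji and kana can stay hard to read for a Chinese speaker even after conversion.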


Conversation with Synchronous Communication Tool

A synchronous communication tool¹ was used to support communication in this part, following the process described in Fig 4. For machine translation, the language resources of the synchronous communication tool are provided by the multilingual platform named ‘Language Grid’. Language Grid is an online, service-oriented multilingual platform that provides a multilingual service infrastructure and enables easy registration and sharing of language services such as online dictionaries, parallel texts, and machine translations [13,14]. Language Grid provides not only Japanese-Chinese machine translation services that are mediated by English, like Google, but also services that are not mediated by English, like J-server. In this experiment, we used J-server as the machine translator.

[Fig 4: the Japanese participant and the Chinese participant each use a PC terminal of the synchronous communication tool; Japanese and Chinese input is translated in real time between Chinese and Japanese via Language Grid, and each side receives the translation output.]

Fig 4 Process of Using Synchronous Communication Tool in Conversation

[Fig 5: the screen seen by the Chinese participant (input in Chinese) and the screen seen by the Japanese participant (output in Japanese), with translation via Language Grid.]

Fig 5 Screenshot of Synchronous Translation Tool

¹ http://pigeon.ai.soc.i.kyoto-u.ac.jp/wotp/cd/SCT/index.html


Since the synchronous translation tool can translate Chinese into Japanese, one participant can input terms in Chinese and have them translated into Japanese, and the other can see the translations at the same time. Fig 5 is a screenshot from the experiment. The left image shows the words input by the Chinese participant, and the right image shows the synchronously translated results seen by the Japanese participant.

In this part of the conversation experiment, when participants found that the machine translation results were too poor to convey their meaning, they could write Chinese characters instead of using machine translation.

4.2 Evaluation

In this section, we give a comprehensive analysis of the experiment results. The three evaluation objectives stated in Section 4.1 are analyzed separately.

4.2.1 Comprehension of Chinese characters

In this part, we analyze data from the first and second parts of the conversation experiment. The analysis focuses mainly on the comprehension of converted and unconverted characters from a statistical point of view.

Table 5 Comprehension and Recognition of Written Japanese and Chinese

| Nationality | Experiment Group | Number of Written Vocabularies | Comprehension Rate | Recognition Rate |
|---|---|---|---|---|
| Chinese | I | 2 | 50.0% | 50.0% |
| Chinese | II | 15 | 60.0% | 53.3% |
| Japanese | I | 5 | 60.0% | 60.0% |
| Japanese | II | 13 | 84.6% | 84.6% |

Table 5 gives the statistical results for comprehension in the first part of the conversation experiment. Comprehension of characters means that participants can understand the meaning of the characters written by the other participant, while recognition of characters means that participants can recognize the written characters regardless of whether they know their meaning.

In the two groups of the conversation experiment, 17 and 18 vocabulary items in total were written on paper (the sums of the Chinese and Japanese rows of Table 5). Without any conversion, Japanese participants could understand over half of the Chinese vocabulary items from the characters alone, as listed in the comprehension rate column. The comprehension rate measures how much of the written text, which is not in their native language, participants can grasp from the characters alone. However, some words can be recognized from their characters without their meaning being known, which can be seen by comparing the recognition rate with the comprehension rate. Conversely, the comprehension rate also covers cases in which participants understood the meaning of written characters without recognizing them fully.

On the other hand, because Chinese characters are ideographic, both Chinese and Japanese readers may be able to catch the meaning of characters that differ from those of their native language. Thus in Group II, the Japanese participant could comprehend a written Simplified Chinese term without recognizing it. The results also clearly show that Chinese participants are better at comprehending Japanese terms from their characters; however, because they are unfamiliar with Katakana and Hiragana, Japanese terms made up of Kanji together with those uniquely Japanese characters may be difficult for them to understand. Terms of that type belong to wago.

As a result, we find that the recognition rate is nearly the same as the comprehension rate: for most participants, if they can recognize characters written in the other language, they catch the meaning owing to the semantic weight of the characters. For Chinese participants, however, even when they cannot recognize Japanese vocabulary that includes Hiragana, they can still comprehend it because they ignore the Hiragana and comprehend the vocabulary from the Kanji alone. That is why, for the Chinese participant in Group II, the comprehension rate was higher than the recognition rate.
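The two measures can be illustrated with a short Python sketch. The per-item judgments below are hypothetical: the thesis reports only the totals and rates, so the split of 15 items into 8 recognized and comprehended, 1 comprehended without being recognized, and 6 neither is an assumption chosen to reproduce the Group II row with 15 items in Table 5.

```python
def rate(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in Table 5."""
    return round(100 * count / total, 1)


# Hypothetical judgments, one pair per written item: (recognized, comprehended).
items = [(True, True)] * 8 + [(False, True)] * 1 + [(False, False)] * 6

recognition_rate = rate(sum(recognized for recognized, _ in items), len(items))
comprehension_rate = rate(sum(understood for _, understood in items), len(items))

print(comprehension_rate, recognition_rate)  # 60.0 53.3
```

A single item that is comprehended without being recognized is enough to make the comprehension rate exceed the recognition rate in a row of this size.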

As Table 6 shows, when the Hanzi/Kanji conversion tool was used in the second part of the conversation experiment, over 70% of the Chinese and Japanese terms could be converted successfully into the other character set, as given in the conversion success rate column. Successful conversion means that the converted characters exist in the dictionary; it does not depend on whether the other participant recognizes them. Compared with the first part of the conversation experiment, converting Hanzi to Kanji effectively improved Japanese participants' comprehension of Chinese terms, from 50% to 75% in Group I and from 60% to 92% in Group II. However, because of the limitations of the conversion tool, it performs poorly on terms that mix Kanji and Hiragana. Chinese participants, with their good knowledge of Chinese characters, may still comprehend terms mixed with Hiragana or Katakana, since most of them prefer to ignore the Hiragana and Katakana.

Table 6 Relationship between Comprehension of Converted Vocabularies and Conversion Rate

| Nationality | Experiment Group | Number of Used Converted Vocabularies | Conversion Success Rate | Comprehension Rate |
|---|---|---|---|---|
| Chinese | I | 4 | 75.0% | 75.0% |
| Chinese | II | 25 | 72.0% | 92.0% |
| Japanese | I | 6 | 83.3% | 33.3% |
| Japanese | II | 11 | 72.7% | 63.6% |
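The comparison between the first and second parts can be written out as simple arithmetic. The Python sketch below copies the comprehension rates from Tables 5 and 6 and computes the change in percentage points for each row; `gain_in_points` is a hypothetical helper name, and because the two parts involved different vocabulary items, the differences are descriptive only.

```python
# Comprehension rates (%) copied from Table 5 (hand writing) and Table 6 (conversion).
handwritten = {("Chinese", "I"): 50.0, ("Chinese", "II"): 60.0,
               ("Japanese", "I"): 60.0, ("Japanese", "II"): 84.6}
converted = {("Chinese", "I"): 75.0, ("Chinese", "II"): 92.0,
             ("Japanese", "I"): 33.3, ("Japanese", "II"): 63.6}


def gain_in_points(before: float, after: float) -> float:
    """Change in comprehension rate, in percentage points."""
    return round(after - before, 1)


gains = {row: gain_in_points(handwritten[row], converted[row]) for row in handwritten}
for row, gain in gains.items():
    print(row, f"{gain:+.1f}")
```

The Chinese rows rise by 25.0 and 32.0 points, while the Japanese rows fall, which is consistent with the tool's weak handling of terms that mix Kanji and Hiragana.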

4.2.2 Convenience of using Chinese characters

After the experiments, we interviewed the four participants about their impressions of using Kanji/Simplified Chinese in conversation.

First, they appreciated the convenience of the Kanji/Simplified Chinese conversion tool because it was easy to input characters and convert them from one form to the other. Participants could recognize many Chinese characters, but some of them, especially the Japanese participants, found it difficult to write them down. Although Kanji and Simplified Chinese have many similarities, participants' knowledge of Chinese characters differed, so not all of them could recognize the original characters without conversion.


Second, Kanji/Simplified Chinese vocabulary was much easier to remember from its characters than from its pronunciation. A Chinese participant said that she preferred to remember Japanese proper nouns by their characters rather than their pronunciations, because her Japanese was not good and it was hard to remember the long pronunciation of a proper noun. In the experiment, she often wrote down or input such vocabulary instead of saying it.

Finally, when machine translation results were too poor to be understood, many participants simply input the original Simplified Chinese or Kanji vocabulary to support their communication. Because bilingual lexicons are insufficient, machine translation was not the best choice for translating proper nouns, which appeared often in their oral communication.

4.2.3 Efficiency of knowledge transfer

In this part, we examine the efficiency of knowledge transfer, which is an essential element of intercultural communication. In one conversation scene, Chinese participant B introduced her hometown, Xiangyang, to Japanese participant A. In the conversation, which lasted nearly 20 minutes, they mainly talked about history and scenic places in China.

According to the script shown in Fig 6, they talked about a famous Chinese food, hot and dry noodles. When Japanese participant A was puzzled by the name of the food, B input the name in Simplified Chinese (热干面, hot dry noodle) and clicked the translation button. A saw the Japanese translation at the same time but was still puzzled by it (武漢風和えそば, noodle with Wuhan style). All we can learn from the machine translation is that the noodles originated in Wuhan; it gives no detailed description of them. In fact, the character of these famous noodles is implied by their Chinese name and its English translation. Take the converted Kanji term ‘熱乾麺’: although its characters differ from the Simplified Chinese ones, its meaning can be understood from each character, since ‘熱’ means hot, ‘乾’ means dry, and ‘麺’ stands for noodles. From the machine translation result, however, we cannot catch this important information.

Fig 6 Scripts of Conversation about Introduction of a Chinese Traditional Food

B: =And the famous food there is Hot Dry Noodles. huh ((B inputs Hot Dry Noodles in Chinese and translates it into Japanese))
A: Oh:: From information, I can get this is noodles.
B: huh, hot and dry noodles (.) Because the source of it is= ((B inputs the name of source and translates it))
A: °I know°
B: You know it? It is the source and there is no water in it. It is (.) just noodles out of hot water and uses this source.
A: Oh:: this is not soup.
B: It's not soup, there is no soup. So it's called dry.
A: Oh.
B: Because the source is limited this source, it is very delicious. So it's famous for this.

We find that English-mediated translation may lose some important, even core, parts of the original term when translating into another language, while Kanji/Hanzi conversion keeps almost the whole structure of the original. This means that in some situations, when all communicators belong to the Sinosphere, the results of Hanzi/Kanji conversion do much better in knowledge transfer and may be easier to comprehend than machine translation.

We then compare machine translation with Simplified Chinese/Kanji conversion based on the screenshot. Of the 10 Simplified Chinese vocabulary items input by the Chinese participant, 9 were translated into Japanese correctly, so that the Japanese participant could catch their meaning. The only wrong translation is the place name of a Chinese city. Of the 9 correct translations, 5 can be obtained by converting the original Simplified Chinese directly to Kanji. They are mainly person names, place names, and some special terms.

We also looked up those vocabulary items in Wikipedia, whose translations are the best available reference; the result is shown in Table 7. According to the Wikipedia translations, 8 of the Simplified Chinese vocabulary items can be converted into Japanese characters (Kanji) directly.

From this part of the experiment, we find that for vocabulary originating in China, such as person names, place names, famous sites, and some foods, it is perhaps better to convert Simplified Chinese into Kanji than to translate it. Owing to the Sinosphere, Japanese people can understand the meaning of these types of vocabulary from the characters alone.

It seems that, just as machine translation mediation is effective for multilingual communication, Chinese character mediation is also a useful means of multilingual conversation for people under the influence of the Sinosphere. In some situations it is even more efficient and convenient than machine translation.

Table 7 Comparison of Translations Between MT and Wikipedia

| Original Chinese Vocabulary | Japanese MT | Japanese Translation from Wikipedia |
|---|---|---|
| 刘备 (Liu Bei) | 劉備 (Liu Bei) | 劉備 (Liu Bei) |
| 热干面 (hot and dry noodles) | 武漢風和えそば (noodle with Wuhan style) | 熱乾麺 (hot and dry noodles) |
| 三顾茅庐 (three visits to the cottage) | 三顧の礼をとる (show one’s courtesy) | 三顧の礼 (special courtesy (in someone)) |
| 护城河 (moat) | 護城河 (moat) | 堀 (moat) |
| 三国 (three Kingdoms) | 三国 (three Kingdoms) | 三国 (three Kingdoms) |
| 襄阳 (Xiangyang) | 襄陽 (Xiangyang) | 襄陽 (Xiangyang) |
| 芝麻酱 (sesame butter) | ゴマ味噌 (sesame miso) | 芝麻酱 (sesame butter) |
| 隆中 (Longzhong) | 盛んな中 (Flourishing) | 隆中 (Longzhong) |
| 黄鹤楼 (Yellow Crane Tower) | 黄鶴楼 (Yellow Crane Tower) | 黄鶴楼 (Yellow Crane Tower) |
| 诸葛亮 (Zhuge Liang) | 諸葛孔明 (Zhuge Kongming) | 諸葛亮 (Zhuge Liang) |
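The counts drawn from Table 7 can be checked with a short Python sketch. The rows are copied from the table; the Simplified-to-Kanji mapping is a hypothetical excerpt that covers only the characters in these ten terms, and `convert` is a name introduced here. The exact-match count for machine translation is a stricter criterion than the comprehension-based judgment used in the text, so it is not expected to equal the 9 translations judged correct there.

```python
# Rows of Table 7: (original Simplified Chinese, Japanese MT, Japanese from Wikipedia).
ROWS = [
    ("刘备", "劉備", "劉備"),
    ("热干面", "武漢風和えそば", "熱乾麺"),
    ("三顾茅庐", "三顧の礼をとる", "三顧の礼"),
    ("护城河", "護城河", "堀"),
    ("三国", "三国", "三国"),
    ("襄阳", "襄陽", "襄陽"),
    ("芝麻酱", "ゴマ味噌", "芝麻酱"),
    ("隆中", "盛んな中", "隆中"),
    ("黄鹤楼", "黄鶴楼", "黄鶴楼"),
    ("诸葛亮", "諸葛孔明", "諸葛亮"),
]

# Hypothetical variant table covering only the characters that differ above.
SIMPLIFIED_TO_KANJI = {
    "刘": "劉", "备": "備", "热": "熱", "干": "乾", "面": "麺", "顾": "顧",
    "庐": "廬", "护": "護", "阳": "陽", "鹤": "鶴", "诸": "諸",
}


def convert(term: str) -> str:
    """Character-by-character conversion; characters without an entry are kept."""
    return "".join(SIMPLIFIED_TO_KANJI.get(ch, ch) for ch in term)


mt_matches = sum(mt == ref for _, mt, ref in ROWS)
conversion_matches = sum(convert(zh) == ref for zh, _, ref in ROWS)
print(mt_matches, conversion_matches)  # 4 8
```

Under these assumptions, direct conversion reproduces 8 of the 10 Wikipedia entries, matching the count stated for Table 7, while only 4 machine translations are character-for-character identical to the Wikipedia entries.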

However, these conversation experiments also show that when Japanese terms consist of Kanji together with other Japanese characters (Hiragana and Katakana), machine translation is the better choice for supporting communication, although some of those terms may still be understood by converting only the Kanji to Simplified Chinese characters.

4.3 Discussion

In this section, we discuss the important results of the experiments as well as the disadvantages, observed in those experiments, of Kanji/Simplified Chinese assisted Japanese-Chinese machine translation in intercultural communication. We also describe the contributions of this research compared with existing research.

In the communication experiments, we evaluated the effectiveness of Kanji/Simplified Chinese and machine translation with different methods. Although writing Chinese characters or Kanji is very common in daily communication between Chinese and Japanese people, little related research has addressed it.

In addition, much of the research on intercultural communication between Chinese and Japanese people has been mediated by machine translators or related tools rather than face-to-face oral communication. In our experiments, intercultural communication was based on face-to-face conversation in English, which was not the native language of either the Chinese or the Japanese participants but their second language. Because of their limited English skills, they could use Kanji/Simplified Chinese or machine translation to support communication.

From the results, evaluated from three aspects in Section 4.2, we find that using Kanji/Simplified Chinese was effective in smoothing second-language-mediated conversation. In some conditions, using Kanji/Simplified Chinese conversion was more convenient than using machine translation. For example, when participants wanted to explain proper nouns without knowing the corresponding English words, they wrote down the original Japanese Kanji or Simplified Chinese, whereas machine translations were difficult to understand, or even confusing, owing to the lack of sufficient bilingual lexicons.

However, limitations remained in making use of Kanji/Simplified Chinese, especially when talking about vocabulary that consists of Hiragana or Katakana and about some wago (Kanji vocabularies created by Japanese, different from Sino-Japanese vocabularies). Moreover, differing abilities to recognize Kanji/Simplified Chinese, and different vocabulary used to describe the same thing, also made the converted Kanji/Simplified Chinese hard to comprehend.

Table 8 Comprehensions of Converted and Unconverted Vocabulary

| Nationality | Comprehension of Unconverted Kanji/Simplified Chinese Vocabularies | Comprehension of Converted Kanji/Simplified Chinese Vocabularies |
|---|---|---|
| Chinese | 14.3% | 57.1% |
| Japanese | 63.6% | 60% |

As Table 8 shows, Simplified Chinese-to-Kanji conversion brought a remarkable improvement, from 14.3% to 57.1%, in the comprehension of Chinese vocabulary whose characters differ from the Japanese ones. However, to our surprise, there was no significant change in the comprehension of converted Kanji. The reason is that those Japanese terms often consist of Kanji and other characters, such as 甘い (sweet) and お握り (onigiri).

Indeed, we find that even when Simplified Chinese vocabulary can be converted successfully into Kanji vocabulary that exists in a Japanese dictionary, Japanese people may still not understand its meaning because it is used infrequently. For example, the Chinese word for camel can be converted into the Kanji 駱駝, but in our experiment the Japanese participant could not recognize the Kanji because the word is usually written in Hiragana (らくだ) or Katakana (ラクダ).

This happened not only with Kanji vocabulary but also with Simplified Chinese vocabulary. Although Chinese people are good at comprehending Chinese characters, it is still hard for them to recognize some Kanji vocabulary, for example 鯛 (snapper) and 蛸 (octopus), which is not widely used in daily Chinese conversation. In this case, direct Kanji/Simplified Chinese conversion did not contribute noticeably to understanding, so another approach is needed to handle this condition, for example adding explanations to the converted terms.

The communication experiment showed that Kanji vocabulary rarely appeared in daily conversation, but Kanji/Chinese conversion was effective for introducing proper nouns, especially place names and person names. Wago, on the other hand, often appeared in conversation; it was difficult to understand by character conversion alone, and machine translation performed better in translating it.

Moreover, not all Chinese vocabulary can be converted into Kanji successfully, so the results of Kanji/Simplified Chinese conversion and machine translation should be used in combination to support conversation for better comprehension.
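One way to combine the two aids is sketched below in Python. This is an illustrative assumption rather than a procedure from the experiments: `build_support`, the `convert` and `translate` callables, and the rule "prefer machine translation when the term contains kana or cannot be converted" are hypothetical, chosen to mirror the observations in this section.

```python
from typing import Callable, Dict, Optional


def contains_kana(term: str) -> bool:
    """True if any character falls in the Hiragana or Katakana Unicode blocks."""
    return any("\u3040" <= ch <= "\u30ff" for ch in term)


def build_support(
    term: str,
    convert: Callable[[str], Optional[str]],  # returns None when conversion fails
    translate: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Return both aids for a Japanese term and mark which one to show first."""
    converted = convert(term)
    if converted is None or contains_kana(term):
        primary = "machine translation"
    else:
        primary = "conversion"
    return {"conversion": converted, "machine translation": translate(term), "primary": primary}


# Toy stand-ins for a conversion tool and a machine translator (illustration only).
toy_convert = {"襄陽": "襄阳"}.get
toy_translate = lambda t: {"襄陽": "襄阳", "甘い": "甜"}.get(t, "?")

print(build_support("襄陽", toy_convert, toy_translate))
print(build_support("甘い", toy_convert, toy_translate))
```

Returning both results, rather than only the preferred one, lets the reader fall back on the other aid when the first is not understood.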


Chapter 5 Translation Experiment

To explore how effectively Kanji/Simplified Chinese conversion supports Japanese-Chinese machine translation of written articles, we ran an experiment in which Chinese participants modified Japanese-Chinese machine translations. We chose Chinese participants to modify the translated sentences because the communication experiment suggested that Chinese participants comprehend and recognize converted Kanji vocabulary better than Japanese participants do.

In this chapter, we introduce the evaluation and analysis objectives and give an overview of the experimental setting. We then describe the procedure, participants, tasks, and apparatus in detail. Finally, we present the results and analyze them in terms of adequacy, comprehension, and classification.

5.1 Experiment Objective

To examine the combined effectiveness of Kanji/Simplified Chinese conversion and Japanese-Chinese machine translation, we evaluated and analyzed the results of this experiment from the following aspects:

• Improvement of adequacy: First, to verify the effectiveness of using Kanji/Simplified Chinese conversion to support the modification of machine translations, we evaluated modifications made with converted Kanji and compared them with modifications based on machine translations alone.

• Influence on comprehension: Second, to see whether and how converted Kanji vocabulary affected the participants' modification behavior, especially their comprehension of machine translations, we observed their behavior patterns throughout the modification activities and in interviews after the experiment.

• Classification of converted Kanji vocabularies: Third, to find out which types of Kanji vocabulary helped the modification activities by improving comprehension, we investigated how frequently converted Kanji vocabulary was used in the modifications compared to the machine translations, and classified that vocabulary into several categories.

5.2 Experiment Design

In this section, we first give an overview of the task allocated to the Chinese participants, then describe the participants and the apparatus used in the experiment, and finally explain the design of the procedure.

Task Description

In the experiment, we asked Chinese participants to modify machine translations produced by a machine translator, Google Translate [1]. Their task was to modify translated sentences taken from Japanese Wikipedia articles and Japanese news. We provided 40 sentences from Wikipedia articles [2], covering many fields such as food culture, plants and animals, architecture, metals, and medicine. We also chose 40 sentences from Yahoo! News [3] on Dec 12th, covering law, weather, politics, education, sports, social and international issues, entertainment, and so on. Each sentence was chosen to express a complete meaning on its own: in Wikipedia it was usually a definition, and in news it was the first sentence of an article, which summarizes the whole article. Each sentence contained 2 or 3 phrases.

We chose sentences from Wikipedia and news because we needed enough Kanji vocabulary to analyze the effectiveness of Kanji/Simplified Chinese conversion in improving participants' comprehension. As introduced in Chapter 2.2, far more Kanji vocabulary appears in news than in oral conversation. In addition, some of the chosen sentences required a certain amount of specialized knowledge (of history or international issues, for example) to be modified well.

[1] http://translate.google.com
[2] http://ja.wikipedia.org
[3] http://headlines.yahoo.co.jp/

Because the experiment was designed to analyze and evaluate effectiveness, we did not provide the original Japanese sentences to the participants. We also did not restrict the time spent on the modification activities, so participants could take as long as they needed on machine translations that were difficult to understand.

In contrast to the unrestricted time, we restricted the modification procedure. Apart from the provided machine translations and converted Kanji vocabulary, participants were not allowed to use any other references to support their comprehension or their modification tasks.

Participants

Three Chinese participants were recruited for this experiment. They were selected at random, but so that the results could be compared, they had different levels of Japanese skill.

Table 9 Participants Description of Translation Experiment

| Participant No. | Nationality | Japanese Skill |
|---|---|---|
| 1 | Chinese | None |
| 2 | Chinese | High-intermediate |
| 3 | Chinese | Little |

The participants were distinguished by their Japanese skills. Participant 1 was a monolingual Chinese speaker with no Japanese skills and little knowledge of Japan. Participant 2 had high-intermediate Japanese skills, held JLPT (Japanese Language Proficiency Test) Level 1, the highest level of the standard Japanese proficiency test, and could communicate in Japanese at a business level. Participant 3 had little Japanese skill: this participant recognized only Hiragana and Katakana and could not hold even a simple daily conversation in Japanese. Having stayed in Japan for many years, both Participant 2 and Participant 3 had some knowledge of Japan, such as its daily life and customs.

Apparatus

As stated in the task description, the participants were provided only with the machine translations and the converted Kanji vocabulary of the selected sentences.

We used Google Translate to obtain machine translations of the 80 sentences. As Fig 7 illustrates, Google Translate can translate Japanese into Simplified Chinese in real time, and it can translate a large number of sentences at once within a few seconds.

[Figure: Google Translate screen with two panels, "Original Japanese Sentence" and "Chinese Translation Result"]

Fig 7 Usage of Google Translate to Translate Japanese to Chinese

On the other hand, to verify the effectiveness of converted Kanji vocabulary in the experiment, we also used a Kanji/Simplified Chinese character conversion tool, Pinconv. Because all participants in the experiment write in Simplified Chinese, we used Pinconv to convert Kanji into Simplified Chinese characters. With Pinconv, we simply paste the original Japanese sentences and click the conversion button, and the tool quickly converts them into Simplified Chinese sentences in which Hiragana and Katakana remain.

However, because Hiragana and Katakana remained in the converted sentences, an additional step was needed to delete these characters unique to Japanese, which Chinese participants without any Japanese skills could not recognize. We removed those characters by hand, which turned each continuous sentence into a series of discontinuous Kanji vocabulary items.
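The removal of Hiragana and Katakana could in principle be automated. The following is a minimal sketch under stated assumptions, not the procedure used in the experiment, where the characters were removed by hand. The mapping `KANJI_TO_SIMPLIFIED` is a deliberately tiny, hypothetical stand-in for a real conversion tool, and the function name is illustrative. The sketch maps a few Kanji to Simplified Chinese and then keeps only runs of Chinese characters, so kana and punctuation are dropped.

```python
import re

# Hypothetical, deliberately tiny mapping for illustration only;
# a real conversion tool covers far more characters.
KANJI_TO_SIMPLIFIED = str.maketrans(
    {"時": "时", "経": "经", "東": "东", "極": "极", "陸": "陆"}
)

# Runs of CJK unified ideographs. Hiragana, Katakana and Japanese
# punctuation lie outside this range and are therefore discarded.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def convert_and_split(japanese_sentence: str) -> list[str]:
    """Convert known Kanji to Simplified Chinese and return the Chinese-character runs."""
    converted = japanese_sentence.translate(KANJI_TO_SIMPLIFIED)
    return HAN_RUN.findall(converted)


segments = convert_and_split("室町時代以前に、中国を経由して日本に入ったと考えられている")
print(segments)
```

Unlike a hand-made list, this sketch also keeps isolated stems such as 考 (from 考えられている), so a manual or dictionary-based filtering step would probably still be needed.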

Procedure

The experiment lasted one week, by which time all the participants had finished their tasks. The whole experiment consisted of four main steps:

Step1 We collected 40 Japanese sentences each from Wikipedia and Yahoo! News across a variety of categories. As introduced in the background, more Kanji vocabulary, especially new Sino-Japanese vocabulary, appears in news articles than in oral communication, so we chose news articles as one experimental target. We also selected sentences from Wikipedia articles in which Kanji vocabulary was used frequently. These sentences were easier to understand than literature and drew on knowledge common to Japanese and Chinese people, so participants could rewrite the machine translations based on their comprehension and related knowledge.

Step2 We performed the Japanese-Chinese translation and the Kanji-Hanzi conversion. All 80 Japanese sentences were translated into Chinese with Google Translate. At the same time, we converted the sentences into Hanzi with the Kanji-Hanzi character conversion tool Pinconv.

Step3 The three Chinese participants first read and modified the machine translations, producing modification MT as shown in Fig 8. In this task, they mainly made the machine translation fluent and easy to understand. Then, referring to the conversion results, they modified the machine translations again and produced modification MT+Conversion. In other words, they modified each machine translation twice; if they thought that the conversion results did not help them improve a translation, they could copy modification MT into modification MT+Conversion unchanged.


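[Figure: flow diagram. Original Sentences → Machine Translator → Machine Translations; Original Sentences → Conversion Tool → Converted Kanji Vocabularies; Participants produce Modification MT from the Machine Translations, and Modification MT + Conversion from the Machine Translations together with the Converted Kanji Vocabularies.]

Fig 8 Procedure of Modifying Translation Experiment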

After the modification tasks, two Japanese-Chinese bilinguals evaluated both modification MT and modification MT+Conversion on a 5-level scale based on the adequacy of the modifications. We analyzed this experiment from three aspects: adequacy, comprehension, and classification. The analyses are based on the evaluation results from the two Japanese-Chinese bilinguals and the modification results from the Chinese participants.

5.3 Results and Analysis

In this section, we give a comprehensive analysis of the experimental results. Following the experiment objectives stated in Chapter 5.1, the three evaluation objectives are analyzed separately.

5.3.1 Adequacy

First, we investigated whether the adequacy of the modifications improved with the assistance of converted Kanji vocabulary. All the modifications made by the three Chinese participants during the experiment were collected and analyzed.

In the experiment, we provided participants with 80 machine translation results and a total of 637 converted Kanji vocabulary items or phrases. We obtained 480 modified sentences in total. After the Japanese-Chinese bilinguals rated these sentences on the 5-level scale, we performed a statistical analysis of adequacy.
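The total of 480 modified sentences follows from the design: each of the three participants modified each of the 80 sentences twice, once as modification MT and once as modification MT+Conversion. The short sketch below only enumerates that design to check the count; all identifiers are illustrative and not taken from the experiment materials.

```python
from itertools import product

participants = [1, 2, 3]
sources = ["Wikipedia", "News"]
sentences_per_source = 40
conditions = ["MT", "MT+Conversion"]

# One entry per (participant, source, sentence number, condition).
modifications = list(
    product(participants, sources, range(1, sentences_per_source + 1), conditions)
)
print(len(modifications))
```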

As Table 10 illustrates, compared with modifications based on the machine translation results alone, modifications made with the help of converted Kanji were more adequate for both Wikipedia and news articles. The increase in adequacy was larger for Wikipedia than for news. Without converted Kanji, modifications of news were more adequate than those of Wikipedia, whereas with converted Kanji, modifications of Wikipedia were more adequate than those of news.

Table 10 Adequacy of Modification MT and Modification MT+Conversion

Relationship between Adequacy and Effectiveness

In this part, we focus on the effectiveness of converted Kanji in improving adequacy. We evaluated this effectiveness for each participant's modifications of Wikipedia and news with the following formula:

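$$
\text{Effectiveness} = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{\text{Adequacy of Modification (MT+Conversion)}_{ij} - \text{Adequacy of Modification MT}_{ij}}{\text{Adequacy of Modification MT}_{ij}} \tag{1}
$$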

In the formula, n is the number of bilingual evaluators (n = 2 in the experiment), and m is the number of sentences each participant modified from Wikipedia or from news (m = 40 in the experiment).
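Formula (1) can be read as the mean relative gain in adequacy over all evaluator and sentence pairs. The sketch below computes it under that reading, which is an assumption: it takes the sum to run over the n evaluators and the m sentences. The function name and the small score lists are hypothetical and are not data from the experiment.

```python
def effectiveness(scores_mt, scores_conv):
    """Mean relative adequacy gain.

    scores_mt[i][j] and scores_conv[i][j] are the adequacy scores (1 to 5)
    given by evaluator i to sentence j for modification MT and for
    modification MT+Conversion, respectively.
    """
    n = len(scores_mt)       # number of evaluators
    m = len(scores_mt[0])    # number of sentences
    total = 0.0
    for i in range(n):
        for j in range(m):
            total += (scores_conv[i][j] - scores_mt[i][j]) / scores_mt[i][j]
    return total / (n * m)


# Hypothetical scores: 2 evaluators, 2 sentences.
example_mt = [[2, 4], [4, 4]]
example_conv = [[3, 4], [5, 4]]
print(effectiveness(example_mt, example_conv))
```

Because the scores are on a 5-level scale starting at 1, the denominator is never zero.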

For each participant, Fig 9 shows that referring to the converted Kanji vocabulary was effective in increasing adequacy for modifications of both Wikipedia and news.

We also find that the effectiveness was much higher for Wikipedia sentences than for news sentences. Two reasons can explain this: first, the quality of machine translation is higher for news articles than for Wikipedia, so participants had less need to modify those sentences; second, converted Kanji vocabulary was more helpful for comprehension in Wikipedia articles.

| Category | Modification MT | Modification MT+Conversion |
|---|---|---|
| Wikipedia | 3.38 | 4.1 |
| News | 3.63 | 4.02 |
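As a rough check on the values of Table 10 (Wikipedia 3.38 to 4.1, news 3.63 to 4.02), the relative gain of the mean adequacy can be computed directly. This ratio of means is not the same quantity as formula (1), which averages the ratios sentence by sentence, so the numbers are only indicative.

$$
\text{Wikipedia: } \frac{4.10 - 3.38}{3.38} \approx 0.213, \qquad \text{News: } \frac{4.02 - 3.63}{3.63} \approx 0.107
$$

The gain is roughly 21% for Wikipedia against roughly 11% for news, which agrees with the larger increase reported for Wikipedia.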


Fig 9 Comparison of Effectiveness of Converted Kanji in Improving Adequacy of Modified MT in Wikipedia and News

The three participants had different levels of Japanese skill: Participant 1 knew no Japanese, while Participant 2 had the best Japanese skills. Fig 9 shows that the conversion results produced the most remarkable improvement in adequacy for Participant 1 and the smallest improvement for Participant 2. This suggests that Kanji conversion is more effective for Chinese people with little Japanese skill.

Fig 10 Comparison of Improvement in Modifications for Each Participant in Wikipedia and News

In Fig 10, we also compare adequacy with and without converted Kanji in the modifications of Wikipedia and news articles. With reference to the converted Kanji vocabulary, adequacy improved remarkably for every participant in both Wikipedia and news articles, and the conversion appears to be more effective for Wikipedia articles than for news articles.

Relationship Between Adequacy and Used Converted Kanji

We also analyzed the relationship between adequacy and the converted Kanji vocabulary that was used. News sentences contained more characters per sentence than Wikipedia sentences, so the usage rate of converted Kanji in the modifications was lower for news than for Wikipedia. This means that a larger share of the converted Kanji was used in modifications of Wikipedia sentences, where it may have been effective in increasing participants' comprehension.

Fig 11 Frequency of Different Used Kanji Number in Wikipedia and News

Fig 11 shows the frequency of the number of converted Kanji vocabulary items used in the modifications. Among the 40 modified sentences each from Wikipedia and news, the most frequent case for news sentences was using converted Kanji vocabulary 4 times. For Wikipedia, the frequencies did not differ greatly.

The relationship between the number of converted Kanji vocabulary items used and adequacy is illustrated in Fig 12. To our surprise, adequacy did not change remarkably with the number of converted Kanji vocabulary items used, so there seems to be no direct relationship between the two parameters. In other words, no matter how many converted Kanji vocabulary items were used in a modification, its adequacy did not change noticeably. Therefore, converted Kanji vocabulary can help improve the adequacy of modified machine translations, but using more converted Kanji vocabulary in a modification does not mean higher adequacy.

Fig 12 Relationship between Different Used Kanji Number and Adequacy
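One simple way to inspect such a relationship is to average adequacy for each number of used converted Kanji vocabulary items. The sketch below is only an illustration of that grouping; the function name is hypothetical and the records are invented, not data from the experiment.

```python
from collections import defaultdict


def mean_adequacy_by_kanji_count(records):
    """records: iterable of (number_of_used_converted_kanji, adequacy_score) pairs."""
    grouped = defaultdict(list)
    for used_count, adequacy in records:
        grouped[used_count].append(adequacy)
    # Mean adequacy per count, ordered by count.
    return {count: sum(scores) / len(scores) for count, scores in sorted(grouped.items())}


# Hypothetical records for illustration only.
example_records = [(2, 4.0), (2, 5.0), (4, 4.5), (6, 4.0), (6, 5.0)]
print(mean_adequacy_by_kanji_count(example_records))
```

A flat profile across the groups, as in this invented example, would correspond to the absence of a direct relationship described above.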

5.3.2 Comprehension

In this part, we analyze the influence on comprehension, mainly how converted Kanji vocabulary affected participants' comprehension so that they could make better modifications, based on the texts of the 480 modifications.

Feedback from the participants showed that the machine translation results were difficult to understand, while the converted vocabulary improved their comprehension of the translations and also helped them correct unsuitable words.

Based on this feedback and the modifications, we classified the situations in which converted Kanji vocabulary was used effectively to modify sentences into 3 types. To keep the examples brief, we chose the modification MT and modification MT+Conversion with the highest adequacy for explanation.


Modifying Vocabularies Written in English

Because of the lack of bilingual dictionaries, some Japanese vocabulary appeared in the Chinese translation written in the Latin alphabet, as a romanization of the original Japanese pronunciation. Because the language systems differ, Japanese and Chinese proper nouns are usually transliterated into English by pronunciation, whereas proper nouns can be translated between Japanese and Chinese not via pronunciation but via character conversion.

Table 11 Example of Modifying Vocabularies Written in English

| Item | Experiment Data | Meaning |
|---|---|---|
| Original | 京都市東山区の清水寺で、森清範貫主が揮毫した。 | Kanshu Kiyonori Mori wrote at Kiyomizu Temple in Higashiyama-ku, Kyoto. |
| Machine Translation | Kanshu Kiyonori 森林的书法在京都东山区的清水寺 | Kiyomizu Temple in Higashiyama-ku, Kyoto Kanshu Kiyonori forest was calligraphy. |
| Modification MT | Kanshu Kiyonori 先生在京都东山区的清水寺以书法展示此字。 | Mr. Kanshu Kiyonori showed the character by his calligraphy at Kiyomizu Temple in Higashiyama-ku, Kyoto. |
| Character Conversion | 京都市东山区, 清水寺, 森清范贯主, 挥毫 | Higashiyama-ku Kyoto, Kiyomizu Temple, Kanshu Kiyonori Mori, write by calligraphy |
| Modification MT+Conversion | 森清范贯主将在京都市东山区清水寺挥毫泼墨。 | Kanshu Kiyonori Mori wrote at Kiyomizu Temple in Higashiyama-ku, Kyoto. |

Table 11 gives an example of this type of situation. The proper noun '森清範貫主' (Kanshu Kiyonori Mori) was translated into Chinese in the Latin alphabet. Because this proper noun includes a person's name and his position title, which do not exist in bilingual lexicons, the translation via English came out written in English rather than Chinese.

When participants saw only the alphabetic words, they could not convert them into the corresponding Chinese words, and they took 'Kanshu Kiyonori' to be a person's name, which was in fact not correct. However, after they saw the provided converted vocabulary '森清范贯主', they could successfully replace the alphabetic words with the Simplified Chinese vocabulary, even without any hint that the converted vocabulary corresponded to the alphabetic words.

This example shows that Chinese participants can map badly translated vocabulary, such as alphabetic words, to the corresponding converted vocabulary provided, regardless of whether they have any Japanese skills.
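Alphabetic words left in a Chinese machine translation are easy to flag automatically as candidates for replacement by converted vocabulary. The sketch below is an illustration only, since the participants in the experiment did this mapping themselves; the function name is hypothetical. It finds runs of Latin-alphabet words in a translated sentence.

```python
import re

# One or more Latin-alphabet words separated by whitespace.
LATIN_RUN = re.compile(r"[A-Za-z]+(?:\s+[A-Za-z]+)*")


def find_alphabetic_spans(chinese_translation: str) -> list[str]:
    """Return the runs of Latin-alphabet words found in a Chinese translation."""
    return LATIN_RUN.findall(chinese_translation)


spans = find_alphabetic_spans("Kanshu Kiyonori 森林的书法在京都东山区的清水寺")
print(spans)
```

Deciding which converted vocabulary item (here '森清范贯主') corresponds to a flagged span still requires human judgment or further alignment, which this sketch does not attempt.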

Rewrite Wordless Sentences

Owing to the low precision of English-mediated Japanese-Chinese machine translation, in many cases the results were too poor for Chinese participants to understand. Because it was difficult for them to catch the key points of these wordless translations, the adequacy of their modifications was not very high. However, the provided converted Kanji vocabulary helped them catch the rough meaning, and they rewrote the sentences based on their own knowledge. Table 12 shows an example of this situation.

Machine translations may lose some information from the originals, so modifying these wordless sentences may cause misunderstanding. In the example, a misunderstanding led to an error in modification MT: the participant added the wrong information 'Chinese' based on their comprehension of the machine translation result. With the assistance of the converted Kanji vocabulary, especially '中国' (China), they corrected this in modification MT+Conversion.


Table 12 Example of Modification of Wordless Sentences

| Item | Experiment Data | Meaning |
|---|---|---|
| Original | 室町時代以前に、中国を経由して日本に入ったと考えられている | It is thought to have entered Japan via China before the Muromachi era. |
| Machine Translation | 我以前认为的室町时代,通过中国赴日本 | I (pass) through China the time of the town which thought my before and go to Japan |
| Modification MT | 我以前认为在室町时代,中国人赶赴日本 | I thought previously Chinese went to Japan at Muromachi era. |
| Character Conversion | 室町时代以前, 中国, 经由, 日本, 入 | before Muromachi, China, via, Japan, enter |
| Modification MT+Conversion | 在室町时代之前经由中国流入日本 | It came into Japan through China before Muromachi era. |

Modify Inadequate Vocabularies

In some cases, machine translation cannot keep translated words consistent. Participants could change such inadequate or inconsistent words according to the context with the provided converted Kanji, or modify them based on their own comprehension.

As Table 13 shows, the machine translation result was nearly excellent, so participants did not change it in their modification MT. Referring to the provided character conversion, however, the participant corrected '星球上' (on the planet) to '地球上' (on the earth), which was a better description according to their knowledge and the context.


Table 13 Example of Modification of Inadequate Vocabularies

| Item | Experiment Data | Meaning |
|---|---|---|
| Original | 南極大陸は、地球上で最も寒冷な地域の一つであり... | Antarctica is one of the coldest areas on the earth... |
| Machine Translation | 南极洲是这个星球上最寒冷的地区之一... | Antarctica is one of the coldest areas on this planet... |
| Modification MT | 南极洲是这个星球上最寒冷的地区之一 | Antarctica is one of the coldest areas on this planet... |
| Character Conversion | 南极大陆, 地球上, 最, 寒冷, 地域, 一 | Antarctica, on the earth, most, cold, area, one |
| Modification MT+Conversion | 南极洲是地球上最寒冷的地域之一 | Antarctica is one of the coldest areas on the earth... |

5.3.3 Classification

To find out which types of Kanji vocabulary helped the modification activities by improving comprehension, we investigated how frequently converted Kanji vocabulary was used in the modifications compared to the machine translations, and classified that vocabulary into several categories.

We classified the 637 converted vocabulary items and phrases into the following categories based on part of speech:

Noun, Verb, Adverb, Adjective, Numeral, Quantifier and Date

Statistically, nouns and verbs accounted for almost 90% of the provided converted Kanji vocabulary. It seems that in Japanese sentences, nouns and verbs are written in Kanji more often than other parts of speech. We also found that numerals, quantifiers, and dates appeared in the sentences, especially in news articles. However, given the objective of our research, we did not analyze the contribution of these kinds of vocabulary to the improvement.

In addition, after investigating the converted Kanji used in the modifications by the three participants, we classified the used vocabulary into the following categories:

Table 14 Categories of Frequently Used Converted Vocabularies in Modification

| Category | Usage Rate | Explanation |
|---|---|---|
| Noun: Proper Nouns | 93.7% (74/79) | Street, organization, person name etc. |
| Noun: Old Sino-Japanese Nouns | 86.8% (105/121) | Originated from China, mostly appear in Wikipedia |
| Noun: New Sino-Japanese Nouns | 87.3% (131/150) | Created by Japanese, part of modern Chinese, mostly appear in news |
| Gerund | 75.6% (34/45) | Verbal nouns |
| Adjective | 68.0% (17/25) | Less frequent but easily understood |

As Table 14 illustrates, we used the usage rate to evaluate how successfully the classified vocabulary was used in the modifications. 'Used' vocabulary refers to provided vocabulary that appeared in modification MT or modification MT+Conversion without being changed into other vocabulary based on the participants' comprehension. Such vocabulary can be understood by both Japanese and Chinese people through character conversion; in other words, it can be called 'common vocabulary'.
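The usage rate is simply the number of used vocabulary items divided by the number of provided items in a category. The sketch below recomputes the percentages from the counts reported in Table 14; the dictionary layout and names are illustrative.

```python
# (used, provided) counts as reported in Table 14.
table_14_counts = {
    "Proper Nouns": (74, 79),
    "Old Sino-Japanese Nouns": (105, 121),
    "New Sino-Japanese Nouns": (131, 150),
    "Gerund": (34, 45),
    "Adjective": (17, 25),
}


def usage_rate(used: int, provided: int) -> float:
    """Usage rate in percent, rounded to one decimal place."""
    return round(100.0 * used / provided, 1)


usage_rates = {name: usage_rate(used, provided) for name, (used, provided) in table_14_counts.items()}
print(usage_rates)
```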

From the table, we found that vocabulary in these categories had a high usage rate in the modification of machine translations, especially proper nouns. This suggests that for Chinese participants, understanding Kanji proper nouns directly from character conversion is more effective than reading the machine translation results. We obtained similar results in the communication experiment in Chapter 4.

On the other hand, the results show that both old and new Sino-Japanese nouns were well comprehended. Old Sino-Japanese nouns were more common in Wikipedia articles, while new Sino-Japanese nouns often appeared in news articles. Considering the difference in adequacy between Wikipedia and news, we find that machine translation performs better on new Sino-Japanese nouns than on old ones, so that Chinese participants improved their modifications more remarkably for Wikipedia than for news by referring to converted Kanji vocabulary. Although the usage rates of new and old Sino-Japanese nouns are almost the same, the conditions of use differ. Converted old Sino-Japanese nouns were often used in modification MT+Conversion, whereas converted new Sino-Japanese nouns already appeared frequently in modification MT, and participants did not change those nouns in modification MT+Conversion. Based on these findings, we conclude that machine translation performs better on new Sino-Japanese nouns, while Kanji/Simplified Chinese conversion is more helpful for old Sino-Japanese nouns.

Gerunds and adjectives were also frequently used in the modifications; these vocabulary items usually consist of two Kanji characters.

5.4 Discussion

In the translation modification experiment, Chinese participants modified the machine translations regardless of their level of Japanese skill. In related work on human-assisted or collaborative translation, participants have to modify machine translations many times without knowing which parts of a modification should be corrected, because no key hints are provided. What they modify is the machine translation of the original sentences and the versions already modified by other participants.

Through the analysis of adequacy, comprehension, and classification, we found that combining machine translation with Kanji/Simplified Chinese conversion was effective in helping Chinese participants understand and modify wordless sentences, because for every participant the adequacy of modification MT+Conversion was higher than that of modification MT. In addition, by analyzing how frequently converted Kanji/Simplified Chinese vocabulary was used, we classified the vocabulary into three large categories, which helped us analyze how Kanji or Simplified Chinese contributed to comprehension, and we verified that the combination of Kanji/Simplified Chinese conversion and machine translation was useful for modification and comprehension.

However, the experimental results also revealed some limitations and shortcomings of using Kanji/Simplified Chinese conversion when modifying more complex sentences. Some modified sentences, though more precise than modifications made without the provided converted Kanji vocabulary, still contained incorrect parts, and those parts might be corrected better by other participants.

This suggests that the modified translations would become much better with crowdsourcing, in which later participants can refer to the modification results of earlier participants when doing their own modification tasks.


Chapter 6 Conclusion

In this thesis, we analyzed and reported on the combined effectiveness of Kanji/Simplified Chinese conversion and Japanese-Chinese machine translation through an intercultural communication experiment and a human-assisted translation experiment. Since a single Chinese character has many variants, Kanji/Simplified Chinese conversion enables Japanese and Chinese people to break through the barrier that arises when each uses the characters of their mother tongue.

In our experiments, we observed human behavior, especially the use of Kanji/Simplified Chinese conversion in multilingual communication, and analyzed the scripts of conversations and modifications to find out how Kanji/Simplified Chinese conversion helped participants comprehend machine translation better.

The conclusions of this research are as follows.

First, in both the intercultural communication experiment and the human-assisted translation experiment, Kanji/Simplified Chinese conversion contributed remarkably to participants' comprehension of machine translations or of others' words. In the conversation experiment, using Simplified Chinese and Kanji as an intermediary in daily English conversation was effective and useful for improving comprehension and making the conversation smoother and more favorable, although a large number of terms have completely different or even opposite meanings in Chinese and Japanese. Although Chinese people can recognize Kanji characters, it is not easy for them to write them, because most of them use Simplified Chinese. For Japanese people, in turn, Simplified Chinese is hard to recognize. For these reasons, a Kanji/Simplified Chinese conversion tool is important.

Second, Kanji/Simplified Chinese conversion was successful in helping participants understand proper nouns. In the conversation experiment, over 70% of the proper nouns originating from Chinese could be converted to Japanese from characters alone, while nearly 50% of proper nouns were used in modifications. These proper nouns include place names, person names, and traditional food names. For comprehending some Chinese jargon, Kanji/Simplified Chinese conversion was as effective as machine translation in knowledge transfer.

Third, directly converted Sino-Japanese vocabulary made it convenient for Japanese and Chinese people to understand each other's meaning. In the translation modification experiment, referring to converted Sino-Japanese vocabulary improved the adequacy of modified machine translations and helped participants comprehend and modify the translations. Referring to the converted Sino-Japanese vocabulary was effective in improving both the choice of terms and the fluency of sentences. Participants could correct incorrect terms in the machine translations based on the provided Sino-Japanese vocabulary. Using their own knowledge, they could also change the provided Sino-Japanese vocabulary into terms more commonly used in Chinese, or even make them more elegant.

Finally, machine translation is a useful tool for making up for the shortcomings and limitations of Kanji/Simplified Chinese conversion, for example when translating vocabulary written in Hiragana or Katakana. Therefore, combining machine translation with character conversion is a better way to address the currently low precision of Japanese-Chinese machine translation.
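The division of labor described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the system used in the experiments: the mapping table `KANJI_TO_SIMPLIFIED` is a tiny hand-picked sample, and the function names `contains_kana`, `convert_kanji`, and `route` are hypothetical. Terms written only in Kanji are converted character by character, while terms containing Hiragana or Katakana are handed to machine translation.

```python
# Hypothetical sample table for illustration only; a real conversion table
# would cover thousands of characters and their variants.
KANJI_TO_SIMPLIFIED = {"図": "图", "書": "书", "館": "馆", "東": "东"}


def contains_kana(text):
    """Return True if the text contains any Hiragana or Katakana character."""
    # The Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) blocks are adjacent.
    return any("\u3040" <= ch <= "\u30ff" for ch in text)


def convert_kanji(text):
    """Replace each Kanji found in the sample table; keep other characters unchanged."""
    return "".join(KANJI_TO_SIMPLIFIED.get(ch, ch) for ch in text)


def route(term):
    """Choose character conversion for Kanji-only terms, machine translation otherwise."""
    if contains_kana(term):
        return ("machine_translation", term)
    return ("character_conversion", convert_kanji(term))


if __name__ == "__main__":
    for term in ["図書館", "東京", "コンピュータ", "食べる"]:
        print(term, route(term))
```

With this sample table, a Kanji-only term such as 図書館 is converted directly, whereas a Katakana loanword or a verb with Hiragana endings is left for machine translation.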

Overall, people in the Sinosphere have an opportunity to use Chinese characters to support their communication or their comprehension of each other's languages. However, given the intercultural knowledge required and the complexity of the Japanese writing system, machine translation is also necessary in some conditions to reduce communicative effort. In this research, combining Chinese character conversion with machine translation was verified to be successful and effective in intercultural communication and human-assisted translation. A system combining these functions is expected to support Japanese-Chinese intercultural collaboration in the near future.


Acknowledgments

First of all, I would like to express my gratitude to my supervisor and mentor, Professor Toru Ishida at Kyoto University, for his continuous guidance and the opportunity to conduct this research.

My special gratitude also goes to my advisers, Associate Professor Qiang Ma and Tomohiro Kuroda, who have contributed in unique ways to the construction and improvement of this work.

I am profoundly thankful to Lecturer Rieko Inaba for her careful advice and help in this research.

I also give my deep appreciation to all the faculty members and students of the Ishida and Matsubara Laboratory for their pertinent advice and discussions.

Finally, I want to express my thanks to all the members of the Ishida and Matsubara Laboratory for their kind help and support.


References

[1] Ya-Min Chou, Shu-Kai Hsieh, and Chu-Ren Huang. Hanzi Grid: Toward a Knowledge Infrastructure for Chinese Character-Based Cultures. Proceedings of the 1st International Conference on Intercultural Collaboration, pp. 133-145, 2007.

[2] Chooi-Ling Goh, Masayuki Asahara, Yuji Matsumoto. Building a Japanese-Chinese dictionary using kanji/hanzi conversion. Proceedings of IJCNLP, pp. 670-681, 2005.

[3] Yujie Zhang, Qing Ma, Hitoshi Isahara. Use of Kanji Information in Constructing a Japanese-Chinese Bilingual Lexicon. The 4th workshop on Asian Language Resource, 2004.

[4] Xiao Liu, Takashi Tsunakawa, Naoaki Okazaki, Jun'ichi Tsujii. Analyzing Kanji-Hanzi Mappings by Aligning Translation Equivalents. Transactions of Information Processing Research Report of Japan, Natural Language Processing Research Report 2008(113), pp. 85-90, 2008.

[5] Common Character Table of CJK, http://140.111.1.40/fulu/fu5/fu6.htm

[6] Jack Halpern. Lexicon-based Orthographic Disambiguation in CJK Intelligent Information Retrieval. Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization, 2002.

[7] Shibatani, Masayoshi. The Languages of Japan, p.142, Cambridge University Press, 1990.

[8] Kokuritsu Kokugo Kenkyuujo. Terebi Hoosoo no Goi Choosa 1, Shuuei Publishing, 1995.

[9] Naomi Yamashita, Rieko Inaba, Hideaki Kuzuoka and Toru Ishida. Difficulties in Establishing Common Ground in Multiparty Groups using Machine Translation. International Conference on Human Factors in Computing Systems, pp. 679-688, 2009.

[10] Hsin-Hsi Chen, Wen-Cheng Lin, and Changhua Yang. Translating-Transliterating Named Entities for Multilingual Information Access. Journal of the American Society for Information Science and Technology, 57(5):645-659, 2006.


[11] Yamashita, N., Ishida, T. Automatic Prediction of Misconceptions in Multilingual Computer-Mediated Communication. Proceedings of the International Conference on Intelligent User Interfaces, pp. 62-69, 2006.

[12] Daisuke Morita and Toru Ishida. Collaborative Translation by Monolinguals with Machine Translators. Proceedings of the 14th International Conference on Intelligent User Interfaces, pp. 361-366, 2009.

[13] Toru Ishida. Language Grid: An Infrastructure for Intercultural Collaboration. IEEE/IPSJ Symposium on Applications and the Internet (SAINT 2006), pp. 96-100, keynote address, 2006.

[14] Toru Ishida. The Language Grid: Service-Oriented Collective Intelligence for Language Resource Interoperability, pp. 3-18, Springer, 2011.