The past, present and future of Machine Translation

  • 8/22/2019 The past, present and future of Machine Translation

    The past, present and future of Machine Translation

    Michał Szewczyk, University of Warsaw

    Abstract.

    The aim of this paper is to investigate the current state and possible future development of Machine Translation (MT). An attempt is made to answer the question of whether, and to what extent, computers could replace humans in the process of translation.

    The paper begins with a concise outline of the history of machine translation. The following section provides an analysis of the main approaches to MT, as well as of the problems that are likely to be encountered in the process, based on a comparison of human and machine translations of selected passages. Having conducted the analysis, the author highlights the prospects of machine translation and reflects on its capacity to support and possibly replace human translators in their work.

    I. Introduction

    Machine Translation, also referred to by the abbreviation MT, is a field of computational linguistics concerned with the use of computers in the process of translating messages from one natural language to another. Over more than sixty years of its development, it has been the subject of scientific ventures and heated debates among linguists, computer scientists, engineers, psychologists and philosophers. Its beginnings trace back to the Cold War, when the competition between the United States and the USSR created the need for extensive translation of documents from English to Russian and vice versa. Since then, MT has been developed with the aim of facilitating and accelerating the work of human translators, and even of eliminating the human factor from the process of translation in the future.

    The author of this paper attempts to answer the question of whether such an objective is attainable and what may be expected of MT based on its history and present status. Since contemporary civilisation is based on information and the need for cross-cultural, international cooperation between people is greater than ever, this issue seems to be particularly interesting and relevant.

    II. Historical background

    Modern Machine Translation began around the middle of the twentieth century, when the global political situation and the onset of the Cold War created the necessity of translating vast numbers of documents between English and Russian. The starting point of MT is often considered to be the so-called Weaver memorandum of 1949. The letter, written by Warren Weaver, who was then a vice president of the Rockefeller Foundation, was distributed among people potentially interested in the development of MT. Although it had a predominantly strategic meaning, it covered several important methodological and theoretical problems, such as the question of ambiguity, the logical rudiments of language and the analysis of linguistic universals. The memorandum gave the impulse to begin research on MT at a number of American universities. As a result, the first scientific conference was held in 1952, and two years later a public demonstration of machine translation took place. The event, known as the Georgetown experiment, involved fully automated translation of more than sixty statements on a variety of subjects from Russian into English. The experiment was successful and proved influential, since it encouraged the government to allocate money to the field of computational linguistics and stimulated research in MT outside the United States, notably in the USSR.

    The research intensified in the 1950s and in the first half of the following decade. However, the initial enthusiasm decreased over time, since exceeding the standards established in the Georgetown experiment proved unexpectedly problematic. The quality of fully automated translations declined as the vocabulary and the set of grammatical rules expanded. Furthermore, the researchers encountered problems concerning word choice in cases of multiple meanings and in dealing with ambiguous semantic structures. Bar-Hillel, one of the most influential figures in the MT field at that time, recommended combining automated translation with human post-editing, and although this policy was implemented in many ongoing projects, MT faced increasing criticism.

    Finally, in 1966 the Automatic Language Processing Advisory Committee (ALPAC) of the National Academy of Sciences published a report critical of the progress made in MT up to that time. The report recommended limiting the financial support for further research. As it pointed out, the cost of machine-supported translation was higher than that of purely human translation.

    The report caused American efforts in MT to be greatly reduced for the following fifteen years. Nevertheless, MT continued to be developed in other countries, which resulted in successful projects such as TAUM-METEO, which translated weather reports from English into French, or SYSTRAN.

    The field of machine translation gradually revived in the United States and all over the

    world and in the 1990s it became one of the most vital domains of computational linguistics.

    III. Strategies and approaches to Machine Translation

    Since the beginnings of Machine Translation coincided with the dawn of the modern computer era, MT pioneers had to surmount numerous obstacles of a purely technical nature. The first objective was to devise an automated bilingual dictionary suitable for machines of very limited storage and computational capacity. One of the methods of reducing dictionary size involved dividing words into stems and endings, so as not to include all the inflected forms of nouns and verbs. This triggered the first systematic morphological research for the purpose of translation, yet the analysis quickly proved overly complex for some languages, such as German or Russian. Thus, initially, automated dictionaries contained all the inflected word forms.
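
    The stem-and-ending idea can be illustrated with a minimal Python sketch. The stems, the endings and the helper name split_word are assumptions made purely for this illustration and are not taken from any historical system; a realistic analyser would need far richer rules, which is where the early research ran into difficulty.

```python
# Toy stem-and-ending dictionary. All entries are illustrative assumptions.
STEMS = {"translat": "verb", "comput": "verb", "machin": "noun"}
ENDINGS = {"e": "base", "es": "3sg/plural", "ed": "past", "ing": "progressive"}


def split_word(word):
    """Return (stem, ending) using the longest known stem, or None."""
    for cut in range(len(word), 0, -1):
        stem, ending = word[:cut], word[cut:]
        # Accept the split only if both halves are listed (or no ending).
        if stem in STEMS and (ending == "" or ending in ENDINGS):
            return stem, ending
    return None


if __name__ == "__main__":
    for w in ["translated", "machines", "computing", "dog"]:
        print(w, "->", split_word(w))
```

    With three stems and four endings the dictionary holds seven entries, whereas listing every inflected form separately would require up to twelve; the saving grows with vocabulary size.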

    This forced researchers to adopt simplified strategies of machine translation, most notably the word-for-word approach. It involved finding the equivalents of the Source Language (SL) words in the Target Language (TL) and substituting them without taking morphological analysis or word order into account. Obviously, this method was not expected to produce coherent or even comprehensible translations. However, it may be useful in translating lists of phrases, such as short catalogues or inventories.
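
    A word-for-word system can be sketched in a few lines of Python. The English-French glossary and the function name word_for_word below are hypothetical and serve only to show why the approach ignores word order.

```python
# Hypothetical English-to-French glossary; entries are illustrative only.
GLOSSARY = {
    "the": "le",
    "black": "noir",
    "cat": "chat",
    "red": "rouge",
    "wine": "vin",
}


def word_for_word(sentence, glossary=GLOSSARY):
    """Replace each word with its glossary equivalent, keeping the order.

    Words missing from the glossary are passed through unchanged.
    """
    words = sentence.lower().split()
    return " ".join(glossary.get(word, word) for word in words)


if __name__ == "__main__":
    print(word_for_word("the black cat"))
```

    The output keeps the English adjective-noun order ("le noir chat"), whereas French requires "le chat noir". For isolated entries such as "red wine" in an inventory list the result may still be usable as a rough gloss.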

    Although it is possible to devise procedures for basic structural analysis and rearrangement of words within dictionary-based systems, producing high-quality translation requires the investigation of phrase and clause relationships. Another serious problem is polysemy, as a word may have multiple equivalents in the TL, depending on context. Some words may function as different parts of speech with no formal distinction, e.g. the English word record. Also, some languages make more subtle distinctions in meaning than others, e.g. the English verb to know may be translated into French as either savoir or connaître.

    The syntactic issues are especially stressed in the classic, rule-based approach to MT. In this method, the sentence structures of the SL and TL are represented by two different sets of rules, and another set contains the rules relating the two structures to each other. First, all the words are identified as particular parts of speech. The next step is to retrieve specific syntactic information concerning the verb and its possible phrasal contexts and to parse the sentence by assigning each word to the proper phrase. Finally, the words in the sentence are translated, mapped onto the syntactic structure relevant for the TL and inflected.
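
    The steps above can be sketched as a toy pipeline in Python. The lexicon, the single reordering rule (adjective-noun becomes noun-adjective) and all function names are assumptions chosen for illustration; full parsing and inflection are deliberately left out.

```python
# Hypothetical English-to-French lexicon: word -> (part of speech, equivalent).
LEXICON = {
    "the": ("DET", "le"),
    "black": ("ADJ", "noir"),
    "cat": ("NOUN", "chat"),
    "sleeps": ("VERB", "dort"),
}


def tag(sentence):
    """Step 1: identify each word's part of speech from the lexicon."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged):
    """Step 2: apply one structural rule, ADJ NOUN -> NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(tagged):
    """Step 3: substitute target-language words (inflection omitted)."""
    return " ".join(LEXICON[word][1] for word, _ in tagged)


def translate(sentence):
    return generate(transfer(tag(sentence)))


if __name__ == "__main__":
    print(translate("the black cat sleeps"))
```

    Unlike pure word-for-word substitution, the reordering rule yields the correct French order "le chat noir dort". Each additional construction would require a further hand-written rule, which hints at why such systems are costly to extend.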

    Rule-based MT divides into two subtypes: transfer-based and interlingua MT. Transfer systems permit contextual substitution of SL lexical units with those of the TL, which is possible as a result of syntactic analysis. Interlingua systems aim at representing the meaning of a source text by means of an artificial and unambiguous formal language. The meaning is then rendered through the syntactic structures and lexical units of the Target Language. Since extracting the deep meaning from a natural language text is complex both in technological and empirical terms, few large-scale interlingua projects have been completed. However, most new transfer-based systems tend to be interlingual by nature and handle semantic problems with the use of dictionaries containing disambiguation information, rather than purely syntactic analysis.

    Disambiguation, one of the key problems in the field of Machine Translation, often requires extratextual knowledge of how the world functions. Knowledge-based MT systems are an attempt to encode such information in the form of conceptual trees or networks and to devise algorithms that select the appropriate candidates. Such systems are based on the assumption that the traditional syntactic methods do not solve a certain class of problems; thus, syntactic issues are solved by means of semantic description.

    In statistical and example-based systems, translations are generated on the basis of large text corpora, which serve as sources for deriving the parameters of statistical models. A sentence in a text is translated according to the probability that a given string in the TL is a translation of this sentence. Such systems are cost-effective and do not require manual implementation of rules, while being largely independent of the language pair chosen. Generated translations tend to be more natural if the available corpus is sufficiently vast to contain a close equivalent of a given sentence. In example-based systems, words are translated as their inexact matches in the TL (e.g. synonymous or hyponymous expressions).
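
    The statistical principle can be illustrated with a minimal Python sketch that estimates translation probabilities by relative frequency and selects the most probable candidate. The aligned pairs below are invented for illustration and merely stand in for a large parallel corpus; real systems use considerably more elaborate models, and the names train and best_translation are hypothetical.

```python
from collections import Counter, defaultdict

# Invented aligned English-Polish phrase pairs standing in for a corpus.
ALIGNED_PAIRS = [
    ("flash", "lampa błyskowa"),
    ("flash", "lampa błyskowa"),
    ("flash", "błysk"),
    ("subject", "osoba fotografowana"),
    ("subject", "temat"),
    ("subject", "osoba fotografowana"),
]


def train(pairs):
    """Count how often each target phrase is aligned with each source phrase."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return counts


def best_translation(counts, source):
    """Return (target, estimated probability) for the likeliest target.

    The probability is the relative frequency count(source, target) /
    count(source). Returns None for source phrases absent from the data.
    """
    candidates = counts.get(source)
    if not candidates:
        return None
    total = sum(candidates.values())
    target, hits = candidates.most_common(1)[0]
    return target, hits / total


if __name__ == "__main__":
    model = train(ALIGNED_PAIRS)
    print(best_translation(model, "flash"))
    print(best_translation(model, "subject"))
```

    No linguistic rules are written by hand: the preference for one equivalent over another comes entirely from the counts, which is why the output of such systems depends so strongly on the size and quality of the corpus.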

    IV. Machine versus Human Translation

    According to Jiří Levý (Levý 1967), translation is a decision process, and the choice of lexical units (as well as higher-level units) is governed by a system of conscious and subconscious instructions. The instructions are both objective, dependent on the linguistic material (semantic, rhythmic, stylistic etc.), and subjective, such as the structure of the translator's memory and their aesthetic norms. A unit may be chosen from a set of potential candidates (a paradigm) on the basis of such factors as the potential meanings of a word, different conceptions of a character's style or the stylistic and philosophical preferences of the author.

    As may be observed, human translators have a wide spectrum of arbitrary and extratextual information at their disposal. As was shown in the previous section, providing such information to machines, as well as processing it, may be complex and challenging in many respects. In the following section, the author compares human and machine translations of several fragments of a literary text (Slaughterhouse Five by Kurt Vonnegut) and a fragment of a technical text. The comparison is followed by a short analysis of the problems that are likely to occur in the process of machine translation.

    Original passage (1):

    So I held up my right hand and I made her a promise. ''Mary,'' I said, ''I don't think this book is ever going to

    be finished. I must have written five thousand pages by now, and thrown them all away. If I ever do finish it,

    though, I give you my word of honor: there won't be a part for Frank Sinatra or John Wayne.

    ''I tell you what,'' I said, ''I'll call it The Children's Crusade.''

    She was my friend after that. (Vonnegut 1969)

    Human Translation:

    Podniosłem więc prawą rękę i złożyłem jej obietnicę: Mary powiedziałem. Nie sądzę, aby kiedykolwiek udało mi się skończyć tę książkę. Napisałem już chyba z pięć tysięcy stron i wszystko wyrzuciłem. Jeżeli jednak kiedykolwiek ją skończę, to daję ci słowo honoru, że nie będzie w niej roli dla Franka Sinatry ani Johna Wayne'a. Wiesz, co ci powiem? Dam jej tytuł Krucjata dziecięca.

    Od tej chwili byliśmy przyjaciółmi. (Jęczmyk 1972)

    Translation by Translatica

    Wic uniosem swoj praw rk i zoyem przyrzeczenie jej. '' Maria, '' powiedziaem '' nie myl, e ta ksika

    zamierza zosta skoczonym kiedykolwiek. Musiaem napisa pi tysic stron ju, i wyrzucony ich wszystkich.

    Jeli kiedykolwiek kocz to, jednak, daj ci swoje sowo honoru: tam nie by czci dla Frank Sinatra albo

    Jana Wayne. '' Ja wiesz co? '' powiedziaem '' nazw to Krucjat Dzieci. '' bya moj przyjacik po tym.

    (Translatica)

    The machine translation of the fragment is intelligible, although imperfect in many respects. For instance, the noun book has been incorrectly identified as the agent in the second sentence and the verb form thrown has been treated as an adjective. The translation contains mistakes concerning word order, inflection and tenses, although the message is not gravely disrupted. Thus, the simple text has been translated successfully and requires only minor post-editing to reach an acceptable quality level.

    Original passage (2):

    ''Close it up and keep it closed!'' Roland Weary warned Billy Pilgrim as they moved out. Weary looked

    like Tweedledum or Tweedledee, all bundled up for battle. He was short and thick. (Vonnegut 1969)

    Human translation:

    Morda na kłódkę! ostrzegł Roland Weary Billy'ego. Weary wyglądał jak Kubuś Puchatek wyruszający na wojnę. Był niski i gruby. (Jęczmyk 1972)

    Translation by Translatica:

    '' Zamyka to i trzyma to zamknity! '' Roland Weary ostrzeg Billy Pilgrim poniewa wyprowadzili si. Znuony wyglda jak Tweedledum albo Tweedledee, wszystko zebrao w plik dla bitwy. By krtki i tpy. (Translatica)

    The second passage is a good example of how cultural awareness may affect translation. The human translator decided to remove the names of the fictional characters from the English nursery rhyme (Tweedledum and Tweedledee) and replace them with another character from children's fiction, possibly better known to a Polish reader. The machine translator left the names untranslated, which was acceptable, but it failed to establish the correct reference of words in two cases (the pronoun 'it' as referring to the character's mouth and 'bundled up' as a phrasal verb referring to Roland Weary). Furthermore, the word 'Weary' was incorrectly recognised as an adjective the second time, due to its sentence-initial position: the capital letter was identified as an indicator of a new sentence, rather than of a proper name. The problems with inflection (ostrzeg Billy Pilgrim, Znuony wyglda) suggest that the machine had problems with parsing the fragment and establishing the correct syntactic structure.

    Original Passage (3):

    I had two books with me, which I'd meant to read on the plane. One was Words for the Wind, by

    Theodore Roethke, and this is what I found in there:

    I wake to steep, and take my waking slow.

    I feet my late in what I cannot fear.

    I learn by going where I have to go. (Vonnegut 1969)

    Human Translation:

    Wiozłem ze sobą dwie książki, które miałem zamiar czytać w samolocie. Jedną z nich były Słowa na wiatr Teodora Roethke i oto co w niej znalazłem:

    Budzę się, aby śnić, i wkraczam w sen powoli.

    Tam, gdzie strach się nie czai, szukam przeznaczenia.

    Idąc, uczę się drogi, którą zmierzać muszę. (Jęczmyk 1972)

    Translation by Tłumacz Komputerowy:

    Miaem dwie ksiki ze mn, ktry zamierzaem przeczyta w samolocie. Jeden by

    S o wa dla Wiatru , przez Theodorea Roethkea i to jest co znalazem w tym:

    Budz si z m ocz y i w zi m j b u dz cy si p o w ol ny. Ja st o py m j p n o w czy m ni e m o g b a si .

    Ucz si przez chodzenie , gdzie musz pj.

    The third fragment exposes the inability of machine translators to deal with highly figurative poetic language. The two initial sentences, written in prose, manage to convey the intended message, despite containing errors similar to those already discussed. However, the poem has clearly been mistranslated by the program. The verb to steep has been treated as a separate infinitive rather than a part of a Verb Phrase, which suggests considerable problems with parsing the verse. The verb to take has been rendered by its one-to-one equivalent in Polish, despite the fact that it is used by the poet in the sense of 'proceeding'. Similarly, the words feet and late, which have been used outside of their regular contexts and became different parts of speech, assume their literal meaning in the automated translation. Only the last verse of the poem, which is prose-like by nature, has been rendered correctly.

    In contrast, the human translator managed to retrieve the underlying meaning correctly, while rendering the poetic style and rhythm.

    The following sentence comes from the manual of a digital camera. Such sources normally use language that is simple in terms of structure and syntax and employ specialized, yet unambiguous vocabulary. Presumably, such sentences should be translated successfully by most machine translators.

    Original Passage:

    Firing the flash too close to the subject's eyes could cause a momentary loss of vision.

    Human Translation:

    Zadziałanie lampy błyskowej za blisko oczu osoby fotografowanej może spowodować chwilową utratę wzroku.

    Translation by Tłumacz Komputerowy:

    Rozpalanie {strzelanie, wyrzucanie} błysku również blisko oczu tematu {przedmiotu} mogłoby spowodować chwilową utratę wzroku.

    Google Translate:

    Lampy błyskowej zbyt blisko oczu fotografowanej osoby może spowodować chwilową utratę wzroku.

    This sentence employs several words that are ambiguous in general usage (e.g. firing, flash, subject). Each of the two machine translators adopts a different approach to the question of disambiguation. The computer translating program, which is a rule-based system, provides alternative translations which are supposed to help the user arrive at the intended meaning. Clearly, the application is not equipped with specialized vocabulary, yet it is still possible to understand the message. The translation provided by Google's online tool is surprisingly close to the one produced by a human. Since Google Translate is a statistical MT system, it processes millions of human-written texts in search of patterns. The above phrase is conventional and likely to occur in a large number of documents. Thus, a statistical MT system readily recognises it as a pattern.

    V. Can machines replace human translators in the future?

    On the basis of the above analysis, one may observe that Machine Translation is at present a largely immature technology. Most MT systems focus predominantly on syntactic analysis and fail to consider extratextual factors or multi-sentence meaning in the process of translation. Many systems are overly dependent on pre-specified kinds of texts, and developing broad-domain knowledge sources is still too complex. On the other hand, statistical systems are completely dependent on the size and quality of the available corpora. For some language pairs obtaining a reasonable corpus is difficult; thus, the quality of translations may be significantly lower.

    The analysis has shown that machines do not pose real competition for professional translators as far as literary and poetic texts are concerned. However, MT systems might be adequate for translating technical documentation, specialized publications of a restricted domain and materials not meant for publication. Over time, with the advancement of modern technologies such as speech recognition, MT might well become an important means of cross-cultural and international communication between people.

    Translators all over the world may appreciate the help of machine translators in the process of creating rough versions of translations. With their ever-growing corpora, MT systems are increasingly effective tools to support the work of human translators.

    VI. References

    1. Nirenburg, S. Wilks, Y. Machine Translation. PDF.

    2. Machine Translation 24.2 (2010): 67-69. SpringerLink. 5 Sept. 2010. Web. 16 Jan. 2011.

    3. Fiederer, R. O'Brien, S. "Quality and Machine Translation: A Realistic Objective?" The Journal of Specialized Translation 11 (2009): 52-74. Web.

    4. Jiří Levý. "Przekład Jako Proces Podejmowania Decyzji." Współczesne Teorie Przekładu. Ed. Piotr Bukowski, Magda Heydel. Kraków: Znak, 2009. 72-85. Print.

    5. Osborne, M. "MT History and Rule-based Systems." Lecture. 9 Jan. 2009. Web. 8 Jan. 2011.

    6. "Statistical Machine Translation." Wikipedia, the Free Encyclopedia. Web. 16 Jan. 2011.

    7. Inside Google Translate. Google Translate. Web. 16 Jan. 2011.

    8. Wdoowski, P. Grabski, B. Tłumacz i Słownik Języka Angielskiego. Vers. 4. 2007. Computer software.

    9. Translator. Computer software. Tłumacz, Translator, Biuro Tłumaczeń Pwn.pl, Słownik Angielski, Niemiecki, Rosyjski, Polski. Web. 16 Jan. 2011. <http://www.translatica.pl/>.

    10. Hutchins, W. John. "Chapter 3: Problems, Methods, and Strategies." Machine Translation: Past, Present, Future. Chichester [West Sussex]: Ellis Horwood, 1986. Print.

    11. Vonnegut, Kurt. Rzeźnia Numer Pięć. Trans. Lech Jęczmyk. Warszawa: Państwowy Instytut Wydawniczy, 1972. Print.

    12. Vonnegut, Kurt. Slaughterhouse Five. NY: Dell, 1991. Print.

    13. Google Translate. Web. 16 Jan. 2011.