Natural Language Processing
Budditha Hettige
Department of Computer Engineering
Machine Translation
Overview
• What is Machine Translation?
• History
• Approaches
• Existing machine translation systems
NLP-MT 2020
Machine Translation
• Computer software that translates text or
speech from one natural language to another
• A subfield of Artificial Intelligence (AI) within
Computer Science
• A way of converting the meaning of text in one
language into another language by means of a software
program
• Offers a potential solution to the language
barrier
Pipeline of MT
Machine Translation Pyramid
History
• In 1948, Booth and Richens introduced a dictionary look-up procedure for machine translation at Birkbeck College, London
• The first machine translation conference was held at MIT in 1952
• A word-for-word machine translation system from Russian into English was introduced by Perry at MIT in 1952
• In 1958, the first practical MT system (Russian into English) was implemented by IBM for the US Air Force under the direction of Gilbert King
• After 1970, SYSTRAN implemented a new
Russian-English MT system
• Around 1980, computer-aided translation was the
most successful approach to MT, especially for
Japanese-English
• After 1980, the corpus-based machine translation
approach was introduced
• Neural machine translation emerged in research around
2014; Google deployed its Neural Machine Translation
system (GNMT) in 2016
Timeline
Approaches
Approaches to MT
Interlingua Approach
• Uses a language-independent meaning representation as
the intermediate step between source and target language
• Easier to add a new language
• Building the meaning representation of the source
language is difficult
• If the source language is complex, generating the target
text from the representation becomes very difficult
• Requires all the levels of language analysis
– Morphological
– Syntactic
– Semantic
– Pragmatic
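The "easier to add a new language" point can be made concrete by counting components. The sketch below assumes one translation module per ordered language pair in a pairwise (direct or transfer) design, and one analyser plus one generator per language in an interlingua design. The function names are hypothetical, and the counts say nothing about how hard each component is to build.

```python
def pairwise_modules(n_languages: int) -> int:
    """Modules needed when every ordered language pair has its own translator."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """Modules needed with one analyser and one generator per language."""
    return 2 * n_languages


def cost_of_adding_language(n_existing: int) -> dict:
    """Extra modules required to add one language to n_existing languages."""
    return {
        "pairwise": pairwise_modules(n_existing + 1) - pairwise_modules(n_existing),
        "interlingua": interlingua_modules(n_existing + 1) - interlingua_modules(n_existing),
    }


# With 5 existing languages, a 6th needs 10 new pairwise modules
# but only 2 interlingua components (one analyser, one generator).
print(cost_of_adding_language(5))
```

Under these assumptions the pairwise cost of a new language grows with the number of languages already supported, while the interlingua cost stays constant.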
Flow of the interlingua MT
Interlingua Systems
• UNITRAN (Translate among English, Spanish, and
German)
• ICENT - A Chinese-English MT system
• English to Arabic machine translation
• English-Hindi interlingua-based machine
translation system
Human-Assisted
• Uses human interaction at the pre-editing, post-editing
and/or intermediate editing stages
• Uses human support for semantic handling
in the machine translation
• Cooperation between humans and machines tends to be
more successful than other approaches
• Systems
– Anusaaraka
– ManTra
– MaTra
• Considered a semi-automated machine
translation approach
• Popular for low-resource languages
• CAT tools
– OmegaT
– Anglabharti
Dictionary-based MT
• One of the earliest approaches to machine translation
• Works at the word level: translation is word-by-word
• Systems should be capable of handling morphology
• More accurate for closely related languages
• Performance can be improved by adding a source language morphological analyser and a target language morphological generator
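The effect of the morphological analyser and generator can be sketched with a toy English-to-Spanish example. Everything here is an illustrative assumption: the four-entry lexicon, the function names, and the single "strip or add -s" plural rule, which is far simpler than real morphology.

```python
# Toy bilingual dictionary: it stores lemmas only (hypothetical data).
LEXICON = {"two": "dos", "three": "tres", "book": "libro", "house": "casa"}


def analyse(word):
    """Source morphological analyser: surface form -> (lemma, features)."""
    if word in LEXICON:
        return word, set()
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1], {"plural"}
    return word, {"unknown"}


def generate(lemma, features):
    """Target morphological generator: (lemma, features) -> surface form."""
    return lemma + "s" if "plural" in features else lemma


def translate(sentence, use_morphology=True):
    """Word-by-word translation; out-of-vocabulary words are kept in <...>."""
    output = []
    for word in sentence.lower().split():
        lemma, features = analyse(word) if use_morphology else (word, set())
        target = LEXICON.get(lemma)
        output.append(generate(target, features) if target else f"<{word}>")
    return " ".join(output)


print(translate("two books", use_morphology=False))  # dos <books>
print(translate("two books"))                        # dos libros
```

Without the analyser, every inflected form would need its own dictionary entry; with it, the dictionary only has to list lemmas.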
Rule-based MT
• Classical approach for MT
• Based on linguistic information about the source and target languages
• Uses a set of language specific rules to provide grammatically correct translations
• An RBMT system contains
– Source language morphological analyzer
– Source language parser
– Source to target translator
– Target language composer
– Target language morphological generator
– Lexicon dictionaries
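How these components fit together can be sketched as a small pipeline. The example below is a toy English-to-Spanish noun phrase translator written for illustration only: the lexicon, the single reordering rule (adjective after noun), the plural handling and all function names are assumptions, and unknown words are not handled.

```python
# Lexicon dictionary: source lemma -> (part of speech, target lemma). Toy data.
LEXICON = {"the": ("DET", "el"), "red": ("ADJ", "rojo"), "car": ("NOUN", "coche")}
IRREGULAR_PLURALS = {"el": "los"}


def analyse(word):
    """Source morphological analyzer: surface word -> (lemma, number)."""
    if word not in LEXICON and word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1], "pl"
    return word, "sg"


def parse(tokens):
    """Source parser (toy): tag each lemma and give the phrase the noun's number."""
    tagged = [(lemma, LEXICON[lemma][0], number) for lemma, number in tokens]
    phrase_number = next((n for _, pos, n in tagged if pos == "NOUN"), "sg")
    return [(lemma, pos, phrase_number) for lemma, pos, _ in tagged]


def transfer(tagged):
    """Source to target translator: replace each lemma via the lexicon."""
    return [(LEXICON[lemma][1], pos, number) for lemma, pos, number in tagged]


def compose(words):
    """Target language composer: reorder ADJ NOUN into NOUN ADJ."""
    words = list(words)
    i = 0
    while i < len(words) - 1:
        if words[i][1] == "ADJ" and words[i + 1][1] == "NOUN":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return words


def generate(lemma, number):
    """Target morphological generator: inflect the lemma for number."""
    if number == "pl":
        return IRREGULAR_PLURALS.get(lemma, lemma + "s")
    return lemma


def translate(sentence):
    tokens = [analyse(word) for word in sentence.lower().split()]
    words = compose(transfer(parse(tokens)))
    return " ".join(generate(lemma, number) for lemma, _, number in words)


print(translate("the red cars"))  # los coches rojos
```

Each function stands in for one listed component, so a rule or dictionary entry can be changed without touching the other stages.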
Architecture of the RBMT
Systems
• Apertium
• Toshiba
• BEES (English to Sinhala)
Statistical Approach
• The most studied MT approach
• Generates translations using statistical
models learned from bilingual text resources (parallel corpora)
• Systems
– Moses
– Babel Fish
– Bing Translator
– Google Translate
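One way to see what learning from bilingual text means is to estimate word translation probabilities from sentence pairs. The sketch below is a minimal version of IBM Model 1 trained with expectation-maximisation, one classic technique and not necessarily what the systems listed above use. The three-sentence German-English corpus and the function names are invented for illustration.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical data): (source words, target words).
CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]


def train_ibm_model1(pairs, iterations=20):
    """Estimate t(target word | source word) with EM, starting from uniform."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {t for _, tgt in pairs for t in tgt}
    prob = {(t, s): 1.0 / len(tgt_vocab) for s in src_vocab for t in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (t, s) alignments
        total = defaultdict(float)  # expected count of s being aligned at all
        for src, tgt in pairs:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    share = prob[(t, s)] / norm
                    count[(t, s)] += share
                    total[s] += share
        for (t, s) in prob:
            prob[(t, s)] = count.get((t, s), 0.0) / total[s]
    return prob


def best_translation(prob, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


model = train_ibm_model1(CORPUS)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied: the word pairings emerge only from which words co-occur across the sentence pairs, which is why the approach depends on the size and quality of the parallel corpus.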
Activity on Statistical MT
Neural Machine Translation
• A successful approach to machine
translation
• Uses machine learning, specifically neural networks trained on bilingual text
• Language models
– recurrent neural language model
– feed-forward neural language model
– long short-term memory models
– deep models
– neural translation models
Neural Machine Translation
• Google’s Neural Machine Translation
• TensorFlow’s Neural Machine Translation
• Sequence-to-sequence model
• Encoder-decoder architecture
Issues in Machine Translation
• Word and Sentence Segmentation
• Word Conjugation
• Tense Detection
• Multi-word Expression
• Out of Vocabulary
• Translating Idiomatic Phrases
Existing Machine Translation Systems
Anusaaraka System
• Makes text in one Indian
language accessible in another
Indian language
• Uses the Paninian
Grammar model for its language
analysis
• Developed to translate Punjabi,
Bengali, Telugu, Kannada and
Marathi languages into Hindi
• English-Hindi Anusaaraka
translates English text into Hindi
• URL: http://anusaaraka.iiit.ac.in
Apertium
• Rule-based Machine Translation system
• Apertium engine follows a shallow transfer approach
• Consists of eight pipelined modules
– A de-formatter
– A morphological analyzer
– A part-of-speech tagger
– A lexical transfer module
– A structural transfer module
– A morphological generator
– A post-generator
– A re-formatter
Yahoo Babel Fish
• Uses a statistical approach
Google Translate
• Uses statistical and neural machine translation approaches
SYSTRAN
• Originally rule-based; later versions added statistical methods (hybrid)
Bing Translator
Questions
• Answer the following questions
a) Briefly describe the pipeline of machine translation.
b) Briefly describe the rule-based approach to machine translation.
c) Explain how dictionary-based machine translation can be improved through source language morphological analysis.
d) Considering the rule-based machine translation approach, briefly explain the source language understanding steps for the following English sentence.
"The good boy reads a new book."