Week 9: resources for globalisation Finish spell checkers Machine Translation (MT) The ‘decoding’ paradigm Ambiguity Translation models Interlingua and First Order Predicate Calculus Human involvement Historical note

Page 1

Week 9: resources for globalisation
  Finish spell checkers
  Machine Translation (MT)
    The ‘decoding’ paradigm
    Ambiguity
    Translation models
    Interlingua and First Order Predicate Calculus
    Human involvement
    Historical note

Page 2

Spelling dictionaries
  Implementing spelling identification and correction algorithm

Page 3

Spelling dictionaries
  Implementing spelling identification and correction algorithm
    STAGE 1: compare each string in the document with a list of legal strings; if there is no corresponding string in the list, mark it as misspelled
    STAGE 2: generate a list of candidates
      Apply any single transformation to the typo string
      Filter the list by checking against a dictionary
    STAGE 3: assign probability values to each candidate in the list
    STAGE 4: select the best candidate
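
Stages 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any particular spell checker: the toy dictionary, the function names, and the choice of four single-letter transformations (deletion, transposition, substitution, insertion) are all assumptions made for the example.

```python
import string

# Toy list of legal strings (invented for illustration only).
DICTIONARY = {"across", "actress", "acres", "access", "caress", "cress", "the", "walked"}


def is_misspelled(word, dictionary=DICTIONARY):
    """STAGE 1: flag a string that has no counterpart in the list of legal strings."""
    return word.lower() not in dictionary


def single_edits(typo):
    """STAGE 2 (first half): all strings one transformation away from the typo.

    Assumed transformations: delete, transpose, substitute or insert one letter.
    """
    letters = string.ascii_lowercase
    splits = [(typo[:i], typo[i:]) for i in range(len(typo) + 1)]
    deletions = [a + b[1:] for a, b in splits if b]
    transpositions = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutions = [a + c + b[1:] for a, b in splits if b for c in letters]
    insertions = [a + c + b for a, b in splits for c in letters]
    return set(deletions + transpositions + substitutions + insertions)


def candidates(typo, dictionary=DICTIONARY):
    """STAGE 2 (second half): keep only the transformed strings that are legal words."""
    return sorted(single_edits(typo) & dictionary)
```

With this toy dictionary, `candidates("acress")` keeps the six dictionary words that are one transformation away from the typo and discards everything else.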

Page 4

Spelling dictionaries
  STAGE 3
    prior probability
      given all the words in English, is this candidate more likely to be what the typist meant than that candidate?
      P(c) = count(c)/N, where count(c) is the number of times candidate c occurs in a corpus and N is the number of words in that corpus
    likelihood
      given the possible errors, or transformations, how likely is it that error y has operated on candidate x to produce the typo?
      P(t|c), calculated using a corpus of errors, or transformations
    Bayesian rule: get the product of the prior probability and the likelihood
      P(c) × P(t|c)
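
The scoring step can be sketched as follows. Everything concrete here is invented for illustration: the one-sentence corpus, the likelihood values for the typo "acress", and the function names. A real system would estimate P(t|c) from a corpus of errors, as the slide says.

```python
from collections import Counter

# Toy corpus (invented); N is its length in words.
CORPUS = "the actress walked across the stage and across the acres of lawn".split()
COUNTS = Counter(CORPUS)
N = len(CORPUS)

# Hypothetical P(t|c): how likely each candidate c is to be mistyped as "acress".
LIKELIHOOD = {"actress": 0.10, "across": 0.02, "acres": 0.05}


def prior(c):
    """P(c) = count(c) / N."""
    return COUNTS[c] / N


def score(c, likelihood=LIKELIHOOD):
    """Product of prior and likelihood, P(c) * P(t|c); unseen candidates score 0."""
    return prior(c) * likelihood.get(c, 0.0)


def best_candidate(cands):
    """STAGE 4: pick the candidate with the highest score."""
    return max(cands, key=score)
```

Note that "across" has the highest prior in this toy corpus, yet "actress" wins because its assumed likelihood is much higher; the product, not either factor alone, decides the ranking.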

Page 5

Spelling dictionaries
  non-word errors
  Implementing spelling identification and correction algorithm
    STAGE 1: identify misspelled words
    STAGE 2: generate list of candidates
    STAGE 3a: rank candidates for probability
    STAGE 3b: select best candidate
  Implement:
    noisy channel model
    Bayesian Rule

Page 6

Resources for Globalisation: Machine translation

Page 7

Resources for Globalisation: Machine translation
  The ‘decoding’ paradigm
    Assumes one-to-one relation between source symbol and target symbol

Page 8

Resources for Globalisation: Machine translation
  The ‘decoding’ paradigm
    Assumes one-to-one relation between source symbol and target symbol
    one-to-many (homonymy)

Page 9

Resources for Globalisation: Machine translation
  The ‘decoding’ paradigm
    Assumes one-to-one relation between source symbol and target symbol
    one-to-many (homonymy)
    one-to-many (hypernym → hyponyms)

Page 10

Resources for Globalisation: Machine translation
  The ‘decoding’ paradigm
    Assumes one-to-one relation between source symbol and target symbol
    one-to-many (homonymy)
    one-to-many (hypernym → hyponyms)
    many-to-one (hyponyms → hypernym)

Page 11

Machine translation
  The ‘decoding’ paradigm
    one-to-many (homonymy)
      bank → Ufer, Bank (German)

Page 12

Machine translation
  The ‘decoding’ paradigm
    one-to-many (homonymy)
    one-to-many (hypernym → hyponyms):
      brother → otooto, oniisan (Japanese)
      blue → синий, голубой (Russian)
    many-to-one (hyponyms → hypernym)

Page 13

Machine translation
  The ‘decoding’ paradigm
    one-to-many (homonymy)
    one-to-many (hypernym → hyponyms)
    many-to-one (hyponyms → hypernym):
      hill, mountain → Berg (German)
      learn, teach → leren (Dutch)
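
A short sketch shows where the one-to-one assumption breaks. The lexicon below is a toy built only from the slide examples (it mixes target languages purely for brevity), and the function names are made up for this illustration.

```python
# Toy source -> target lexicon taken from the slide examples.
LEXICON = {
    "bank": ["Ufer", "Bank"],          # English -> German, homonymy
    "brother": ["otooto", "oniisan"],  # English -> Japanese, hypernym -> hyponyms
    "hill": ["Berg"],                  # English -> German, hyponym -> hypernym
    "mountain": ["Berg"],
}


def decode(words, lexicon=LEXICON):
    """Word-for-word 'decoding': list every target option for each source word."""
    return [lexicon.get(w, [w]) for w in words]


def ambiguous(words, lexicon=LEXICON):
    """Source words with more than one target option (one-to-many)."""
    return [w for w in words if len(lexicon.get(w, [w])) > 1]


def back_translations(target, lexicon=LEXICON):
    """Source words that collapse onto the same target word (many-to-one)."""
    return sorted(s for s, targets in lexicon.items() if target in targets)
```

`ambiguous` picks out the words a pure decoder cannot translate without extra context, and `back_translations("Berg")` shows the information lost in the many-to-one direction.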

Page 14

Machine translation and globalisation
  Ambiguity
    ‘I made her duck’
    “The possibility of interpreting an expression in two or more distinct ways”
      Collins English Dictionary

Page 15

Machine translation
  Ambiguity
    The difficulty of a translation depends on the level of ambiguity that arises
    This depends on the closeness of the source and target languages w.r.t. the following:
      vocabulary
        homonyms
      grammar
        structural ambiguity
      conceptual structure
        specificity ambiguity
        lexical gaps

Page 16

Machine translation
  Pragmatic approach

Page 17

Machine translation
  Pragmatic approach
    aim for a rough translation, ‘gist’ translation
    Used for multi-lingual information retrieval

Page 18

Machine translation
  Pragmatic approach
    aim for a rough translation, ‘gist’ translation
    Used for multi-lingual information retrieval
    involve human translators in the process: computer-aided translation

Page 19

Machine translation
  Translation models
    Transfer model
      English: ‘the dog bit my friend’
      Hindi: kutte-ne mere dost ko-kata
      Gloss: dog my friend bit

Page 20

Machine translation
  Translation models
    Transfer model
      Alter the grammatical structure of the source language to make it adhere to the grammatical structure of the target language
      Use transformation rules
        Analysis process (source)
        Transfer process (‘bridge’)
        Generation process (target)
      Problem: each source-target pair will need its own unique set of transformation rules
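
The three processes can be sketched for the English-Hindi example from the previous slide. This is a deliberately tiny illustration: the analysis step assumes a fixed sentence shape, the lexicon holds only the three constituents of that one example, and all function names are hypothetical.

```python
# Hypothetical lexicon covering only the slide's example sentence.
EN_TO_HI = {"the dog": "kutte-ne", "bit": "ko-kata", "my friend": "mere dost"}


def analyse(sentence):
    """Analysis (source): split a simple English S-V-O sentence into roles.

    Assumes a fixed shape: two-word subject, one-word verb, the rest is the object.
    """
    words = sentence.lower().split()
    return {"subj": " ".join(words[:2]), "verb": words[2], "obj": " ".join(words[3:])}


def transfer(roles):
    """Transfer ('bridge'): reorder English S-V-O into Hindi S-O-V."""
    return [roles["subj"], roles["obj"], roles["verb"]]


def generate(ordered, lexicon=EN_TO_HI):
    """Generation (target): replace each constituent with its target-language form."""
    return " ".join(lexicon[constituent] for constituent in ordered)


def translate(sentence):
    """Run analysis, transfer and generation in sequence."""
    return generate(transfer(analyse(sentence)))
```

The `transfer` function encodes one English-to-Hindi reordering rule; a different language pair would need a different rule there, which is the problem noted on the slide.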

Page 21

Machine translation
  Translation models
    Inter-lingua model
      Extract the meaning from the source string
      Give it a language independent representation, i.e. an interlingua
      Translation process takes the interlingua as its input
      Multiple translation processes take the same input for multiple target language outputs
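
A rough count shows why a shared representation is attractive. The sketch assumes that transfer rule sets are directional (one per ordered source-target pair) and that an interlingua system needs one analysis and one generation module per language; both are simplifying assumptions, and the function names are illustrative.

```python
def transfer_rule_sets(n_languages):
    """Transfer model: one rule set per ordered source-target pair, n * (n - 1)."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages):
    """Interlingua model: one analyser plus one generator per language, 2 * n."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, transfer_rule_sets(n), interlingua_modules(n))
```

Under these assumptions the transfer count grows quadratically with the number of languages while the interlingua count grows linearly.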

Page 22

Machine translation
  Translation models
    What is the inter-lingua?
      for words, some sort of semantic analysis, e.g.

      Interlingua:  (GO, BY-FOOT)   (GO, BY-TRANSPORT)
      Russian:      идти            ехать
      English:      go              go
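
The word-level interlingua from this slide can be sketched as a pair of lookup tables. The concept tuples and word forms come from the slide; the table layout, language codes and function names are assumptions made for the example.

```python
# Language-independent concepts, written as tuples like the slide's (GO, BY-FOOT).
GO_BY_FOOT = ("GO", "BY-FOOT")
GO_BY_TRANSPORT = ("GO", "BY-TRANSPORT")

# One generation table per target language; entries follow the slide.
GENERATORS = {
    "ru": {GO_BY_FOOT: "идти", GO_BY_TRANSPORT: "ехать"},
    "en": {GO_BY_FOOT: "go", GO_BY_TRANSPORT: "go"},
}


def analyse_russian(word):
    """Analysis: map a Russian verb to its single interlingua concept."""
    table = {form: concept for concept, form in GENERATORS["ru"].items()}
    return table[word]


def analyse_english(word):
    """Analysis: English 'go' is underspecified, so several concepts remain."""
    return [concept for concept, form in GENERATORS["en"].items() if form == word]


def generate(concept, language):
    """Generation: realise a concept in the requested target language."""
    return GENERATORS[language][concept]
```

Going from Russian to English is unproblematic, since both concepts generate "go". Going from English to Russian leaves two candidate concepts, so the analysis step needs context to choose between идти and ехать.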

Page 23

Machine translation and globalisation
  Translation models
    What is the inter-lingua?
      for sentences, a logical language, e.g. First Order Predicate Calculus

Page 24

Meaning representation
  Goal:
    1. The semantic representation must give you a one-to-one mapping to non-linguistic knowledge of the world
    2. The representation must be expressive, i.e. handle different types of data

Page 25

Meaning representation
  First Order Predicate Calculus
    computationally tractable
    objects (terms)
    properties of objects
    relations amongst objects
    Predicate argument structure
    large composite representations
    logical connectives

Page 26

Meaning representation
  First Order Predicate Calculus
    Object: referred to uniquely by a term
      constant, e.g. SurreyUniversity
      function, e.g. LocationOf(SurreyUniversity)
      variable

Page 27

Meaning representation
  First Order Predicate Calculus
    Relations amongst objects
      Predicates: “symbols that refer to, or name, the relations that hold among some fixed number of objects” (J & M)
        Educates(SurreyUniversity, Citizens)
        two-place predicate

Page 28

Meaning representation
  First Order Predicate Calculus
    Relations amongst objects
      Predicates: can specify the category of an object
        University(SurreyUniversity)
        one-place predicate

Page 29

Meaning representation
  First Order Predicate Calculus
    properties / parts of objects
      functions:
        LocationOf(SurreyUniversity)

Page 30

Meaning representation
  First Order Predicate Calculus
    Composite representations through predicates and functions:
      Near(LocationOf(SurreyUniversity), LocationOf(Cathedral))
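
The building blocks from the last few slides (constants, functions, predicates) can be modelled as small data classes. This is one possible encoding, not a standard library for FOPC; the class names and the `arity` property are choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Constant:
    """A term that names one object, e.g. SurreyUniversity."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Function:
    """A term built from other terms, e.g. LocationOf(SurreyUniversity)."""
    name: str
    args: Tuple["Term", ...]

    def __str__(self):
        return f"{self.name}({', '.join(str(a) for a in self.args)})"


Term = Union[Constant, Function]


@dataclass(frozen=True)
class Predicate:
    """A relation holding among a fixed number of terms."""
    name: str
    args: Tuple[Term, ...]

    @property
    def arity(self):
        return len(self.args)

    def __str__(self):
        return f"{self.name}({', '.join(str(a) for a in self.args)})"


surrey = Constant("SurreyUniversity")
cathedral = Constant("Cathedral")
is_university = Predicate("University", (surrey,))
near = Predicate("Near", (Function("LocationOf", (surrey,)),
                          Function("LocationOf", (cathedral,))))
```

Printing `near` reproduces the composite representation on the slide, and `arity` distinguishes the one-place predicate `University` from the two-place predicate `Near`.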

Page 31

Meaning representation
  First Order Predicate Calculus
    Logical connectives
      combine basic representations to form larger, more complex representations
      e.g. ∧ operator = ‘and’

Page 32

Meaning representation
  First Order Predicate Calculus
    Logical connectives
      combine basic representations to form larger, more complex representations
      Educates(SurreyUniversity, Citizens) ∧ ¬Remunerates(SurreyUniversity, Staff)
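
To show how connectives compose, the sketch below builds the formula from this slide and evaluates it against a tiny set of facts. The fact set is invented, the tuple encoding and function names are hypothetical, and treating any fact not listed as false (a closed-world assumption) is a simplification for the example.

```python
# Toy model: the atomic facts assumed true; anything absent counts as false.
FACTS = {("Educates", "SurreyUniversity", "Citizens")}


def atom(name, *args):
    """An atomic formula such as Educates(SurreyUniversity, Citizens)."""
    return ("atom", (name, *args))


def neg(p):
    """Negation, written with the ¬ operator."""
    return ("not", p)


def conj(p, q):
    """Conjunction, written with the ∧ operator."""
    return ("and", p, q)


def holds(formula, facts=FACTS):
    """Evaluate a formula recursively against the fact set."""
    op = formula[0]
    if op == "atom":
        return formula[1] in facts
    if op == "not":
        return not holds(formula[1], facts)
    if op == "and":
        return holds(formula[1], facts) and holds(formula[2], facts)
    raise ValueError(f"unknown connective: {op}")


def show(formula):
    """Render a formula in the notation used on the slides."""
    op = formula[0]
    if op == "atom":
        name, *args = formula[1]
        return f"{name}({', '.join(args)})"
    if op == "not":
        return "¬" + show(formula[1])
    if op == "and":
        return f"{show(formula[1])} ∧ {show(formula[2])}"
    raise ValueError(f"unknown connective: {op}")


EXAMPLE = conj(atom("Educates", "SurreyUniversity", "Citizens"),
               neg(atom("Remunerates", "SurreyUniversity", "Staff")))
```

Because the toy fact set contains the Educates fact and no Remunerates fact, `holds(EXAMPLE)` is true under the closed-world assumption.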

Page 33

Machine translation and globalisation
  Machine translation and globalisation: change of priorities
    1954: IBM and Georgetown University, first MT demo
      goal: ‘perfect’ translation
    1966: Automatic Language Processing Advisory Committee (ALPAC) report: damning of that goal
    Post ALPAC
      Goal: rough translation, involve human element
    Current situation: online translation, e.g. Babel Fish, descendant of SYSTRAN whose goal was rough translation
    Journal of Machine Translation

Page 34

Next week
  Globalisation as an industry
  SDL and the SDLX-TRADOS globalisation application