MT with an Interlingua
Lori Levin
April 13, 2009
Interlingua
• “An interlingua is a notation for representing the content of a text that abstracts away from the characteristics of the language itself and focuses on the meaning (semantics) alone.
• Interlinguas are typically used as pivot representations in machine translation, allowing the contents of a source text to be generated in many different target languages.
• Due to the complexities involved, few interlinguas are more than demonstration prototypes, and only one has been used in a commercial MT system.”
– Dorr, Hovy, Levin, Natural Language Processing and Machine Translation, Encyclopedia of Language and Linguistics, 2nd ed. (ELL2). Machine Translation: Interlingual Methods
KANT: If the error persists, service is required (Mitamura and Nyberg)
• (*BE-PREDICATE
    (attribute (*REQUIRED (degree positive)))
    (mood declarative)
    (predicate-role attribute)
    (punctuation period)
    (qualification
      (*QUALIFYING-EVENT
        (event
          (*PERSIST
            (argument-class theme)
            (mood declarative)
            (tense present)
            (theme (*ERROR (number (:OR mass singular)) (reference definite)))))
        (extent (*CONJ-if))
        (topic +)))
    (tense present)
    (theme (*SERVICE (number (:OR mass singular)) (reference no-reference))))
NESPOLE! (Levin et al.)
• “I want to know what time the flight leaves Pittsburgh.”
• “What time does the flight leave Pittsburgh?”
• request-information+departure (time = (clock = question), transportation-spec = (flight, id = yes), origin = name=Pittsburgh)
Not Just for MT anymore
[Figure: diagram with the labels Interlingua, Language Analysis, Language Synthesis, Cross-language Information Retrieval, Cross-language Summarization, Machine Translation, and Multilingual Question Answering.]
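Vauquois Triangle
[Figure: the Vauquois triangle, with Source Text and Target Text at the base and the Interlingua at the apex.
Analysis side, bottom to top: Morphological Analysis, Word Structure, Syntactic Analysis, Syntactic Structure, Semantic Analysis, Semantic Structure, Semantic Composition.
Generation side, top to bottom: Semantic Decomposition, Semantic Structure, Semantic Generation, Syntactic Structure, Syntactic Generation, Word Structure, Morphological Generation.
Horizontal paths between the two sides: Direct, Syntactic Transfer, Semantic Transfer.]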
Reasons for using an interlingua
• N² vs 2N
– For all-ways translation between N languages, you need an analyzer (L to interlingua) and a synthesizer (interlingua to L) for each language: 2N components, instead of a separate translator for each of the roughly N² ordered language pairs.
• Monolingual development teams
– Each developer needs to know only his/her language and the interlingua.
– NESPOLE! project: Italian to Korean translation worked as well as Italian to English, even though nobody on the team was bilingual in Korean and Italian.
– Same may be true for SMT?
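The N² vs 2N argument can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are made up for illustration, and "N²" is read as the exact count N(N-1) of ordered language pairs.

```python
def direct_systems(n: int) -> int:
    """Translators needed if every ordered language pair gets its own system."""
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    """One analyzer plus one synthesizer per language."""
    return 2 * n


# Compare the two counts as the number of languages grows.
for n in (2, 3, 5, 10, 20):
    print(f"N={n:2d}  direct={direct_systems(n):3d}  interlingua={interlingua_components(n):2d}")
```

The two counts are equal at N = 3 (6 each); the interlingua design only needs fewer components from N = 4 onward.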
MT Divergences
• Translating word-by-word, node-by-node, or dependency-by-dependency does not work.
– Mi chiamo Lori – My name is Lori
– to be jealous – tener celos (to have jealousy)
– to kick – dar una patada (give a kick)
– to enter the house – entrar en la casa (enter in the house)
– to run in – entrar corriendo (enter running)
– meet someone/meet with someone
– decide/make a decision
• Which of these are handled well by phrase-based SMT or syntax-based SMT (with or without morphology – dar, doy, etc.)?
Interlingua Example: KANT
• (*BE-PREDICATE
    (attribute (*REQUIRED (degree positive)))
    (mood declarative)
    (predicate-role attribute)
    (punctuation period)
    (qualification
      (*QUALIFYING-EVENT
        (event
          (*PERSIST
            (argument-class theme)
            (mood declarative)
            (tense present)
            (theme (*ERROR (number (:OR mass singular)) (reference definite)))))
        (extent (*CONJ-if))
        (topic +)))
    (tense present)
    (theme (*SERVICE (number (:OR mass singular)) (reference no-reference))))
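The KANT representation is a nested frame: a head concept followed by (slot value) pairs, where a value may itself be a frame. As a rough sketch of how a program could read that structure, the following generic S-expression reader parses the example and looks up a few slots. The functions `tokenize`, `parse`, and `slot` are illustrative helpers, not part of the KANT system.

```python
def tokenize(text: str) -> list[str]:
    """Split an S-expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list[str]):
    """Recursively read one expression from the front of the token list."""
    token = tokens.pop(0)
    if token != "(":
        return token  # an atom such as *ERROR or declarative
    items = []
    while tokens[0] != ")":
        items.append(parse(tokens))
    tokens.pop(0)  # drop the closing ")"
    return items


def slot(frame: list, name: str):
    """Return the value of the first slot called `name` in a frame, or None."""
    for item in frame[1:]:
        if isinstance(item, list) and item[0] == name:
            return item[1]
    return None


KANT = """
(*BE-PREDICATE (attribute (*REQUIRED (degree positive))) (mood declarative)
 (predicate-role attribute) (punctuation period)
 (qualification (*QUALIFYING-EVENT
   (event (*PERSIST (argument-class theme) (mood declarative) (tense present)
     (theme (*ERROR (number (:OR mass singular)) (reference definite)))))
   (extent (*CONJ-if)) (topic +)))
 (tense present)
 (theme (*SERVICE (number (:OR mass singular)) (reference no-reference))))
"""

frame = parse(tokenize(KANT))
condition = slot(slot(frame, "qualification"), "event")

print(frame[0])                    # head concept of the whole sentence
print(slot(frame, "theme")[0])     # what is required
print(condition[0], slot(condition, "theme")[0])  # the "if" clause
```

Read this way, "service is required" is the main predicate with theme *SERVICE, and "if the error persists" is the qualifying event *PERSIST with theme *ERROR.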
Interlingua Example: NESPOLE!
• “I want to know what time the flight leaves Pittsburgh.”
• “What time does the flight leave Pittsburgh?”
• request-information+departure (time = (clock = question), transportation-spec = (flight, id = yes), origin = name=Pittsburgh)
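The point of the two English sentences is that different surface forms map to one representation. The toy sketch below makes that concrete. It is not the NESPOLE! analyzer or generator: the lookup table, the dictionary layout, the `type` key, and both function names are assumptions made for illustration.

```python
# The representation from the slide, written as a nested dict (layout assumed).
DEPARTURE_TIME_REQUEST = {
    "speech_act": "request-information",
    "concept": "departure",
    "arguments": {
        "time": {"clock": "question"},
        "transportation-spec": {"type": "flight", "id": "yes"},
        "origin": {"name": "Pittsburgh"},
    },
}

# Toy stand-in for an analyzer: a lookup table keyed by the two example sentences.
ANALYSES = {
    "I want to know what time the flight leaves Pittsburgh.": DEPARTURE_TIME_REQUEST,
    "What time does the flight leave Pittsburgh?": DEPARTURE_TIME_REQUEST,
}


def analyze(sentence: str) -> dict:
    """Map a source sentence to its interlingua representation (toy lookup)."""
    return ANALYSES[sentence]


def generate_english(rep: dict) -> str:
    """Toy generator: one template for departure-time requests."""
    if rep["speech_act"] == "request-information" and rep["concept"] == "departure":
        return f"What time does the flight leave {rep['arguments']['origin']['name']}?"
    raise ValueError("no template for this representation")


a = analyze("I want to know what time the flight leaves Pittsburgh.")
b = analyze("What time does the flight leave Pittsburgh?")
print(a == b)               # both sentences share one representation
print(generate_english(a))  # the source wording is not preserved
```

Because the generator sees only the representation, it produces the same output for both inputs: output in another language would be generated from the same structure, but the original phrasing is lost.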
Interlingua Example: Mikrokosmos
request-action-69
  agent human-72
  theme accept-70
  beneficiary organization-71
  source-root-word ask
  time (< (find-anchor-time))
accept-70
  theme war-73
  theme-of request-action-69
  source-root-word authorize
organization-71
  has-name united-nations
  beneficiary-of request-action-69
  source-root-word UN
human-72
  has-name colin powell
  agent-of request-action-69
  source-root-word he ; ref. resolution has been carried out
war-73
  theme-of accept-70
  source-root-word war
Interlingua Example: Lexical Conceptual Structure
• (event cause (thing[agent] reporter+) (go loc (thing[theme] email+) (path to loc (thing email+) (position at loc (thing email+) (thing[goal] aljazeera+))) (manner send+ingly)))
• Figure 10: LCS Representation of The reporter emailed Al-Jazeera
Issues in Interlingua design
• Grain size of meaning
• Domain specificity of meaning
• Ambiguity
• Lack of agreement among humans
• From EACL workshop 2009:
– Russell-Frege: Meaning can be broken down into pieces that combine logically.
– Wittgenstein-Quine: Meaning = use.
  • Use is represented by a corpus
Interlingua: annotated corpora
• Many annotated corpora can be considered as part of an interlingua:
– Named entities and co-reference
– Semantic roles
– Temporal expression
IAMTC: Interlingua Annotation of Multi-lingual Text Corpora
• 14 PIs. One year (2003-2004). Still publishing.
• See other set of slides.
Elicitation Corpus
• 3000 feature structures
• English sentence for each one.
• LDC translated the English sentences into 13 languages, and a few other places translated them into a few more languages.
SCALE 2009: MT and HIVEs
• High Information Value Elements
– Named entities, negation, modality
• Urdu to English
• Modality
– H firmly believes [R is true/false]
– H believes [R may be true/false]
– H requires [R to be true/false]
– H permits [R to be true/false]
– H intends [to make R true/false]
– H does not intend [to make R true/false]
– H is trying [to make R true/false]
– H is able [to make R true/false and succeeds]
– H is able [to make R true/false and fails]
– H is able [to make R true/false]
– H wants [R to be true/false]