Developing affordable technologies for resource-poor languages
Ariadna Font Llitjós
Language Technologies Institute
Carnegie Mellon University
September 22, 2004
October 11, 2002 AMTA 2002 2
[Figure: each dot represents a language]
Motivation
Resource-poor scenarios:
- Indigenous communities have difficulty accessing crucial information that directly affects their lives (such as land laws, health warnings, etc.)
- Formalize a potentially endangered language
Affordable technologies, such as:
- spell-checkers
- online dictionaries
- Machine Translation (MT) systems
- computer-assisted tutoring
AVENUE Partners
Language (status) | Country | Institutions
Mapudungun (in place) | Chile | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (started) | Peru | Ministry of Education
Iñupiaq (discussion) | US (Alaska) | Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center
Siona (discussion) | Colombia | OAS-CICAD, Plante, Department of the Interior
Mapudungun for the Mapuche
Chile
Official language: Spanish
Population: ~15 million
~1/2 million Mapuche people
Language: Mapudungun
What’s Machine Translation (MT)?
[Figure: a Japanese sentence is translated into a Swahili sentence]
Speech to Speech MT
Why Machine Translation for resource-poor (indigenous) languages?
• Commercial MT is economically feasible for only a handful of major languages with large resources (corpora, human developers)
• Benefits include:
– Better government access to indigenous communities (epidemics, crop failures, etc.)
– Better participation of indigenous communities in information-rich activities (health care, education, government) without giving up their languages
– Language preservation
– Civilian and military applications (disaster relief)
MT for resource-poor languages: Challenges
• Minimal amount of parallel text (oral tradition)
• Possibly competing standards for orthography/spelling
• Often relatively few trained linguists
• Access to native informants possible
• Need to minimize development time and cost
Machine Translation Pyramid
[Figure: MT pyramid with levels Interlingua, Transfer rules, and Corpus-based methods; stages analysis, interpretation, and generation; example: "I saw you" → "Yo vi tú"]
AVENUE MT system overview
\spa Una mujer se quedó en casa
\map Kie domo mlewey ruka mew
\eng One woman stayed at home.
{VP,3}
VP::VP : [VP NP] -> [VP NP]
( (X1::Y1) (X2::Y2)
((x2 case) = acc)
((x0 obj) = x2)
((x0 agr) = (x1 agr))
(y2 == (y0 obj))
((y0 tense) = (x0 tense))
((y0 agr) = (y1 agr)))
V::V |: [stayed] -> [quedó]
((X1::Y1)
((x0 form) = stay)
((x0 actform) = stayed)
((x0 tense) = past-pp)
((y0 agr pers) = 3)
((y0 agr num) = sg))
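The {VP,3} rule above aligns the source VP and NP with their target counterparts (X1::Y1, X2::Y2) and constrains their features with equations. A minimal sketch of how the source-side (x) equations might be checked, assuming feature structures are Python dicts and "=" is read as unification that fails on a value clash. The function names are hypothetical and not part of the AVENUE transfer engine; the target-side (y) equations, including the "==" line, are not modelled.

```python
from typing import Any, Dict, Optional

FS = Dict[str, Any]  # feature structure: feature -> atom or nested FS


def unify(a: FS, b: FS) -> Optional[FS]:
    """Unify two feature structures; return None on an atomic value clash."""
    result = dict(a)
    for feat, b_val in b.items():
        if feat not in result:
            result[feat] = b_val
        elif isinstance(result[feat], dict) and isinstance(b_val, dict):
            sub = unify(result[feat], b_val)
            if sub is None:
                return None
            result[feat] = sub
        elif result[feat] != b_val:
            return None
    return result


def vp3_source_side(x1: FS, x2: FS) -> Optional[FS]:
    """Hypothetical: build x0 for VP::VP : [VP NP] from x1 (VP) and x2 (NP)."""
    # ((x2 case) = acc): the object NP must be, or become, accusative.
    x2 = unify(x2, {"case": "acc"})
    if x2 is None:
        return None
    # ((x0 obj) = x2) and ((x0 agr) = (x1 agr)): x0 starts empty,
    # so these equations simply copy the values into x0.
    return {"obj": x2, "agr": x1.get("agr", {})}


vp = {"tense": "past", "agr": {"pers": 1, "num": "sg"}}
np_you = {"pers": 2}
print(vp3_source_side(vp, np_you))
# {'obj': {'pers': 2, 'case': 'acc'}, 'agr': {'pers': 1, 'num': 'sg'}}
print(vp3_source_side(vp, {"pers": 2, "case": "nom"}))  # None: case clash
```

Treating "=" as unification means a constraint can both fill in a missing value (case on "you") and block the rule when a value conflicts.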
Interactive and Automatic Refinement of Translation Rules
Or: How to recycle corrections of MT output back into the MT system by adjusting and adapting the grammar and lexical rules
Error correction by non-expert bilingual users
Interactive elicitation of MT errors
Assumptions:
• Non-expert bilingual users can reliably detect and minimally correct MT errors, given:
– the source language (SL) sentence (I saw you)
– the target language (TL) sentence (Yo vi tú)
– word-to-word alignments (I-yo, saw-vi, you-tú)
– (context)
• using an online GUI: the Translation Correction Tool (TCTool)
Goal:
• Simplify the MT correction task maximally
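The information a user is given for each correction can be pictured as a small record. A minimal sketch, assuming whitespace tokenization and alignments stored as (SL index, TL index) pairs; the class and field names are hypothetical, not TCTool's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CorrectionInstance:
    """What the user is shown (hypothetical structure, not TCTool's own)."""
    sl: List[str]                     # source-language (SL) tokens
    tl: List[str]                     # target-language (TL) tokens from the MT system
    alignment: List[Tuple[int, int]]  # word links as (SL index, TL index)
    context: str = ""                 # optional surrounding context

    def __post_init__(self) -> None:
        # Reject links that point outside either sentence.
        for s, t in self.alignment:
            if not (0 <= s < len(self.sl) and 0 <= t < len(self.tl)):
                raise ValueError(f"alignment link ({s}, {t}) is out of range")

    def aligned_pairs(self) -> List[str]:
        """Render each link as 'source-target'."""
        return [f"{self.sl[s]}-{self.tl[t]}" for s, t in self.alignment]


example = CorrectionInstance(
    sl="I saw you".split(),
    tl="Yo vi tú".split(),
    alignment=[(0, 0), (1, 1), (2, 2)],
)
print(example.aligned_pairs())  # ['I-Yo', 'saw-vi', 'you-tú']
```

Index pairs rather than word pairs keep links unambiguous when a word occurs more than once in a sentence.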
Translation Correction Tool
Actions:
SL + best TL picked by user
Changing “grande” into “gran”
Automatic Rule Refinement Framework
• Find the best rule refinement (RR) operations given:
– a grammar (G),
– a lexicon (L),
– a (set of) source language sentence(s) (SL),
– a (set of) target language sentence(s) (TL),
– its parse tree (P), and
– a minimal correction of TL (TL’),
such that TQ2 > TQ1, i.e., translation quality (TQ) after refinement is higher than before it.
• This can also be expressed as:
max TQ(TL | TL’, P, SL, RR(G, L))
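The maximization can be read as a search over candidate refinements: apply each one, retranslate, score the output against TL’, and keep the best only if it beats the unrefined score (TQ2 > TQ1). A minimal sketch under loud assumptions: the word-for-word translator, the candidate list, and the TQ metric (order-insensitive token overlap with TL’) are all hypothetical stand-ins, and real RR operations add constraints or bifurcate rules rather than overwrite lexicon entries as the toy candidates here do.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Lexicon = Dict[str, str]                  # toy SL word -> TL word table
Refinement = Callable[[Lexicon], Lexicon]


def tq(output: List[str], reference: List[str]) -> float:
    """Hypothetical TQ: shared tokens (order-insensitive) over the longer length."""
    overlap = sum((Counter(output) & Counter(reference)).values())
    return overlap / max(len(output), len(reference), 1)


def translate(sl: List[str], lexicon: Lexicon) -> List[str]:
    """Toy word-for-word translator standing in for the real MT system."""
    return [lexicon.get(word, word) for word in sl]


def best_refinement(sl: List[str], tl_corrected: List[str], lexicon: Lexicon,
                    candidates: Iterable[Tuple[str, Refinement]]
                    ) -> Optional[Tuple[str, float]]:
    """Pick the candidate with maximal TQ, but only if TQ2 > TQ1."""
    tq1 = tq(translate(sl, lexicon), tl_corrected)
    best: Optional[Tuple[str, float]] = None
    for name, refine in candidates:
        tq2 = tq(translate(sl, refine(dict(lexicon))), tl_corrected)
        if tq2 > tq1 and (best is None or tq2 > best[1]):
            best = (name, tq2)
    return best


lexicon = {"the": "el", "red": "roja", "car": "auto"}
candidates = [
    ("no change", lambda lex: lex),
    ("red -> rojo", lambda lex: {**lex, "red": "rojo"}),
]
print(best_refinement("the red car".split(), "el auto rojo".split(),
                      lexicon, candidates))  # ('red -> rojo', 1.0)
```

Because the toy metric ignores word order, it does not penalize the missing adjective-noun reordering; a real TQ measure would.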
Types of RR operations
• Grammar:
– R0 → R0 + R1 [= R0’ + constr]    Cov[R0] → Cov[R0,R1]
– R0 → R1 [= R0 + constr]    Cov[R0] → Cov[R1]
– R0 → R1 [= R0 + constr = -] + R2 [= R0’ + constr =c +]    Cov[R0] → Cov[R1,R2]
• Lexicon:
– Lex0 → Lex0 + Lex1 [= Lex0 + constr]
– Lex0 → Lex1 [= Lex0 + constr]
– Lex0 → Lex1 [= Lex0 + TLword]
– Lex1 (adding a lexical item)
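The three grammar operations differ in whether R0 survives and how coverage changes. A minimal sketch, assuming rules are reduced to sets of (feature, operator, value) constraints and reading "=" and "=c" in the LFG style (a defining "=" succeeds when the feature is absent or agrees; a constraining "=c" requires the value to be present). That reading, the feature names, and the helper names are assumptions; the structural change from R0 to R0’ is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

Constraint = Tuple[str, str, str]  # (feature, operator, value)
Features = Dict[str, str]


def satisfied(c: Constraint, feats: Features) -> bool:
    feature, op, value = c
    if op == "=c":
        # Constraining equation: the value must already be present.
        return feats.get(feature) == value
    # Defining equation: succeeds if the feature is absent or already agrees.
    return feats.get(feature, value) == value


@dataclass(frozen=True)
class Rule:
    name: str
    constraints: FrozenSet[Constraint] = frozenset()

    def applies_to(self, feats: Features) -> bool:
        return all(satisfied(c, feats) for c in self.constraints)

    def plus(self, name: str, c: Constraint) -> "Rule":
        """A copy of this rule with one more constraint."""
        return Rule(name, self.constraints | {c})


def refine(r0: Rule, c: Constraint) -> List[Rule]:
    """R0 -> R1 [= R0 + constr]: R0 is replaced."""
    return [r0.plus("R1", c)]


def bifurcate(r0: Rule, c: Constraint) -> List[Rule]:
    """R0 -> R0 + R1 [= R0' + constr]: R0 is kept alongside the new rule."""
    return [r0, r0.plus("R1", c)]


def split_on_binary(r0: Rule, feature: str) -> List[Rule]:
    """R0 -> R1 [= R0 + constr = -] + R2 [= R0' + constr =c +]."""
    return [r0.plus("R1", (feature, "=", "-")), r0.plus("R2", (feature, "=c", "+"))]


def coverage(rules: List[Rule], inputs: List[Features]) -> Set[int]:
    """Indices of the inputs that at least one rule applies to."""
    return {i for i, f in enumerate(inputs) if any(r.applies_to(f) for r in rules)}


r0 = Rule("R0")
nouns = [{"gender": "masc"}, {"gender": "fem"}]
print(coverage([r0], nouns))                                    # {0, 1}
print(coverage(refine(r0, ("gender", "=", "masc")), nouns))     # {0}
print(coverage(bifurcate(r0, ("gender", "=", "masc")), nouns))  # {0, 1}
split = split_on_binary(r0, "f")
print(coverage(split, [{"f": "-"}, {"f": "+"}, {}]))            # {0, 1, 2}
```

Under this reading the split keeps R0's coverage: R1 acts as the default ("-" is assumed when the feature is absent), while R2 fires only when "+" is explicitly marked.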
Questions & Discussion
Thanks!
Formalizing Error Information
Wi = error
Wi’ = correction
Wc = clue word
Example:
SL: the red car
TL: *el auto roja
TL’: el auto rojo
Wi = roja, Wi’ = rojo, Wc = auto (auto is masculine, so the adjective must take the masculine form rojo)
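The slide does not say how Wi and Wi’ are located. A minimal sketch, assuming they are found by comparing TL and TL’ token by token with Python's standard difflib, and that the clue word Wc is supplied by the user, since it cannot be read off the difference alone. The ErrorInfo and error_info names are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class ErrorInfo:
    wi: str        # erroneous TL word
    wi_prime: str  # the user's correction
    wc: str        # clue word (assumed to be indicated by the user)


def error_info(tl: List[str], tl_corrected: List[str], clue_word: str) -> List[ErrorInfo]:
    """Pair each replaced TL word with its correction and attach the clue word."""
    found = []
    matcher = SequenceMatcher(a=tl, b=tl_corrected, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # zip() pairs words one-to-one; unequal-length replacements are truncated.
            for wi, wi_prime in zip(tl[i1:i2], tl_corrected[j1:j2]):
                found.append(ErrorInfo(wi, wi_prime, clue_word))
    return found


print(error_info("el auto roja".split(), "el auto rojo".split(), clue_word="auto"))
# [ErrorInfo(wi='roja', wi_prime='rojo', wc='auto')]
```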
Finding Triggering Features
Once we have the user’s correction (Wi’), we can compare it with Wi at the feature level and find the triggering feature.
If the set of differing features is empty, we need to postulate a new binary feature.
Delta function:
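The definition of the delta function is not preserved here. A minimal sketch, assuming it returns the set of features whose values differ between the lexical entries of Wi and Wi’. The lexicon entries below are hypothetical: the grande/gran pair is deliberately given identical features to illustrate the empty-set case, not to describe any real lexicon.

```python
from typing import Dict, Set, Tuple

Features = Dict[str, str]

# Hypothetical lexicon entries; feature names and values are illustrative only.
LEXICON: Dict[str, Features] = {
    "roja": {"pos": "adj", "gender": "fem", "num": "sg"},
    "rojo": {"pos": "adj", "gender": "masc", "num": "sg"},
    "grande": {"pos": "adj", "num": "sg"},
    "gran": {"pos": "adj", "num": "sg"},
}


def delta(fs_wi: Features, fs_wi_prime: Features) -> Set[str]:
    """Features whose values differ (or that are present on only one side)."""
    return {f for f in fs_wi.keys() | fs_wi_prime.keys()
            if fs_wi.get(f) != fs_wi_prime.get(f)}


def triggering_features(wi: str, wi_prime: str) -> Tuple[Set[str], bool]:
    """Return (candidate triggering features, whether a new feature is needed)."""
    diff = delta(LEXICON.get(wi, {}), LEXICON.get(wi_prime, {}))
    # An empty delta means no existing feature distinguishes the two words,
    # so a new binary feature would have to be postulated.
    return diff, not diff


print(triggering_features("roja", "rojo"))    # ({'gender'}, False)
print(triggering_features("grande", "gran"))  # (set(), True)
```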