SLT 1997 FrederkiTranslation Memory Engines: A Look under the Hood and Road Testng


  • 8/10/2019 SLT 1997 FrederkiTranslation Memory Engines: A Look under the Hood and Road Testng


Interactive Speech Translation in the DIPLOMAT Project

Robert Frederking, Alexander Rudnicky, and Christopher Hogan
{ref, air, chogan}@cs.cmu.edu
Language Technologies Institute
Carnegie Mellon University
Pittsburgh, PA 15213

Abstract

The DIPLOMAT rapid-deployment speech translation system is intended to allow naive users to communicate across a language barrier, without strong domain restrictions, despite the error-prone nature of current speech and translation technologies. Achieving this ambitious goal depends in large part on allowing the users to interactively correct recognition and translation errors. We briefly present the Multi-Engine Machine Translation (MEMT) architecture, describing how it is well-suited for such an application. We then describe our incorporation of interactive error correction throughout the system design. We have already developed a working bidirectional Serbo-Croatian ↔ English system, and are currently developing Haitian-Creole ↔ English and Korean ↔ English versions.

1 Introduction

The DIPLOMAT project is designed to explore the feasibility of creating rapid-deployment, wearable bi-directional speech translation systems. By "rapid-deployment", we mean being able to develop an MT system that performs initial translations at a useful level of quality between a new language and English within a matter of days or weeks, with continual, graceful improvement to a good level of quality over a period of months. The speech understanding component used is the SPHINX II HMM-based speaker-independent continuous speech recognition system (Huang et al., 1992; Ravishankar, 1996), with techniques for rapidly developing acoustic and language models for new languages (Rudnicky, 1995). The machine translation (MT) technology is the Multi-Engine Machine Translation (MEMT) architecture (Frederking and Nirenburg, 1994), described further below. The speech synthesis component is a newly-developed concatenative system (Lenzo, 1997) based on variable-sized compositional units. This use of subword concatenation is especially important, since it is the only currently available method for rapidly bringing up synthesis for a new language. DIPLOMAT thus involves research in MT, speech understanding and synthesis, interface design, as well as wearable computer systems. While beginning our investigations into new semi-automatic techniques for both speech and MT knowledge-base development, we have already produced an initial bidirectional system for Serbo-Croatian ↔ English speech translation in less than a month, and are currently developing Haitian-Creole ↔ English and Korean ↔ English systems.

A major concern in the design of the DIPLOMAT system has been to cope with the error-prone nature of both current speech understanding and MT technology, to produce an application that is usable by non-translators with a small amount of training. We attempt to achieve this primarily through user interaction: wherever feasible, the user is presented with intermediate results, and allowed to correct them. In this paper, we will briefly describe the machine translation architecture used in DIPLOMAT (showing how it is well-suited for interactive user correction), describe our approach to rapid-deployment speech recognition and then discuss our approach to interactive user correction of errors in the overall system.

2 Multi-Engine Machine Translation

Different MT technologies exhibit different strengths and weaknesses. Technologies such as Knowledge-Based MT (KBMT) can provide high-quality, fully-automated translations in narrow, well-defined domains (Mitamura et al., 1991; Farwell and Wilks, 1991). Other technologies such as lexical-transfer MT (Nirenburg et al., 1995; Frederking and Brown, 1996; MacDonald, 1963), and Example-Based MT (EBMT) (Brown, 1996; Nagao, 1984; Sato and Nagao, 1990) provide lower-quality general-purpose translations, unless they are incorporated into human-assisted MT systems (Frederking et al., 1993; Melby, 1983), but can be used in non-domain-restricted translation applications. Moreover, these technologies differ not just in the quality of their translations, and level of domain-dependence, but also along other dimensions, such as types of errors they make, required development time, cost of development, and ability to easily make use of any available on-line corpora, such as electronic dictionaries or online bilingual parallel texts.

The Multi-Engine Machine Translation (MEMT) architecture (Frederking and Nirenburg, 1994) makes it possible to exploit the differences between MT technologies. As shown in Figure 1, MEMT feeds an input text to several MT engines in parallel, with each engine employing a different MT technology¹. Each engine attempts to translate the entire input text, segmenting each sentence in whatever manner is most appropriate for its technology, and putting the resulting translated output segments into a shared chart data structure (Kay, 1967; Winograd, 1983) after giving each segment a score indicating the engine's internal assessment of the quality of the output segment. These output (target language) segments are indexed in the chart based on the positions of the corresponding input (source language) segments. Thus the chart contains multiple, possibly overlapping, alternative translations. Since the scores produced by the engines are estimates of variable accuracy, we use statistical language modelling techniques adapted from speech recognition research to select the best overall set of outputs (Brown and Frederking, 1995; Frederking, 1994). These selection techniques attempt to produce the best overall result, taking the probability of transitions between segments into account as well as modifying the quality scores of individual segments.
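The chart-and-selection idea can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the DIPLOMAT implementation: the `Edge` class, the `best_cover` function, the toy scores and the `transition` callback are all hypothetical. It assumes that each engine contributes scored edges over source word positions, and that a dynamic program picks the highest-scoring sequence of edges tiling the input, with a transition score between adjacent segments standing in for the target language model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    """One translated segment in the chart (hypothetical structure)."""
    start: int    # index of the first source word covered
    end: int      # index one past the last source word covered (end > start)
    text: str     # target-language output for this span
    score: float  # engine's own quality estimate, higher is better
    engine: str   # name of the engine that proposed the segment


def best_cover(
    edges: List[Edge],
    n_words: int,
    transition: Callable[[Optional[Edge], Edge], float],
) -> List[Edge]:
    """Return the best-scoring edge sequence that tiles positions 0..n_words."""
    by_end: Dict[int, List[Edge]] = {}
    for e in edges:
        by_end.setdefault(e.end, []).append(e)

    # best[edge] = (score of the best path ending in edge, previous edge)
    best: Dict[Edge, Tuple[float, Optional[Edge]]] = {}
    # Sorting by (start, end) guarantees predecessors are scored first.
    for e in sorted(edges, key=lambda x: (x.start, x.end)):
        if e.start == 0:
            best[e] = (e.score + transition(None, e), None)
            continue
        options = [
            (best[p][0] + e.score + transition(p, e), p)
            for p in by_end.get(e.start, [])
            if p in best
        ]
        if options:
            best[e] = max(options, key=lambda o: o[0])

    finals = [e for e in best if e.end == n_words]
    if not finals:
        raise ValueError("no sequence of edges covers the whole input")
    cur: Optional[Edge] = max(finals, key=lambda e: best[e][0])
    path: List[Edge] = []
    while cur is not None:  # follow back-pointers to the start
        path.append(cur)
        cur = best[cur][1]
    return path[::-1]


# Toy chart for a three-word input; all values are invented.
chart = [
    Edge(0, 2, "good morning", -1.0, "EBMT"),
    Edge(0, 1, "good", -0.5, "transfer"),
    Edge(1, 2, "morn", -2.0, "transfer"),
    Edge(2, 3, "sir", -0.4, "transfer"),
    Edge(2, 3, "mister", -1.5, "EBMT"),
]
# Stand-in for a language model: a flat penalty at every segment boundary.
result = best_cover(chart, 3, lambda prev, cur: 0.0 if prev is None else -0.2)
print(" ".join(e.text for e in result))
```

The `chart` list is left untouched by the search, which mirrors the point made in the text that unchosen alternatives remain available after scoring.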

Differences in the development times and costs of different technologies can be exploited to enable MT systems to be rapidly deployed for new languages (Frederking and Brown, 1996). If parallel corpora are available for a new language pair, the EBMT engine can provide translations for a new language in a matter of hours. Knowledge-bases for lexical-transfer MT can be developed in a matter of days or weeks; those for structural-transfer MT may take months or years. The higher-quality, higher-investment KBMT-style engine typically requires over a year to bring on-line. The use of the MEMT architecture allows the improvement of initial MT engines and the addition of new engines to occur within an unchanging framework. The only change that the user sees is that the quality of translation improves over time. This allows interfaces to remain stable, preventing any need for retraining of users, or redesign of inter-operating software. The EBMT and Lexical-Transfer-based MT translation engines used in DIPLOMAT are described elsewhere (Frederking and Brown, 1996).

¹ Morphological analysis, part-of-speech tagging, and possibly other text enhancements can be shared by the engines.

For the purposes of this paper, the most important aspects of the MEMT architecture are:

• the initially deployed versions are quite error-prone, although generally a correct translation is among the available choices, and

• the unchosen alternative translations are still available in the chart structure after scoring by the target language model.

3 Speech recognition for novel languages

Contemporary speech recognition systems derive their power from corpus-based statistical modeling, both at the acoustic and language levels. Statistical modeling, of course, presupposes that sufficiently large corpora are available for training. It is in the nature of the DIPLOMAT system that such corpora, particularly acoustic ones, are not immediately available for processing. As for the MT component, the emphasis is on rapidly acquiring an initial capability in a novel language, then being able to incrementally improve performance as more data and time are available. We have adopted for the speech component a combination of approaches which, although they rely on participation by native informants, also make extensive use of pre-existing acoustic and text resources.

Building a speech recognition system for a target domain or language requires models at three levels (assuming that a basic processing infrastructure for training and decoding is already in place): acoustic, lexical and language.

We have explored two strategies for acoustic modeling. Assimilation makes use of existing acoustic models from a language that has a large phonetic overlap with the target language. This allows us to rapidly put a recognition capability in place and was the strategy used for our Serbo-Croatian ↔ English system. We were able to achieve good recognition performance for vocabularies of up to 733 words using this technique. Of course, such overlaps cannot be relied upon and in any case will not produce recognition performance that approaches that possible with appropriate training. Nevertheless it does suggest that useful recognition performance for a large set of languages can be achieved given a carefully chosen set of core languages that can serve as a source of acoustic models for a cluster of phonetically similar languages.

[Figure 1: Structure of MEMT architecture. Labels recoverable from the figure: Source Language, Target Language, Morphological Analyzer, Transfer Based MT, Example Based MT, Knowledge Based, Expansion slot, Statistical Modeller, User Interface.]

The selective collection approach presupposes a preparation interval prior to deployment and can be a follow-on to a system based on assimilation. This is being developed in the context of our Haitian-Creole and Korean systems. The goal is to carry out a limited acoustic data collection effort using materials that have been explicitly constructed to yield a rich phonetic sampling for the target language. We do this by first computing phonetic statistics for the language using available text materials, then designing a recording script that exhaustively samples all diphones observed in the available text sample. Such scripts run from several hundred to around a thousand utterances for the languages we have examined. While the effectiveness of this approach depends on the quality (and quantity) of the text sample that can be obtained, we believe it produces appropriate data for our modeling purposes.
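One plausible way to build such a recording script is a greedy covering pass over the text sample, sketched below in Python. The paper does not say which selection procedure was used, so the greedy strategy is an assumption, as are the function names `diphones` and `greedy_script` and the toy phone strings. The sketch assumes each candidate utterance already has a phone sequence.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phones: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phone pairs in one utterance."""
    return set(zip(phones, phones[1:]))


def greedy_script(corpus: Dict[str, List[str]]) -> List[str]:
    """Pick utterances until every diphone observed in the corpus is covered.

    corpus maps an utterance identifier to its phone sequence.
    """
    remaining: Set[Diphone] = set()
    for phones in corpus.values():
        remaining |= diphones(phones)

    script: List[str] = []
    while remaining:
        # Choose the utterance covering the most still-missing diphones.
        # Iterating in sorted order makes tie-breaking deterministic.
        pick = max(
            sorted(corpus),
            key=lambda u: len(diphones(corpus[u]) & remaining),
        )
        script.append(pick)
        remaining -= diphones(corpus[pick])
    return script


# Invented toy data: five candidate utterances over a tiny phone set.
toy_corpus = {
    "utt1": ["a", "b", "a"],
    "utt2": ["a", "b"],
    "utt3": ["b", "a", "k"],
    "utt4": ["k", "a", "b", "a", "k"],
    "utt5": ["k", "o"],
}
script = greedy_script(toy_corpus)
print(script)
```

Greedy covering does not guarantee the shortest possible script, but it is simple and always terminates, because every remaining diphone occurs in at least one candidate utterance.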

Lexical modeling is based on creating pronunciations from orthography and involves a variety of techniques familiar from speech synthesis, including letter-to-sound rules, phonological rules and exception lists. The goal of our lexical modeling approach is to create an acceptable-quality pronouncing dictionary that can be variously used for acoustic training, decoding and synthesis. We work with an informant to map out the pronunciation system for the target language and make use of supporting published information (though we have found such to be misleading on occasion). System vocabulary is derived from the text materials assembled for acoustic modeling, as well as scenarios from the target domain (for example, interviews focussed on mine field mapping or intelligence screening).
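As a rough illustration of how letter-to-sound rules and an exception list can be combined, the following Python sketch looks a word up in the exceptions first and otherwise applies longest-match grapheme rules from left to right. The function name `pronounce`, the rule table and the phone symbols are invented for the example and do not describe any real language or the project's actual rule format; context-sensitive phonological rules are left out entirely.

```python
from typing import Dict, List


def pronounce(
    word: str,
    rules: Dict[str, List[str]],
    exceptions: Dict[str, List[str]],
) -> List[str]:
    """Map a spelling to phones: exception list first, then grapheme rules."""
    word = word.lower()
    if word in exceptions:
        return list(exceptions[word])

    # Try longer graphemes first so that "ch" wins over "c" followed by "h".
    graphemes = sorted(rules, key=len, reverse=True)
    phones: List[str] = []
    i = 0
    while i < len(word):
        for g in graphemes:
            if word.startswith(g, i):
                phones.extend(rules[g])
                i += len(g)
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


# Invented toy rule set and exception list.
toy_rules = {
    "ch": ["tS"],
    "c": ["k"],
    "h": ["h"],
    "a": ["a"],
    "o": ["o"],
    "t": ["t"],
}
toy_exceptions = {"tao": ["d", "a", "o"]}

lexicon = {w: pronounce(w, toy_rules, toy_exceptions) for w in ["chat", "cat", "tao"]}
print(lexicon)
```

A dictionary built this way can be reviewed entry by entry with an informant, and corrections can be recorded as new exceptions without touching the rules.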

Finally, due to the goals of our project, language modeling is necessarily based on small corpora. We make use of materials derived from domain scenarios and from general sources such as newspapers (scanned and OCRed), text in the target language available on the Internet and translations of select documents. Due to the small amounts of readily available data (on the order of 50k words for the languages we have worked with), standard language modeling tools are difficult to use, as they presuppose the availability of corpora that are several orders of magnitude larger. Nevertheless we have been successful in creating standard backoff trigram models from very small corpora. Our technique involves the use of high discounts and appears to provide useful constraint without corresponding fragility in the face of novel material.
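To make the idea of a backoff trigram with a high discount concrete, here is a minimal Python sketch. The paper does not specify the discounting scheme or its value, so everything specific here is an assumption: the sketch uses absolute discounting with a fixed discount of 0.9, an add-one unigram floor with an `<unk>` token, and the class name `BackoffTrigram`. A large discount moves most of the probability mass of rarely seen trigrams to the lower-order distributions, which is one way a model trained on very little text can stay usable on novel word sequences.

```python
from collections import Counter, defaultdict
from typing import Sequence, Tuple

BOS, EOS, UNK = "<s>", "</s>", "<unk>"


class BackoffTrigram:
    """Backoff trigram with absolute discounting (illustrative only)."""

    def __init__(self, sentences: Sequence[Sequence[str]], discount: float = 0.9):
        if not 0.0 < discount < 1.0:
            raise ValueError("discount must lie strictly between 0 and 1")
        self.discount = discount
        # counts[k][history][word], where history holds the k previous words
        self.counts = [defaultdict(Counter) for _ in range(3)]
        for sent in sentences:
            words = [BOS, BOS] + list(sent) + [EOS]
            for i in range(2, len(words)):
                for k in range(3):
                    self.counts[k][tuple(words[i - k:i])][words[i]] += 1
        self.unigrams = self.counts[0][()]
        self.total = sum(self.unigrams.values())
        self.vocab = set(self.unigrams) | {UNK}

    def prob(self, word: str, history: Sequence[str] = ()) -> float:
        """P(word | last two words of history); unknown words map to <unk>."""
        word = word if word in self.vocab else UNK
        return self._prob(word, tuple(history)[-2:])

    def _prob(self, word: str, history: Tuple[str, ...]) -> float:
        if not history:
            # Add-one unigram floor, so every vocabulary item has mass.
            return (self.unigrams[word] + 1) / (self.total + len(self.vocab))
        seen = self.counts[len(history)].get(history)
        shorter = history[1:]
        if not seen:
            # History never observed: use the shorter history directly.
            return self._prob(word, shorter)
        total = sum(seen.values())
        if word in seen:
            return (seen[word] - self.discount) / total
        # Mass freed by discounting, shared among unseen words in
        # proportion to their lower-order probabilities.
        freed = self.discount * len(seen) / total
        unseen_lower = 1.0 - sum(self._prob(w, shorter) for w in seen)
        return freed * self._prob(word, shorter) / unseen_lower


# Invented toy training text.
lm = BackoffTrigram([
    ["where", "is", "the", "mine"],
    ["where", "is", "the", "road"],
    ["is", "the", "road", "safe"],
])
p_seen = lm.prob("road", ("is", "the"))
p_unseen = lm.prob("safe", ("is", "the"))
print(p_seen, p_unseen)
```

For every history, the probabilities over `lm.vocab` sum to one, because the mass removed from seen continuations is exactly the mass handed to the backoff distribution.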


In combination, these techniques allow us to create working recognition systems in very short periods of time and provide a path for evolutionary improvement of recognition capability. They clearly are not of the quality that would be expected if conventional procedures were used, but nevertheless are sufficient for providing cross-language communication capability in limited-domain speech translation.

4 User Interface Design

As indicated above, our approach to coping with error-prone speech translation is to allow user correction wherever feasible. While we would like as much user interaction as possible, it is also important not to overwhelm the user with either information or decisions. This requires a careful balance, which we are trying to achieve through early user testing. We have carried out initial testing using local naive subjects (e.g., drama majors and construction workers), and intend to test with actual end users once specific ones are identified.

The primary potential use for DIPLOMAT identified so far is to allow English-speaking soldiers on peace-keeping missions to interview local residents. While one could conceivably train the interviewer to use a restricted vocabulary, the interviewee's responses are much more difficult to control or predict. An initial system has been developed to run on a pair of laptop computers, with each speaker using a graphical user interface (GUI) on the laptop's screen (see Figure 2). Feedback from initial demonstrations made it clear that, while we could expect the interviewer to have roughly eight hours of training, we needed to design the system to work with a totally naive interviewee, who had never used a computer before. We responded to this requirement by developing an asymmetric interface, where any necessary complex operations were moved to the interviewer's side. The interviewee's GUI is now extremely simple, and a touch screen has been added, so that the interviewee is not required to type or use the pointer. In addition, the interviewer's GUI controls the state of the interviewee's GUI. The speech recognition system listens continuously, so the participants do not need to physically indicate their intention of speaking.

A typical exchange consists of recognizing the interviewer's spoken utterance, translating it to the target language, backtranslating it to English², then displaying and synthesizing the (possibly corrected) translation. The interviewee's response is recognized, translated to English, and backtranslated. The (possibly corrected) backtranslation is then shown to the interviewee for confirmation. The interviewer receives a graphic indication of whether the backtranslation was accepted or not. (The actual communication process is quite flexible, but this is a normal scenario.)

² We realize that backtranslation is also an error-prone process, but it at least provides some evidence as to whether the translation was correct to someone who does not speak the target language at all.

In order to achieve such communication, the users currently can interact with DIPLOMAT in the following ways:

• Speech displayed as text: After any speech recognition step, the best overall hypothesis is displayed as text on the screen. The user can highlight an incorrect portion using the touchscreen, and respeak or type it.

• Confirmation requests: After any speech recognition or machine translation step, the user is offered an accept/reject button to indicate whether this is "what they said". For MT, backtranslations provide the user with an ability to judge whether they were interpreted correctly.

• Interactive chart editing: As mentioned above, the MEMT technology produces as output a chart structure, similar to the word hypothesis lattices in speech systems. After any MT step, the interviewer is able to edit the best overall hypothesis for either the forward or backward translation using a popup-menu-based editor, as in our earlier Pangloss text MT system (Frederking et al., 1993). The editor allows the interviewer to easily view and select alternative translations for any segment of the translation. Editing the forward translation causes an automatic reworking of the backtranslation. Editing the backtranslation allows the interviewer to recognize correct forward translations despite errors in the backtranslation; if the backtranslation can be edited into correctness, the forward translation was probably correct.
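The forward-translation part of interactive chart editing can be sketched as follows. This is a simplified illustration, not the Pangloss or DIPLOMAT editor: the `EditableTranslation` class and its methods are hypothetical, the sketch only lets the user swap alternatives for the spans already on the selected path, and `backtranslate` is a stand-in callback (here simply lower-casing the text) for a real reverse MT pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) positions in the source utterance


@dataclass
class EditableTranslation:
    """Selected path through the chart plus the alternatives per span."""
    alternatives: Dict[Span, List[str]]  # all chart entries for each span
    chosen: Dict[Span, int]              # index of the selected entry
    backtranslate: Callable[[str], str]  # stand-in for the reverse MT pass
    backtranslation: str = field(init=False)

    def __post_init__(self) -> None:
        self.backtranslation = self.backtranslate(self.text())

    def text(self) -> str:
        """Join the selected segments in source order."""
        return " ".join(
            self.alternatives[span][idx]
            for span, idx in sorted(self.chosen.items())
        )

    def menu(self, span: Span) -> List[str]:
        """Entries a popup menu would offer for this segment."""
        return list(self.alternatives[span])

    def select(self, span: Span, index: int) -> None:
        """Swap in another alternative and rework the backtranslation."""
        if span not in self.chosen:
            raise KeyError(f"span {span} is not on the current path")
        if not 0 <= index < len(self.alternatives[span]):
            raise IndexError("no such alternative")
        self.chosen[span] = index
        self.backtranslation = self.backtranslate(self.text())


# Invented toy data: upper case plays the role of the target language.
edit = EditableTranslation(
    alternatives={(0, 2): ["GOOD MORNING", "FINE DAY"], (2, 3): ["SIR", "MISTER"]},
    chosen={(0, 2): 0, (2, 3): 0},
    backtranslate=str.lower,
)
before = (edit.text(), edit.backtranslation)
edit.select((2, 3), 1)
after = (edit.text(), edit.backtranslation)
print(before, after)
```

Recomputing the backtranslation inside `select` keeps the two displays consistent, which is the behaviour the text describes for edits to the forward translation.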

Since a major goal of DIPLOMAT is rapid-deployment to new languages, the GUI uses the UNICODE multilingual character encoding standard. This will not always suffice, however; a major challenge for handling Haitian-Creole is that 55% of the Haitian population is illiterate. We will have to develop an all-speech version of the interviewee-side interface. As we have done with previous interface designs, we will carry out user tests early in its development to ascertain whether our intuitions on the usability of this version are correct.

5 Conclusion

We have presented here the DIPLOMAT speech translation system, with particular emphasis on the user interaction mechanisms employed to cope with error-prone speech and MT processes. We expect that, after additional tuning based on further informal user studies, an interviewer with eight hours of training should be able to use the DIPLOMAT system to successfully interview subjects with no training or previous computer experience. We hope to have actual user trials of either the Serbo-Croatian or the Haitian-Creole system in the near future, possibly this summer.

Figure 2: Screen Shot of User Interfaces: Interviewer (left) and Interviewee (right)

References

Ralf Brown. 1996. Example-Based Machine Translation in the Pangloss System. In Proceedings of the 16th International Conference on Computational Linguistics (COLING-96).

Ralf Brown and Robert Frederking. 1995. Applying Statistical English Language Modeling to Symbolic Machine Translation. In Proceedings of the Sixth International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-95), pages 221-239.

David Farwell and Yorick Wilks. 1991. Ultra: A Multi-lingual Machine Translator. In Proceedings of Machine Translation Summit III, Washington, DC, July.

Robert Frederking. 1994. Statistical Language Models for Symbolic MT. Presented at the Language Engineering on the Information Highway Workshop, Santorini, Greece, September. Refereed.

Robert Frederking, D. Grannes, P. Cousseau, and S. Nirenburg. 1993. An MAT Tool and Its Effectiveness. In Proceedings of the DARPA Human Language Technology Workshop, Princeton, NJ.

Robert Frederking and Ralf Brown. 1996. The Pangloss-Lite Machine Translation System. In Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA).

Robert Frederking and Sergei Nirenburg. 1994. Three Heads are Better than One. In Proceedings of the fourth Conference on Applied Natural Language Processing (ANLP-94), Stuttgart, Germany.

Xuedong Huang, Fileno Alleva, Hsiao-Wuen Hon, Mei-Yuh Hwang, Ronald Rosenfeld. 1992. The SPHINX-II Speech Recognition System: An Overview. Carnegie Mellon University Computer Science Technical Report CMU-CS-92-112.

Martin Kay. 1967. Experiments with a powerful parser. In Proceedings of the 2nd International COLING, August.

Kevin Lenzo. 1997. Personal Communication.

R. R. MacDonald. 1963. General report 1952-1963 (Georgetown University Occasional Papers in Machine Translation, no. 30), Washington, DC.

A. K. Melby. 1983. Computer-assisted translation systems: the standard design and a multi-level design. Conference on Applied Natural Language Processing, Santa Monica, February.

Teruko Mitamura, Eric Nyberg, Jaime Carbonell. 1991. Interlingua Translation System for Multi-Lingual Document Production. In Proceedings of Machine Translation Summit III, Washington, DC, July.

M. Nagao. 1984. A framework of a mechanical translation between Japanese and English by analogy principle. In: A. Elithorn and R. Banerji (eds.) Artificial and Human Intelligence. NATO Publications.

Sergei Nirenburg. 1995. The Pangloss Mark III Machine Translation System. Joint Technical Report, Computing Research Laboratory (New Mexico State University), Center for Machine Translation (Carnegie Mellon University), Information Sciences Institute (University of Southern California). Issued as CMU technical report CMU-CMT-95-145.

Mosur Ravishankar. 1996. Efficient Algorithms for Speech Recognition. Ph.D. Thesis. Carnegie Mellon University.

Alex Rudnicky. 1995. Language modeling with limited domain data. In Proceedings of the ARPA Workshop on Spoken Language Technology. San Mateo: Morgan Kaufmann, 66-69.

S. Sato and M. Nagao. 1990. Towards memory based translation. In Proceedings of COLING-90, Helsinki, Finland.

Terry Winograd. 1983. Language as a Cognitive Process. Volume 1: Syntax. Addison-Wesley.
