Figure captions:
Figure 1. Typical information flow in a multimodal architecture
Figure 2. Facilitated multimodal architecture
Figure 3. The three-tiered bottom-up MTC architecture, comprising multiple members, multiple teams, and a decision-making committee
Figure 4. Functionality, architectural features, and general classification of different multimodal speech and gesture systems
Figure 5. QuickSet system running on a wireless handheld PC, showing user-created entities on the map interface
Figure 6. The OGI QuickSet system's facilitated multi-agent architecture
Figure 7. Architectural flow of signal and language processing in IBM's HCWP system
Figure 8. Boeing 777 aircraft's main equipment center
Figure 9. Boeing's natural language understanding technology
Figure 10. Boeing's integrated speech and gesture system architecture
Figure 11. The NCR Field Medic Associate's (FMA) flexible hardware components (top); FMA inserted into a medic's vest while speaking during field use (bottom)
Figure 12. The NCR Field Medic Coordinator's (FMC) visual interface
Figure 13. The BBN Portable Voice Assistant (PVA) interface illustrating a diagram from the repair manual, with the system's display and spoken-language processing status confirmed at the top
Figure 14. Data flow through the PVA applet's three processing threads
Figure 15. BBN's networked architecture for the PVA, with speech recognition processing split so that only feature extraction takes place on the handheld
Figure 1
Figure 2
[Figure 3 diagram: a decision-making Committee sits above Teams 1-4, which sit above individual member recognizers. Multiple input features — acceleration, stroke length, pressure, speech rate, signal energy, image centroid, line crossings, and speech pitch accent — feed the member recognizers, and the committee outputs the final decision.]
Figure 3
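The Members-Teams-Committee structure of Figure 3 can be caricatured as weighted score combination at two levels: member recognizers emit scores over candidate labels, teams average their members, and the committee averages the teams before deciding. A minimal sketch, with all labels, weights, and scores invented for illustration:

```python
# Hypothetical sketch of the Members-Teams-Committee (MTC) decision scheme
# in Figure 3. Member recognizers emit posterior-like scores over labels,
# teams combine their members with weights, and a committee combines the
# teams. Every number and label below is made up.

def combine(scores_list, weights):
    """Weighted average of several {label: score} dicts."""
    labels = set().union(*scores_list)
    return {
        lab: sum(w * s.get(lab, 0.0) for w, s in zip(weights, scores_list))
        for lab in labels
    }

# Two member recognizers per team, scoring two candidate interpretations.
team1_members = [{"circle": 0.7, "arrow": 0.3}, {"circle": 0.6, "arrow": 0.4}]
team2_members = [{"circle": 0.2, "arrow": 0.8}, {"circle": 0.4, "arrow": 0.6}]

team1 = combine(team1_members, [0.5, 0.5])
team2 = combine(team2_members, [0.5, 0.5])

# The committee weights the teams (e.g., by held-out accuracy) and decides.
committee = combine([team1, team2], [0.6, 0.4])
decision = max(committee, key=committee.get)
print(decision)  # -> circle
```

In a real system the team and committee weights would be trained rather than fixed, which is what lets the hybrid symbolic/statistical framework adapt to which recognizers are reliable for which input features.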
Multimodal system characteristics | QuickSet | Human-Centric Word Processor | VR Aircraft Maintenance Training | Field Medic Information | Portable Voice Assistant
Recognition of simultaneous or alternative individual modes | Simultaneous & individual modes | Simultaneous & individual modes | Simultaneous & individual modes | Alternative individual modes (1) | Simultaneous & individual modes
Type & size of gesture vocabulary (3) | Pen input, multiple gestures, large vocabulary | Pen input, deictic selection | 3D manual input, multiple gestures, small vocabulary | Pen input, deictic selection | Pen input, deictic selection (2)
Size of speech vocabulary & type of linguistic processing (3) | Moderate vocabulary, grammar-based | Large vocabulary, statistical language processing | Small vocabulary, grammar-based | Moderate vocabulary, grammar-based | Small vocabulary, grammar-based
Type of signal fusion | Late semantic fusion, unification, hybrid symbolic/statistical MTC framework | Late semantic fusion, frame-based | Late semantic fusion, frame-based | No mode fusion | Late semantic fusion, frame-based
Type of platform & applications | Wireless handheld, varied map & VR applications (4) | Desktop computer, word processing | Virtual reality system, aircraft maintenance training | Wireless handheld, medical field emergencies | Wireless handheld, catalogue ordering
Evaluation status | Proactive user-centered design & iterative system evaluations | Proactive user-centered design | Planned for future | Proactive user-centered design & iterative system evaluations | Planned for future
Figure 4
(1) The FMA component recognizes speech only, and the FMC component recognizes gestural selections or speech. The FMC also can transmit digital speech and ink data, and can read data from smart cards and physiological monitors.
(2) The PVA also performs handwriting recognition.
(3) A small speech vocabulary is up to 200 words, moderate 300-1,000 words, and large in excess of 1,000 words. For pen-based gestures, deictic selection is an individual gesture, a small vocabulary is 2-20 gestures, moderate 20-100, and large in excess of 100 gestures.
(4) QuickSet's map applications include military simulation, medical informatics, real estate selection, and so forth.
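The "late semantic fusion, frame-based" entries in Figure 4 mean that speech and gesture are recognized separately, each producing a partial semantic frame, and fusion merges compatible frames into one command. A toy sketch of that merge step, with invented slot names and values (not any listed system's actual code):

```python
# Illustrative frame unification for late semantic fusion: two partial
# frames merge if their filled slots do not conflict. The command "add a
# medevac unit" plus a pen tap on a map is the classic QuickSet-style
# example; the slot names and coordinates here are invented.

def unify(frame_a, frame_b):
    """Merge two partial frames; return None if any slot conflicts."""
    merged = dict(frame_a)
    for slot, value in frame_b.items():
        if slot in merged and merged[slot] != value:
            return None  # conflicting interpretations do not unify
        merged[slot] = value
    return merged

speech_frame  = {"action": "create", "object": "medevac_unit"}  # from speech
gesture_frame = {"location": (44.21, -123.09)}                  # from pen tap

command = unify(speech_frame, gesture_frame)
print(command)
```

Unification-based fusion (the QuickSet column) generalizes this by also checking type constraints and timing before merging, which is why conflicting or ill-timed inputs can be rejected rather than silently combined.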
Figure 5
[Figure 6 diagram: the QuickSet interface agents (speech/TTS, gesture, natural language, multimodal integration) communicate through a central Facilitator (routing, triggering, dispatching) over the Interagent Communication Language (ICL, expressed as Horn clauses). Also attached to the Facilitator are the ModSAF simulator and simulation agent, an application bridge, databases, other facilitators, a CORBA bridge, COM objects, Java-enabled Web pages, other user interfaces (e.g., Wizard), ALP, ExInit, and VR interfaces (CommandVu, Dragon2, VGIS).]
Figure 6
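In the facilitated architecture of Figure 6, agents never call each other directly; they register interests with the facilitator, which routes and dispatches messages. A minimal sketch of that pattern, with invented topic names and handlers:

```python
# Minimal facilitator-style message routing, loosely in the spirit of
# Figure 6's "routing, triggering, dispatching" hub. Topic names and
# handler behavior are invented for illustration.

class Facilitator:
    def __init__(self):
        self.handlers = {}  # topic -> list of subscribed agent callbacks

    def register(self, topic, handler):
        """An agent declares it can handle messages on this topic."""
        self.handlers.setdefault(topic, []).append(handler)

    def dispatch(self, topic, message):
        """Route a message to every agent registered for the topic."""
        return [handle(message) for handle in self.handlers.get(topic, [])]

fac = Facilitator()
# Two agents subscribe to speech results: an integrator and a logger.
fac.register("speech.result", lambda m: f"integrate:{m}")
fac.register("speech.result", lambda m: f"log:{m}")

print(fac.dispatch("speech.result", "pan left"))
```

The payoff of this indirection is the one the figure implies: simulators, databases, bridges, and user interfaces can be added or swapped by registering with the hub, without any other agent changing.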
Figure 7
Figure 8
Figure 9
[Figure 9 diagram (Natural Language Understanding Technology): sentence-level analysis runs a Tokenizer, Syntactic Parser, and Semantic Interpreter, supported by a syntactic lexicon and patterns, a syntactic grammar, semantic lexicons, and a semantic type hierarchy. Discourse-level analysis comprises a Structure Initializer, Entity Interpreter, Action and Event Interpreter, Inter-sentential Relations Analyzer, and Reference Resolver, drawing on domain-relevant relations.]
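The sentence-level portion of Figure 9 is a staged pipeline: tokenize, parse, then interpret semantically. A toy sketch of that chaining, where each stage and its output format is invented and far simpler than Boeing's actual grammar-driven components:

```python
# Caricature of the Tokenizer -> Syntactic Parser -> Semantic Interpreter
# chain in Figure 9. The "grammar" (first token is the verb, the rest the
# object phrase) is a stand-in invented purely to show the staging.

def tokenizer(text):
    return text.lower().rstrip(".").split()

def syntactic_parser(tokens):
    return {"verb": tokens[0], "object": " ".join(tokens[1:])}

def semantic_interpreter(parse):
    return {"action": parse["verb"], "theme": parse["object"]}

stages = [tokenizer, syntactic_parser, semantic_interpreter]
result = "Open the access panel."
for stage in stages:
    result = stage(result)  # each stage consumes the previous stage's output
print(result)
```

The discourse-level components in the figure would then take a sequence of such sentence interpretations and resolve references and inter-sentential relations across them.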
Figure 10
[Figure 10 diagram (Speech/Gesture Integration Architecture): a speech recognizer produces word strings for the NLU system, which emits COMMAND frames. A DataGlove feeds the gesture recognizer and gesture interpretation, which emit GESTURE frames along with Division object IDs and/or locations (via the Division system's Gesture and Viz Actors). The integration system merges these into unified command/gesture templates, and the command generation system issues navigation, manipulation, and selection commands.]
Figure 11
Figure 12
Figure 13
[Figure 14 diagram: pen events (from the stylus or mouse) and speech events (from the voice recognition server) flow into an integration-and-response component, which produces application output as visual and audio feedback.]
Figure 14
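Figure 14's three threads — a pen source, a speech source, and an integration/response consumer — are a classic producer/consumer arrangement. A hedged sketch using a shared queue, with invented event names:

```python
# Sketch of Figure 14's three-thread data flow: pen and speech threads
# produce events; the integration thread consumes them from one queue and
# would drive visual/audio feedback. Event payloads are invented.
import queue
import threading

events = queue.Queue()

def pen_source():
    events.put(("pen", "tap item 3"))          # from stylus or mouse

def speech_source():
    events.put(("speech", "order two of these"))  # from recognition server

def integrate(n):
    """Consume n events; Queue.get blocks until a producer delivers."""
    return [events.get() for _ in range(n)]

for src in (pen_source, speech_source):
    threading.Thread(target=src).start()

handled = integrate(2)
print(sorted(kind for kind, _ in handled))
```

Keeping the modalities on separate threads is what lets the applet accept pen and speech input simultaneously while the integration thread serializes them into responses.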
[Figure 15 diagram: on the handheld client, a Web browser running a Java applet (HTML, images/audio, Java classes, pen input) and a speech front-end (microphone, Speech API) connect over the wireless network to the server side. The speech front-end sends speech parameters and control messages to the recognition server (which holds the grammar and vocabulary), while the browser talks to the Web server.]
Figure 15
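The point of Figure 15's split is that only lightweight feature extraction runs on the handheld, so just compact speech parameters cross the wireless link while the expensive recognition search stays on the server. A toy sketch of that division of labor, with invented signal values and a stand-in "recognizer":

```python
# Sketch of the client/server split in Figure 15. The handheld reduces raw
# audio to small per-frame features; the server does the heavy recognition.
# The audio samples, frame size, and threshold are all invented.

def extract_features(samples, frame_size=4):
    """Handheld side: per-frame energy, a tiny payload for the network."""
    return [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples), frame_size)
    ]

def recognize(features):
    """Server side: stand-in for the real recognizer's search."""
    return "speech" if max(features) > 1.0 else "silence"

raw_audio = [0.1, 0.9, -0.8, 0.2, 0.0, 0.1, -0.1, 0.0]  # imaginary samples
features = extract_features(raw_audio)  # only this would be transmitted
print(recognize(features))
```

Shipping features instead of raw audio cuts bandwidth and lets the server-side recognizer use a grammar and vocabulary far larger than the handheld could hold.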