Tutorial
Developing and Deploying Multimodal Applications
James A. Larson, Larson Technical Services
jim @ larson-tech.com
SpeechTEK West, February 23, 2007
James A. Larson Developing & Delivering Multimodal Applications 2
Developing and Deploying Multimodal Applications
What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
Capturing Input from the User

Medium     Input Device           Mode
Acoustic   Microphone             Speech
Tactile    Keypad, Keyboard       Key
Tactile    Pen                    Ink
Tactile    Mouse, Joystick        GUI
Visual     Scanner, Still camera  Photograph
Visual     Video camera           Movie
Capturing Input from the User (Multimodal)

Medium      Input Device           Mode
Acoustic    Microphone             Speech
Tactile     Keypad, Keyboard       Key
Tactile     Pen                    Ink
Tactile     Mouse, Joystick        GUI
Visual      Scanner, Still camera  Photograph
Visual      Video camera           Movie, Gaze tracking, Gesture recognition, Biometric
Electronic  RFID, GPS              Digital data
Presenting Output to the User

Medium    Output Device  Mode
Acoustic  Speaker        Speech
Visual    Display        Text, Photograph, Movie
Tactile   Joystick       Pressure

Multimedia: combinations of the above modes
Multimodal and Multimedia Application Benefits
Provide a natural user interface by using multiple channels for user interactions
Simplify interaction with small devices with limited keyboard and display, especially on portable devices
Leverage advantages of different modes in different contexts
Decrease error rates and time required to perform tasks
Increase accessibility of applications for special users
Enable new kinds of applications
Exercise 1
What new multimodal applications would be useful for your work?
What new multimodal applications would be entertaining to you, your family, or friends?
Voice as a “Third Hand”
Game Commander 3
• http://www.gamecommander.com/
Voice-Enabled Games
Scansoft’s VoCon Games Speech SDK
• http://www.scansoft.com/games/
• PlayStation® 2
• Nintendo® GameCube™
• http://www.omnipage.com/games/poweredby/
Education
Tucker Maxon School of Oral Education
http://www.tmos.org/
Education
Reading Tutor Project
http://cslr.colorado.edu/beginweb/reading/reading.html
Multimodal Applications Developed by PSU and OHSU Students
Hands-busy
• Troubleshooting a car's motor
• Repairing a leaky faucet
• Tuning musical instruments
Construction
• Complex origami artifact
• Project book for children
Cooking
• Talking recipe book
Entertainment
• Child's fairy tale book
• Audio-controlled juke box
• Games (Battleship, Go)
Multimodal Applications Developed by PSU and OHSU Students (continued)
Data collection
• Buy a car
• Collect health data
• Buy movie tickets
• Order meals from a restaurant
• Conduct banking business
• Locate a business
• Order a computer
• Choose homeless pets from an animal shelter
Authoring
• Photo album tour
Education
• Flash cards: addition tables

Download Opera and the speech plug-in.
Go to www.larson-tech.com/mm-Projects/Demos.htm
New Application Classes
Active listening
• Verbal VCR controls: start, stop, fast forward, rewind, etc.
Virtual assistants
• Listen for requests and immediately perform them
• Violin tuner
• TV controller
• Environmental controller
• Family-activity coordinator
Synthetic experiences
• Synthetic interviews
• Speech-enabled games
• Education and training
Authoring content
Two General Uses of Multiple Modes of Input
Redundancy—One mode acts as backup for another mode
In noisy environments, use keypad instead of speech input.
In cold environments, use speech instead of keypad.
Complementary—One mode supplements another mode
Voice as a third hand
“Move that (point) to there (point)” (late fusion)
Lip reading = video + speech (early fusion)
Potential Problems with Multimodal Applications
Voice may make an application "noisy."
• Privacy and security concerns
• Noise pollution
Sometimes speech and handwriting recognition systems fail.
Users may falsely expect full natural language understanding, which is possible only on Star Trek (and is often incorrectly called "NLP").
Full "natural language" processing requires:
• Knowledge of the outside world
• A history of the user-computer interaction
• A sophisticated understanding of language structure
"Natural language-like" systems simulate natural language for a small domain, a short history, and specialized language structures.
Adding a New Mode to an Application
Only if…
The new mode enables new features not previously possible.
The new mode dramatically improves usability.
Always….
Redesign the application to take advantage of the new mode.
Provide backup for the new mode.
Test, test, and test some more.
Exercise 2
Where will multimodal applications be used?
A. At home
B. At work
C. “On the road”
D. Other?
Developing and Deploying Multimodal Applications
What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
The Playbill—Who’s Who on the Team
Users—Their lives will be improved by using the multimodal application
Interaction designer—Designs the dialog—when and how the user and system interchange requests and information
Multimodal programmer—Implements the voice and multimodal user interface
Voice talent—Records spoken prompts and messages
Grammar writer—Specifies words and phrases the user may speak in response to a prompt
TTS specialist—Specifies verbal and audio sounds and inflections
Quality assurance specialist—Performs tests to validate the application is both useful and usable
Customer—Pays the bills
Program manager—Organizes the work and makes sure it is completed according to schedule and under budget
Development Process
Investigation Stage
Design Stage
Development Stage
Testing Stage
Sustaining Stage
Each stage involves users
Iterative refinement
Development Process
Investigation Stage
Design Stage
Development Stage
Testing Stage
Sustaining Stage
Identify the application:
• Conduct ethnography studies
• Identify candidate applications
• Conduct focus groups
• Select the application
Exercise 3
What will be the “killer” consumer multimodal applications?
Development Process
Investigation Stage
Design Stage
Development Stage
Testing Stage
Sustaining Stage
Specify the application:
• Construct the conceptual model
• Construct scenarios
• Specify performance and preference requirements
Specify Performance and Preference Requirements
Performance: Is the application useful?
• Measure what the users actually accomplished.
• Validate that the users achieved success.
Preference: Is the application enjoyable?
• Measure users' likes and dislikes.
• Validate that the users enjoyed the application and will use it again.
Performance Metrics
User Task                  Measure                                           Typical Criteria
Speak a command            Word error rate                                   Less than 3%
Supply values into a form  Valid values entered into each field of the form  Less than 5 seconds per value
Navigate a list            The user selects the specified option             Greater than 95%
Purchase a product         The user completes the purchase                   Greater than 93%
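The word error rate criterion above is conventionally computed as word-level edit distance divided by the number of reference words. This is an illustrative sketch, not part of the tutorial:

```javascript
// Word error rate = (substitutions + insertions + deletions) / reference word count,
// computed with a word-level Levenshtein (edit distance) table.
function wordErrorRate(reference, hypothesis) {
  const ref = reference.trim().split(/\s+/);
  const hyp = hypothesis.trim().split(/\s+/);
  // dp[i][j] = edit distance between the first i reference words
  // and the first j hypothesis words
  const dp = [];
  for (let i = 0; i <= ref.length; i++) {
    dp.push(new Array(hyp.length + 1).fill(0));
    dp[i][0] = i;
  }
  for (let j = 0; j <= hyp.length; j++) dp[0][j] = j;
  for (let i = 1; i <= ref.length; i++) {
    for (let j = 1; j <= hyp.length; j++) {
      const subst = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,         // deletion
        dp[i][j - 1] + 1,         // insertion
        dp[i - 1][j - 1] + subst  // match or substitution
      );
    }
  }
  return dp[ref.length][hyp.length] / ref.length;
}
```

For example, one substituted word in a three-word reference gives a word error rate of one third.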
Exercise 4
User Task Measure Typical Criteria
Specify performance metrics for the multimodal email application
Preference Metrics
Question Typical Criteria
On a scale from 1 to 10, rate the help facility.
The average caller score is greater than 8.
On a scale from 1 to 10, rate the ease of use of this application.
The average caller score is greater than 8.
Would you recommend using this voice portal to a friend?
Over 80% of callers respond by saying “yes.”
What would you be willing to pay each time you use this application?
Over 80% of callers indicate that they are willing to pay $1.00 or more per use.
Exercise 5
Question Typical Criteria
Specify preference metrics for the multimodal email application
Preference Metrics (Open-ended Questions)
What did you like the best about this voice-enabled application? (Do not change these features.)
What did you like the least about this voice-enabled application? (Consider changing these features.)
What new features would you like to have added? (Consider adding these features in this or a later release.)
What features do you think you will never use? (Consider deleting these features.)
Do you have any other comments and suggestions? (Pay attention to these responses. Callers frequently suggest very useful ideas.)
Development Process
Investigation Stage
Design Stage
Development Stage
Testing Stage
Sustaining Stage
Develop the application:
• Specify the persona
• Specify the modes and modalities
• Specify the dialog script
UI Design Guidelines
Guidelines for Voice User Interfaces
• Bruce Balentine and David P. Morgan. How to Build a Speech Recognition Application, Second Edition. http://www.eiginc.com
Guidelines for Graphical User Interfaces
• Research-Based Web Design and Usability Guidelines. U.S. Department of Health and Human Services. http://www.usability.gov/pdfs/guidelines.html
Guidelines for Multimodal User Interfaces
• Common Sense Guidelines for Developing Multimodal User Interfaces. W3C Working Group Note, 19 April 2006. http://www.w3.org/2002/mmi/Group/2006/Guidelines/
Common-sense Suggestions
1. Satisfy Real-World Constraints
Task-oriented Guidelines
1.1. Guideline: For each task, use the easiest mode available on the device.
Physical Guidelines
1.2. Guideline: If the user’s hands are busy, then use speech.
1.3. Guideline: If the user’s eyes are busy, then use speech.
1.4. Guideline: If the user may be walking, use speech for input.
Environmental Guidelines
1.5. Guideline: If the user may be in a noisy environment, then use a pen, keys or mouse.
1.6. Guideline: If the user’s manual dexterity may be impaired, then use speech.
Exercise 6
What input mode(s) should be used for each of the following tasks?
A. Selecting objects
B. Entering text
C. Entering symbols
D. Entering sketches or illustrations
Common-sense Suggestions
2. Communicate Clearly, Concisely, and Consistently with Users
Consistency Guidelines
2.1. Phrase all prompts consistently.
2.2. Enable the user to speak keyword utterances rather than natural language sentences.
2.3. Switch presentation modes only when the information is not easily presented in the current mode.
2.4. Make commands consistent.
2.5. Make the focus consistent across modes.
Organizational Guidelines
2.6. Use audio to indicate the verbal structure.
2.7. Use pauses to divide information into natural “chunks.”
2.8. Use animation and sound to show transitions.
2.9. Use voice navigation to reduce the number of screens.
2.10. Synchronize multiple modalities appropriately.
2.11. Keep the user interface as simple as possible.
Common-sense Suggestions
3. Help Users Recover Quickly and Efficiently from Errors
Conversational Guidelines
3.1. Users tend to use the same mode that was used to prompt them.
3.2. If privacy is not a concern, use speech as output to provide commentary or help.
3.3. Use directed user interfaces, unless the user is always knowledgeable and experienced in the domain.
3.4 Always provide context-sensitive help for every field and command.
Common-sense Suggestions
3. Help Users Recover Quickly and Efficiently from Errors (Continued)
Reliability Guidelines
Operational status
3.5. The user should always be able to determine easily whether the device is listening.
3.6. For devices with batteries, users should always be able to determine easily how much longer the device will be operational.
3.7. Support at least two input modes so one input mode can be used when the other cannot.
Visual feedback
3.8. Present words recognized by the speech recognition system on the display, so the user can verify they are correct.
3.9. Display the n-best list to enable easy speech recognition error correction
3.10. Try to keep response times less than 5 seconds. Inform the user of longer response times.
Common-sense Suggestions
4. Make Users Comfortable
Listening mode
4.1. Speak after pressing a speak key, which automatically releases after the user finishes speaking.
System status
4.2. Always present the current system status to the user.
Human-memory Constraints
4.3. Use the screen to ease stress on the user’s short-term memory.
Common-sense Suggestions
4. Make Users Comfortable (Continued)
Social Guidelines
4.4. If the user may need privacy, use a display rather than rendered speech.
4.5. If the user may need privacy, use a pen or keys.
4.6. If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).
Advertising Guidelines
4.7. Use animation and sound to attract the user's attention.
4.8. Use landmarks to help the user know where he or she is.
Common-sense Suggestions
4. Make Users Comfortable (Continued)
Ambience
4.9. Use audio and graphic design to set the mood and convey emotion in games and entertainment applications.
Accessibility
4.10 For each traditional output technique, provide an alternative output technique.
4.11. Enable users to adjust the output presentation.
Books
Ramon Lopez-Cozar Delgado and Masahiro Araki. Spoken, Multilingual and Multimodal Dialog Systems—Development and Assessment. West Sussex, England: Wiley, 2005.
Julie A. Jacko and Andrew Sears (Editors). The Human-Computer Interaction Handbook—Fundamentals, Evolving Technologies, and Emerging Applications. Mahwah, New Jersey: Lawrence Erlbaum Associates, 2003.
Development Process
Investigation Stage
Design Stage
Development Stage
Testing Stage
Sustaining Stage
Test the application:
• Component test
• Usability test
• Stress test
• Field test
Testing Resources
Jeffrey Rubin. Handbook of Usability Testing. New York: Wiley Technical Communication Library, 1994.
Peter and David Leppik. Gourmet Customer Service. Eden Prairie, MN: VocalLabs, 2005. [email protected]
Development Process
Investigation Stage
Design Stage
Development Stage
Testing Stage
Sustaining Stage
Deploy and monitor the application:
• User surveys
• Usage reports from log files
• User feedback and comments
Developing and Deploying Multimodal Applications
What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
W3C Multimodal Interaction Framework
Recognition Grammar
Semantic Interpretation
Extensible MultiModal Annotation (EMMA)
Speech Synthesis
Interaction Managers
General description of speech application components and how they relate
W3C Multimodal Interaction Framework
[Diagram: user input and output flow through an Interaction Manager, which connects to Application Functions and Telephony Properties]
W3C Multimodal Interaction Framework
[Diagram: user input from audio, ink, and display devices passes through ASR with semantic interpretation to Information Integration and the Interaction Manager; output flows back through Language Generation, Media Planning, TTS, and audio/telephony functions, with Application Functions behind the Interaction Manager]
W3C Multimodal Interaction Framework
[Diagram: the framework, highlighting the recognition grammar]
SRGS: Describes what the user may say at each point in the dialog
Speech Recognition Engines
                      Low-end              High-end             Other
Speaking mode         Isolated (discrete)  Continuous           Keywords
Enrollment            Speaker-dependent    Speaker-independent  Adaptive
Vocabulary size       Small                Large                Switch vocabularies
Speaking style        Read                 Spontaneous
Simultaneous callers  Single-threaded      Multi-threaded
Grammars
Describe what the user may say or handwrite at a point in the dialog
Enable the recognition engine to work faster and more accurately
Two types of grammars:
– Structured grammars
– Statistical grammars (N-grams)
Structured Grammars
Specifies words that a user may speak or write
Two representation formats
1. Augmented Backus-Naur Form (ABNF) production rules:
   Single_digit ::= zero | one | two | … | nine
   Zero_thru_ten ::= Single_digit | ten
2. XML format, which can be processed by an XML validator
Example XML Grammar
<grammar mode="voice" type="application/srgs+xml" root="zero_to_ten">
  <rule id="zero_to_ten">
    <one-of>
      <ruleref uri="#single_digit"/>
      <item> ten </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> zero </item>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
Exercise 7
Write a grammar that recognizes the digits zero through nineteen
(Hint: Modify the previous page)
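One possible solution, sketched by extending the digit grammar from the previous slide (the rule names are arbitrary):

```xml
<grammar mode="voice" type="application/srgs+xml" root="zero_to_nineteen">
  <rule id="zero_to_nineteen">
    <one-of>
      <ruleref uri="#single_digit"/>
      <ruleref uri="#teens"/>
    </one-of>
  </rule>
  <rule id="teens">
    <one-of>
      <item> ten </item> <item> eleven </item> <item> twelve </item>
      <item> thirteen </item> <item> fourteen </item> <item> fifteen </item>
      <item> sixteen </item> <item> seventeen </item> <item> eighteen </item>
      <item> nineteen </item>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of>
      <item> zero </item> <item> one </item> <item> two </item>
      <item> three </item> <item> four </item> <item> five </item>
      <item> six </item> <item> seven </item> <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
```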
Reusing Existing Grammars
<grammar type="application/srgs+xml" root="size"
         src="http://www.example.com/size.grxml"/>
Exercise 8
Write a grammar for positive responses to a yes/no question (i.e., “yes,” “sure,” “affirmative,” and so forth)
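One possible solution; the exact word list is a judgment call:

```xml
<grammar mode="voice" type="application/srgs+xml" root="yes">
  <rule id="yes">
    <one-of>
      <item> yes </item>
      <item> sure </item>
      <item> affirmative </item>
      <item> yeah </item>
      <item> okay </item>
      <item> yes please </item>
    </one-of>
  </rule>
</grammar>
```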
When Is a Grammar Too Large?
[Chart: word coverage vs. response as grammar size grows]
W3C Multimodal Interaction Framework
[Diagram: the framework, highlighting the semantic interpretation component]
SISR: A procedural JavaScript-like language for interpreting the text strings returned by the speech recognition engine
Semantic Interpretation
Semantic scripts employ ECMAScript
Advantages:
– Translate aliases to vocabulary words
– Perform calculations
– Produce a rich structure rather than a text string
Semantic Interpretation
[Diagram: a recognizer, constrained by a grammar, passes recognized text ("Big white t-shirt" / "Large white t-shirt") to the conversation manager]
Semantic Interpretation
[Diagram: recognizer output passes through a semantic interpretation processor, driven by a grammar with semantic interpretation scripts, to the conversation manager]

<rule id="action">
  <one-of>
    <item> small <tag> out.size = "small"; </tag> </item>
    <item> medium <tag> out.size = "medium"; </tag> </item>
    <item> large <tag> out.size = "large"; </tag> </item>
    <item> big <tag> out.size = "large"; </tag> </item>
  </one-of>
  <one-of>
    <item> green <tag> out.color = "green"; </tag> </item>
    <item> blue <tag> out.color = "blue"; </tag> </item>
    <item> white <tag> out.color = "white"; </tag> </item>
  </one-of>
</rule>

Input: "Big white t-shirt"
Result: { size: "large", color: "white" }
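The effect of those tag scripts can be simulated in ordinary ECMAScript. This is a hypothetical sketch of what the semantic interpretation processor computes, not a real SISR engine; the interpret function and its lookup tables are invented for illustration:

```javascript
// Hypothetical simulation of the t-shirt grammar's <tag> scripts:
// each recognized word runs its script against a shared "out" object.
const sizeWords = { small: "small", medium: "medium", large: "large", big: "large" };
const colorWords = { green: "green", blue: "blue", white: "white" };

function interpret(utterance) {
  const out = {};
  for (const word of utterance.toLowerCase().split(/\s+/)) {
    if (word in sizeWords) out.size = sizeWords[word];    // e.g. out.size = "large";
    if (word in colorWords) out.color = colorWords[word]; // e.g. out.color = "white";
  }
  return out; // a structured result rather than a text string
}
```

Note how the alias "big" is translated to the vocabulary word "large", one of the advantages listed above.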
Exercise 9
Modify this rule to return only "yes"

<grammar type="application/srgs+xml" root="yes" mode="voice">
  <rule id="yes">
    <one-of>
      <item> yes </item>
      <item> sure </item>
      <item> affirmative </item>
      …
    </one-of>
  </rule>
</grammar>
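One possible solution: attach a tag after the one-of so that whichever item matched, the result is overwritten with the literal "yes" (a sketch; exact tag syntax varies by semantic interpretation processor):

```xml
<grammar type="application/srgs+xml" root="yes" mode="voice">
  <rule id="yes">
    <one-of>
      <item> yes </item>
      <item> sure </item>
      <item> affirmative </item>
    </one-of>
    <tag> out = "yes"; </tag>
  </rule>
</grammar>
```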
W3C Multimodal Interaction Framework
[Diagram: the framework, highlighting the information integration component]
EMMA: A language for representing the semantic content from speech recognizers, handwriting recognizers, and other input devices
EMMA
Extensible MultiModal Annotation markup language
Canonical structure for semantic interpretations of a variety of inputs, including:
• Speech
• Natural language text
• GUI
• Ink
EMMA
[Diagram: speech and keyboard input are separately recognized and interpreted, guided by grammars with semantic interpretation instructions; each produces an EMMA document, and the documents are merged/unified into a single EMMA document for applications]
EMMA
[Diagram: the same EMMA pipeline; the speech recognizer produces:]

<interpretation mode="speech">
  <travel>
    <to hook="ink"/>
    <from hook="ink"/>
    <day> Tuesday </day>
  </travel>
</interpretation>
EMMA
[Diagram: the same EMMA pipeline with both interpretations:]

<interpretation mode="speech">
  <travel>
    <to hook="ink"/>
    <from hook="ink"/>
    <day> Tuesday </day>
  </travel>
</interpretation>

<interpretation mode="ink">
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
  </travel>
</interpretation>
EMMA

<interpretation mode="speech">
  <travel>
    <to hook="ink"/>
    <from hook="ink"/>
    <day> Tuesday </day>
  </travel>
</interpretation>

<interpretation mode="ink">
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
  </travel>
</interpretation>

[Diagram: merging/unification combines the two EMMA documents into:]

<interpretation mode="interp1">
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
    <day> Tuesday </day>
  </travel>
</interpretation>
Exercise 10
Given the following two EMMA specifications, what is the unified EMMA specification?

<interpretation mode="speech">
  <moneyTransfer>
    <sourceAcct hook="ink"/>
    <targetAcct hook="ink"/>
    <amount> 300 </amount>
  </moneyTransfer>
</interpretation>

<interpretation mode="ink">
  <moneyTransfer>
    <sourceAcct> savings </sourceAcct>
    <targetAcct> checking </targetAcct>
  </moneyTransfer>
</interpretation>

Unified EMMA specification:

<interpretation mode="intp1">
  <moneyTransfer>
    <sourceAcct> ______ </sourceAcct>
    <targetAcct> ______ </targetAcct>
    <amount> ______ </amount>
  </moneyTransfer>
</interpretation>
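For reference, filling each hook from the ink interpretation gives one reasonable answer:

```xml
<interpretation mode="intp1">
  <moneyTransfer>
    <sourceAcct> savings </sourceAcct>
    <targetAcct> checking </targetAcct>
    <amount> 300 </amount>
  </moneyTransfer>
</interpretation>
```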
W3C Multimodal Interaction Framework
[Diagram: the framework, highlighting the TTS component]
SSML: A language for rendering text as synthesized speech
Speech Synthesis Markup Language
Structure analysis
• Markup support: paragraph, sentence
• Non-markup behavior: infer structure by automated text analysis
Text normalization
• Markup support: say-as for dates, times, etc.
• Non-markup behavior: automatically identify and convert constructs
Text-to-phoneme conversion
• Markup support: phoneme, say-as
• Non-markup behavior: look up in pronunciation dictionary
Prosody analysis
• Markup support: emphasis, break, prosody
• Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax
Waveform production
Speech Synthesis Markup Language
Examples

<phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme> is a great platform

<prosody pitch="x-low"> Who’s been sleeping in my bed? </prosody> said papa bear.
<prosody pitch="medium"> Who’s been sleeping in my bed? </prosody> said momma bear.
<prosody pitch="x-high"> Who’s been sleeping in my bed? </prosody> said baby bear.
Popular Strategy
Develop dialogs using SSML
Usability test dialogs
Extract prompts
Hire voice talent to record prompts
Replace <prompt> with <audio>
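The last step might look like this in VoiceXML (a hypothetical fragment; the recording file name is an assumption, and the inline text remains as a fallback if the audio file cannot be fetched):

```xml
<!-- Before: the prompt is rendered by TTS -->
<prompt> Say a city name. </prompt>

<!-- After: play the voice talent's recording instead -->
<prompt>
  <audio src="say-a-city-name.wav"> Say a city name. </audio>
</prompt>
```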
W3C Multimodal Interaction Framework
[Diagram: the framework, highlighting the interaction manager]
VoiceXML: A language for controlling the exchange of information and commands between the user and the system
Developing and Deploying Multimodal Applications
What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
Speech APIs and SDKs
• JSAPI—Java Speech Application Program Interface
  – http://java.sun.com/products/java-media/speech/
  – http://developer.mozilla.org/en/docs/JSAPI_Reference
• Nuance Mobile Speech Platform
  – http://www.nuance.com/speechplatform/components.asp
• VSAPI—Voice Signal API
  – http://www.voicesignal.com/news/articles/2006-06-21-SymbianOne.htm
• SALT
  – http://www.saltforum.org/
Interaction Manager Approaches
Three approaches:
• X+V: Interaction Manager in XHTML, with VoiceXML 2.0 modules
• Object-oriented: Interaction Manager in C#, with SAPI 5.3
• W3C: Interaction Manager in SCXML, with XHTML, VoiceXML 3.0, and InkML
SAPI 5.3 & Windows Vista™
Speech Synthesis

W3C Speech Synthesis Markup Language 1.0:

<speak>
  <phoneme alphabet="ipa" ph="wɪnɛfɛks"> WinFX </phoneme>
  is a great platform
</speak>

Microsoft proprietary PromptBuilder:

myPrompt.AppendTextWithPronunciation("WinFX", "wɪnɛfɛks");
myPrompt.AppendText("is a great platform.");
SAPI 5.3 & Windows Vista™
Speech Recognition

W3C Speech Recognition Grammar Specification 1.0:

<grammar type="application/srgs+xml" root="city" mode="voice">
  <rule id="city">
    <one-of>
      <item> New York City </item>
      <item> New York </item>
      <item> Boston </item>
    </one-of>
  </rule>
</grammar>

Microsoft proprietary GrammarBuilder:

Choices cityChoices = new Choices();
cityChoices.AddPhrase("New York City");
cityChoices.AddPhrase("New York");
cityChoices.AddPhrase("Boston");
Grammar cityGrammar =
    new Grammar(new GrammarBuilder(cityChoices));
SAPI 5.3 & Windows Vista™
Semantic Interpretation

Augment the SRGS grammar with JScript® for semantic interpretation:

<grammar type="application/srgs+xml" root="city" mode="voice">
  <rule id="city">
    <one-of>
      <item> New York City <tag> city = "JFK" </tag> </item>
      <item> New York <tag> city = "JFK" </tag> </item>
      <item> Portland <tag> city = "PDX" </tag> </item>
    </one-of>
  </rule>
</grammar>

User-specified "shortcuts": the recognizer replaces a shortcut word with an expanded string.
User says: my address
System enters: 1033 Smith Street, Apt. 7C, Bloggsville 00000
SAPI 5.3 & Windows Vista™
Dialog

1. Introduce the System.Speech.Recognition namespace
2. Instantiate a SpeechRecognizer object
3. Build a grammar
4. Attach an event handler
5. Load the grammar into the recognizer
6. When the recognizer hears something that fits the grammar, the SpeechRecognized event handler is invoked, which accesses the Result object and works with the recognized text
SAPI 5.3 & Windows Vista™ Dialog

using System;
using System.Windows.Forms;
using System.ComponentModel;
using System.Collections.Generic;
using System.Speech.Recognition;

namespace Reco_Sample_1
{
    public partial class Form1 : Form
    {
        // Create a recognizer
        SpeechRecognizer _recognizer = new SpeechRecognizer();

        public Form1() { InitializeComponent(); }

        private void Form1_Load(object sender, EventArgs e)
        {
            // Create a pizza grammar
            Choices pizzaChoices = new Choices();
            pizzaChoices.AddPhrase("I'd like a cheese pizza");
            pizzaChoices.AddPhrase("I'd like a pepperoni pizza");
            pizzaChoices.AddPhrase("I'd like a large pepperoni pizza");
            pizzaChoices.AddPhrase(
                "I'd like a small thin crust vegetarian pizza");
            Grammar pizzaGrammar =
                new Grammar(new GrammarBuilder(pizzaChoices));

            // Attach an event handler
            pizzaGrammar.SpeechRecognized +=
                new EventHandler<RecognitionEventArgs>(
                    PizzaGrammar_SpeechRecognized);

            _recognizer.LoadGrammar(pizzaGrammar);
        }

        void PizzaGrammar_SpeechRecognized(
            object sender, RecognitionEventArgs e)
        {
            MessageBox.Show(e.Result.Text);
        }
    }
}
SAPI 5.3 & Windows Vista™ References
Speech API Overview
http://msdn2.microsoft.com/en-us/library/ms720151.aspx#API_Speech_Recognition
Microsoft Speech API (SAPI) 5.3
http://msdn2.microsoft.com/en-us/library/ms723627.aspx
“Exploring New Speech Recognition And Synthesis APIs In Windows Vista” by Robert Brown
http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/default.aspx#Resources
Interaction Manager Approaches
Three approaches, each with its own interaction manager and modality components:

– X+V: Interaction Manager (XHTML) with VoiceXML 2.0 modules
– W3C: Interaction Manager (SCXML) coordinating XHTML, VoiceXML 3.0, and InkML
– Object-oriented: Interaction Manager (C#) over SAPI 5.3
Step 1: Start with Standard VoiceXML and Standard XHTML

VoiceXML

<form id="topform"> <field name="city"> <prompt>Say a name</prompt> <grammar src="city.grxml"/> </field> </form>
XHTML
<form> Result: <input type="text" name="in1"/> </form>
W3C grammar language
Step 2: Combine

<html xmlns="http://www.w3.org/1999/xhtml">
<head> <form id="topform"> <field name="city"> <prompt>Say a name</prompt> <grammar src="city.grxml"/> </field></form></head>
<body> <form> Result: <input type="text" name="in1"/> </form></body>
</html>
Step 3: Insert vxml Namespace
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:vxml="http://www.w3.org/2001/vxml">
<head> <vxml:form id="topform"> <vxml:field name="city"> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src="city.grxml"/> </vxml:field> </vxml:form></head>
<body> <form> Result: <input type="text" name="in1"/> </form></body>
</html>
Step 4: Insert event
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:ev="http://www.w3.org/2001/xml-events">
<head> <vxml:form id="topform"> <vxml:field name="city"> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src="city.grxml"/> </vxml:field> </vxml:form></head>
<body> <form ev:event="load" ev:handler="#topform"> Result: <input type="text" name="in1"/> </form></body>
</html>
Step 5: Insert <sync>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:xv="http://www.w3.org/2002/xhtml+voice">
<head> <xv:sync xv:input="in1" xv:field="#result"/> <vxml:form id="topform"> <vxml:field name="city" xv:id="result"> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src="city.grxml"/> </vxml:field> </vxml:form></head>
<body> <form ev:event="load" ev:handler="#topform"> Result: <input type="text" name="in1"/> </form></body>
</html>
XHTML plus Voice (X+V) References

• Available on
  – ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
    http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
  – Opera Software Multimodal Browser for Sharp Zaurus
    http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
  – Opera 9 for Windows
    http://www.opera.com/
• Programmers Guide
  – ftp://ftp.software.ibm.com/software/pervasive/info/multimodal/XHTML_voice_programmers_guide.pdf
• For a variety of small illustrative applications
  – http://www.larson-tech.com/MM-Projects/Demos.htm
Exercise 11
Specify the X+V notation for integrating the following VoiceXML and XHTML code by completing the code on the next page
VoiceXML
<form id="stateForm"> <field name="state"> <prompt>Say a state name</prompt> <grammar src="city.grxml"/> </field> </form>
XHTML
<form> Result: <input type="text" name="in1"/> </form>
Exercise 11 (continued)
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:vxml="http://www.w3.org/2001/vxml" xmlns:ev="http://www.w3.org/2001/xml-events" xmlns:xv="http://www.w3.org/2002/xhtml+voice">
<head> <xv:sync xv:input="_______" xv:field="________"/> <vxml:form id="________"> <vxml:field name="state" xv:id="________"> <vxml:prompt>Say a state name</vxml:prompt> <vxml:grammar src="state.grxml"/> </vxml:field> </vxml:form></head>
<body> <form ev:event="load" ev:handler="#________"> Result: <input type="text" name="_______"/> </form></body>
</html>
Interaction Manager Approaches
Three approaches, each with its own interaction manager and modality components:

– X+V: Interaction Manager (XHTML) with VoiceXML 2.0 modules
– W3C: Interaction Manager (SCXML) coordinating XHTML, VoiceXML 3.0, and InkML
– Object-oriented: Interaction Manager (C#) over SAPI 5.3
MMI Architecture—4 Basic Components
• Runtime Framework or Browser— initializes application and interprets the markup
• Interaction Manager—coordinates modality components and provides application flow
• Modality Components—provide modality capabilities such as speech, pen, keyboard, mouse
• Data Model—handles shared data
[Diagram: Interaction Manager (SCXML) coordinating XHTML, VoiceXML 3.0, and InkML modality components over a shared Data Model]
Multimodal Architecture and Interfaces
• A loosely-coupled, event-based architecture for integrating multiple modalities into applications
• All communication is event-based
• Based on a set of standard life-cycle events
• Components can also expose other events as required
• Encapsulation protects component data
• Encapsulation enhances extensibility to new modalities
• Can be used outside a Web environment
Specify Interaction Manager Using Harel State Charts
Extension of state transition systems
• States
• Transitions
• Nested state-transition systems
• Parallel state-transition systems
• History
[State chart: PrepareState – StartState – WaitState – EndState, advanced by PrepareResponse(success), StartResponse, and DoneSuccess; PrepareResponse(fail), StartFail, and DoneFail all lead to FailState]
Example State Transition System
State Chart XML (SCXML)
…
<state id="PrepareState">
<send event="prepare" contentURL="hello.vxml"/>
<transition event="prepareResponse" cond="status='success'" target="StartState"/>
<transition event="prepareResponse" cond="status='failure'" target="FailState"/>
</state>
…
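The state chart that this SCXML fragment encodes can also be modeled as a plain transition table, which is a handy way to test the dialog logic. A minimal Python sketch (state and event names come from the chart; the table itself is illustrative, not part of SCXML):

```python
# (current state, event) -> next state, following the chart on this slide.
TRANSITIONS = {
    ("PrepareState", "prepareResponse(success)"): "StartState",
    ("PrepareState", "prepareResponse(fail)"): "FailState",
    ("StartState", "startResponse"): "WaitState",
    ("StartState", "startFail"): "FailState",
    ("WaitState", "doneSuccess"): "EndState",
    ("WaitState", "doneFail"): "FailState",
}

def step(state, event):
    """Advance the chart one event; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "PrepareState"
for event in ["prepareResponse(success)", "startResponse", "doneSuccess"]:
    state = step(state, event)
print(state)  # EndState
```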
Example State Chart with Parallel States
[State chart: two parallel machines, one per modality – PrepareVoice/StartVoice/WaitVoice/EndVoice/FailVoice and PrepareGUI/StartGUI/WaitGUI/EndGUI/FailGUI – each with the same PrepareResponse(success/fail), StartResponse, DoneSuccess, StartFail, and DoneFail transitions as the single chart]
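In SCXML the two machines above would run under a <parallel> element. A hedged sketch extrapolated from the single-machine fragment on the previous slide (SCXML was still a working draft at this time; element names follow that draft, state ids follow the chart):

```xml
<parallel id="run">
  <state id="voice" initial="PrepareVoice">
    <state id="PrepareVoice">
      <transition event="prepareResponse" cond="status='success'" target="StartVoice"/>
      <transition event="prepareResponse" cond="status='failure'" target="FailVoice"/>
    </state>
    <!-- StartVoice, WaitVoice, EndVoice, FailVoice as in the chart -->
  </state>
  <state id="gui" initial="PrepareGUI">
    <state id="PrepareGUI">
      <transition event="prepareResponse" cond="status='success'" target="StartGUI"/>
      <transition event="prepareResponse" cond="status='failure'" target="FailGUI"/>
    </state>
    <!-- StartGUI, WaitGUI, EndGUI, FailGUI as in the chart -->
  </state>
</parallel>
```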
The Life Cycle Events

For each life-cycle event, the Interaction Manager sends the event to each modality component (GUI, VUI), and each component replies with the matching response event:

• prepare → prepareResponse
• start → startResponse
• cancel → cancelResponse
• pause → pauseResponse
• resume → resumeResponse
More Life Cycle Events

• newContextRequest → newContextResponse
• data – exchanged in both directions between the Interaction Manager and each modality
• done – sent by a modality to the Interaction Manager
• clearContext – sent by the Interaction Manager to each modality
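The fan-out pattern in these diagrams can be sketched as an interaction manager that broadcasts each life-cycle event to its modality components and collects one response apiece. A plain-Python illustration (the classes are invented for this sketch, not part of the MMI specification):

```python
class ModalityComponent:
    """Stand-in for a modality component (GUI, VUI, ...)."""
    def __init__(self, name):
        self.name = name

    def handle(self, event):
        # A real component would prepare, start, cancel, etc. its own UI here.
        return f"{event}Response from {self.name}"

class InteractionManager:
    def __init__(self, components):
        self.components = components

    def broadcast(self, event):
        """Send one life-cycle event to every component; gather the responses."""
        return [c.handle(event) for c in self.components]

im = InteractionManager([ModalityComponent("GUI"), ModalityComponent("VUI")])
print(im.broadcast("prepare"))
# ['prepareResponse from GUI', 'prepareResponse from VUI']
```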
Synchronization Using the Lifecycle Data Event

• Intent-based events
  – Capture the underlying intent rather than the physical manifestation of user-interaction events
  – Independent of the physical characteristics of particular devices
• Data/reset – reset one or more field values to null
• Data/focus – focus on another field
• Data/change – field value has changed
Lifecycle Events between Interaction Manager and Modality

[Diagram: the modality’s state chart (PrepareState, StartState, WaitState, EndState, FailState) aligned with the events exchanged with the Interaction Manager – prepare / prepare response (success or failure), start / start response (success or failure), data, and done]
MMI Architecture Principles
• Runtime Framework communicates with Modality Components through asynchronous events
• Modality Components don’t communicate directly with each other, but indirectly through the Runtime Framework
• Components must implement basic life cycle events, may expose other events
• Modality components can be nested (e.g. a Voice Dialog component like a VoiceXML <form>)
• Components need not be markup-based
• EMMA communicates users’ inputs to the Interaction Manager
Modalities

• GUI Modality (XHTML)
  – An adapter converts lifecycle events to XHTML events and XHTML events to lifecycle events
• Voice Modality (VoiceXML 3.0)
  – Lifecycle events are embedded into VoiceXML 3.0
Exercise 12
What should VoiceXML do when it receives each of the following events?
A. Reset
B. Change
C. Focus
Modalities

VoiceXML 3.0 will support lifecycle events.
<form> <catch name="change"> <assign name="city" value="data"/> </catch>
…
<field name = "city"> <prompt> Blah </prompt> <grammar src="city.grxml"/> <filled> <send event="data.change" data="city"/> </filled> </field>
</form>
Exercise 13
What should HTML do when it receives each of the following events?
A. Reset
B. Change
C. Focus
Modalities

XHTML is extended to support lifecycle events sent to a modality.

<head> …
  <ev:listener ev:event="onChange" ev:observer="app1" ev:handler="onChangeHandler()"/>
  …
  <script> function onChangeHandler() { post("data", "city"); } </script>
</head>
…
<body id="app1"> <input type="text" id="city" value=""/> </body>
…
Modalities

XHTML is extended to support lifecycle events sent to the interaction manager.

<head> …
  <handler type="text/javascript" ev:event="data">
    if (event == "change") { document.app1.city.value = data.city; }
  </handler>
… </head>
…
<body id="app1"> <input type="text" id="city" value=""/> </body>
…
References

• SCXML
  – Second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/
  – Open source available from http://jakarta.apache.org/commons/sandbox/scxml/
• Multimodal Architecture and Interfaces
  – Working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
• Voice Modality
  – First working draft of VoiceXML 3.0 scheduled for November 2007
• XHTML
  – Full recommendation
  – Adapters must be hand-coded
• Other modalities
  – TBD
Comparison
Standard languages:
  – Object-oriented: SRGS, SISR, SSML
  – X+V: VoiceXML, SRGS, SSML, SISR, XHTML
  – W3C: SCXML, SRGS, VoiceXML, SSML, SISR, XHTML, EMMA, CCXML

Interaction manager:
  – Object-oriented: C#
  – X+V: XHTML
  – W3C: SCXML

Modes:
  – Object-oriented: GUI, speech
  – X+V: GUI, speech
  – W3C: GUI, speech, ink, …
Availability

SAPI 5.3
  – Microsoft Windows Vista®
X+V
  – ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
    http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
  – Opera Software Multimodal Browser for Sharp Zaurus
    http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
  – Opera 9 for Windows
    http://www.opera.com/
W3C
  – First working draft of VoiceXML 3.0 not yet available
  – Working drafts of SCXML are available; some open-source implementations are available
Proprietary APIs
  – Available from vendor
Discussion Question
Should a developer insert SALT tags or X+V modules into an existing Web page without redesigning the Web page?
Conclusion
• Multimodal applications offer benefits over today’s traditional GUIs.
• Only use multimodal if there is a clear benefit.
• Standard languages are available today to develop multimodal applications.
• Don’t reinvent the wheel.
• Creativity and lots of usability testing are necessary to create world-class multimodal applications.
Web Resources
http://www.w3.org/voice
– Specification of grammar, semantic interpretation, and speech synthesis languages
http://www.w3.org/2002/mmi
– Specification of EMMA and InkML languages
http://www.microsoft.com (and query SALT)
– SALT specification and download instructions for adding SALT to Internet Explorer
http://www-306.ibm.com/software/pervasive/multimodal/
– X+V specification; download Opera and ACCESS browsers
http://www.larson-tech.com/SALT/ReadMeFirst.html
– Student projects using SALT to develop multimodal applications
http://www.larson-tech.com/MMGuide.html or http://www.w3.org/2002/mmi/Group/2006/Guidelines/
– User interface guidelines for multimodal applications
Status of W3C Multimodal Interface Languages

[Chart: each language positioned along the W3C track – Requirements, Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, Recommendation – for VoiceXML 2.0, VoiceXML 2.1, Speech Recognition Grammar Specification (SRGS) 1.0, Speech Synthesis Markup Language (SSML) 1.0, Extensible MultiModal Annotation (EMMA) 1.0, Semantic Interpretation of Speech Recognition (SISR) 1.0, State Chart XML (SCXML) 1.0, and InkML 1.0]
Questions
?
Answer to Exercise 5
Rankings (1 = best) for each content-manipulation task by mode:

Select objects:
  (1) Pen: point to or circle the object
  (2) Mouse/joystick: point to and click on the object, or drag to select text
  (3) Voice: speak the name of the object
  (4) Keyboard/keypad: press keys to position the cursor on the object and press the select key

Enter text:
  (1) Keyboard/keypad: press keys to spell the words in the text
  (2) Voice: speak the words in the text
  (3) Pen: write the text
  (4) Mouse/joystick: spell the text by selecting letters from a soft keyboard

Enter symbols:
  (1) Pen: draw the symbol where it should be placed
  (2) Mouse/joystick: select the symbol from a menu and indicate where it should be placed
  (3) Voice: say the name of the symbol and where it should be placed
  (4) Keyboard/keypad: enter one or more characters that together represent the symbol

Enter sketches or illustrations:
  (1) Pen: draw the sketch or illustration
  (2) Voice: verbally describe the sketch or illustration
  (3) Mouse/joystick: create the sketch by moving the mouse so it leaves a trail (similar to an Etch-a-Sketch™)
  (4) Keyboard/keypad: impossible
Answer to Exercise 7: Write a grammar for zero to nineteen

<grammar type="application/srgs+xml" root="zero_to_19" mode="voice">
  <rule id="zero_to_19">
    <one-of>
      <ruleref uri="#single_digit"/>
      <ruleref uri="#teens"/>
    </one-of>
  </rule>
  <rule id="single_digit">
    <one-of> <item> zero </item> <item> one </item> <item> two </item> <item> three </item> <item> four </item> <item> five </item> <item> six </item> <item> seven </item> <item> eight </item> <item> nine </item> </one-of>
  </rule>
  <rule id="teens">
    <one-of> <item> ten </item> <item> eleven </item> <item> twelve </item> <item> thirteen </item> <item> fourteen </item> <item> fifteen </item> <item> sixteen </item> <item> seventeen </item> <item> eighteen </item> <item> nineteen </item> </one-of>
  </rule>
</grammar>
Answer to Exercise 8
<grammar type = "application/srgs+xml" root = "yes" mode = "voice">
<rule id = "yes"> <one-of> <item> yes </item> <item> sure </item> <item> affirmative </item>
…
</one-of> </rule>
</grammar>
Answer to Exercise 9
<grammar type = "application/srgs+xml" root = "yes" mode = "voice">
<rule id = "yes"> <one-of> <item> yes </item> <item> sure <tag> out = "yes" </tag> </item> <item> affirmative <tag> out = "yes" </tag> </item>
…
</one-of> </rule>
</grammar>
Answer to Exercise 10
Given the following two EMMA specifications, what is the unified EMMA specification?

<interpretation mode="speech"> <moneyTransfer> <sourceAcct hook="ink"/> <targetAcct hook="ink"/> <amount> 300 </amount> </moneyTransfer> </interpretation>

<interpretation mode="ink"> <moneyTransfer> <sourceAcct> savings </sourceAcct> <targetAcct> checking </targetAcct> </moneyTransfer> </interpretation>

Unified:

<interpretation id="intp1"> <moneyTransfer> <sourceAcct> savings </sourceAcct> <targetAcct> checking </targetAcct> <amount> 300 </amount> </moneyTransfer> </interpretation>
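The unification above can be pictured as slot filling: slots marked with a hook in one interpretation are filled from the other. A plain-Python illustration (the dictionary encoding is invented for this sketch; EMMA defines unification over the XML itself):

```python
HOOK = object()  # marks a slot waiting to be filled by another modality

def unify(a, b):
    """Merge two partial interpretations; hooked slots in one are filled from the other."""
    merged = dict(a)
    for slot, value in b.items():
        if merged.get(slot, HOOK) is HOOK:
            merged[slot] = value
    return merged

speech = {"sourceAcct": HOOK, "targetAcct": HOOK, "amount": "300"}
ink = {"sourceAcct": "savings", "targetAcct": "checking"}
print(unify(speech, ink))
# {'sourceAcct': 'savings', 'targetAcct': 'checking', 'amount': '300'}
```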
Answer to Exercise 11
<html xmlns= "http://www.w3.org/1999/xhtml" xmlns:vxml= "http://www.w3.org/2001/vxml" xmlns:ev= "http://www.w3.org/2001/xml-events" xmlns:xv="http://www.w3.org/2002/xhtml+voice">
<head> <xv:sync xv:input="in4" xv:field="#answer"/> <vxml:form id= "stateForm"> <vxml:field name= "state" xv:id= "answer"> <vxml:prompt>Say a state name</vxml:prompt> <vxml:grammar src = "state.grxml"/> </vxml:field> </vxml:form></head>
<body> <form ev:event="load" ev:handler="#stateForm"> Result: <input type="text" name="in4"/> </form></body>
</html>
Answer to Exercise 12

What should VoiceXML do when it receives each of the following events?

• Reset – reset the value
• Change – change the value
• Focus – prompt for the value now in focus
Answer to Exercise 13

What should HTML do when it receives each of the following events?

• Reset – reset the value; the author decides whether the cursor should be moved to the reset field
• Change – change the value; the author decides whether the cursor should be moved to the changed field
• Focus – move the cursor to the item in focus