CMPUT 301: Lecture 31
Out of the Glass Box
Martin Jagersand
Department of Computing Science, University of Alberta
Overview
• Idea:
– why use only the sense of vision in user interfaces?
– increase the bandwidth of the interaction by using multiple sensory channels, instead of overloading the visual channel
Overview
• Multi-sensory systems:
– use more than one sensory channel in interaction
– e.g., sound, video, gestures, physical actions, etc.
Overview
• Usable senses:
– sight, sound, touch, taste, smell
– haptics, proprioception, and acceleration
– each is important on its own
– together, they provide a fuller interaction with the natural world
Overview
• Usable senses:
– computers rarely offer such a rich interaction
– we can use sight, sound, and sometimes touch
– flight simulators and some games use accelerations to create a multimodal immersive experience
– we cannot (yet) use taste or smell
Overview
• Multi-modal systems:
– use more than one sense in the interaction
– e.g., sight and sound: a word processor that speaks the words as well as rendering them on the screen
Overview
• Multi-media systems:
– use a number of different media to communicate information
– e.g., a computer-based teaching system with video, animation, text, and still images
Speech
• Human speech:
– natural mastery of language
– instinctive, taken for granted
– difficult to appreciate the complexities
– potentially a useful way to extend human-computer interaction
Speech
• Structure:
– phonemes (English):
– 40 (24 consonant and 16 vowel sounds)
– basic atomic units of speech
– sound slightly different depending on context
Speech
• Structure:
– allophones:
– 120 to 130
– all the sounds in the language
– count depends on accents
Speech
• Structure:
– morphemes:
– basic atomic units of language
– part or whole words
– formed into sentences using the rules of grammar
Speech
• Prosody:
– variations in emphasis, stress, pauses, and pitch to impart more meaning to sentences
• Co-articulation:
– the effect of context on the sound
– transforms phonemes into allophones
Speech Recognition
• Problems:
– different people speak differently (e.g., accent, stress, volume, etc.)
– background noise
– disfluencies: “ummm …” and “errr …”
– speech may conflict with complex cognition
Speech Recognition
• Issues:
– recognizing words is not enough
– need to extract meaning
– understanding a sentence requires context, such as information about the subject and the speaker
Speech Recognition
• Phonetic typewriter:
– developed for Finnish (a phonetic language)
– trained on one speaker, tries to generalize to others
– uses a neural network that clusters similar sounds together, one cluster per character
– poor performance on speakers it has not been trained on
– requires a large dictionary of minor variations
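The clustering idea behind the phonetic typewriter can be sketched with a naive k-means loop: group similar sound-feature vectors so each cluster maps to one output character. This is a stand-in for illustration only; the actual system used a self-organizing map, and the 2-D "features" below are synthetic.

```python
import numpy as np

def cluster_sounds(features, k, iters=20):
    """Naive k-means: group similar sound-feature vectors into k clusters,
    one cluster per output character (illustrative stand-in for the
    self-organizing map used by the phonetic typewriter)."""
    step = len(features) // k
    centers = features[::step][:k].astype(float).copy()  # spread-out initial centres
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # assign each feature vector to its nearest centre
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers, labels

# toy data: two tight groups of 2-D "spectral" features
rng = np.random.default_rng(0)
sounds = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
                    rng.normal(5.0, 0.1, (10, 2))])
centers, labels = cluster_sounds(sounds, k=2)
```

The slide's point about unseen speakers follows directly: a new speaker's sounds may fall between the clusters learned from the training speaker.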
Speech Recognition
• Currently:
– single-user, limited-vocabulary systems can work satisfactorily
– no general-user, general-vocabulary system is commercially successful, yet
• Current commercial examples:
– simple telephone-based UIs, such as train schedule information systems
Speech Recognition
• Potential:
– for users with physical disabilities
– for lightweight, mobile devices
– for when the user’s hands are already occupied with a manual task (auto mechanic, surgeon)
Speech Synthesis
• What:
– computer-generated speech
– natural and familiar way of receiving information
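A toy sketch of one classic synthesis idea: build a vowel-like sound by summing the harmonics of a glottal pitch and boosting those near the vowel's formant frequencies. The formant values approximate /a/; real synthesizers model the vocal tract as a chain of filters, so treat this as illustration only.

```python
import numpy as np

def vowel(f0=120, formants=(730, 1090, 2440), sr=16000, dur=0.3):
    """Toy formant-style synthesis: sum harmonics of pitch f0, weighting
    each harmonic by its closeness to the vowel's formants (values here
    roughly match the vowel /a/; a sketch, not a real TTS system)."""
    t = np.arange(int(sr * dur)) / sr
    wave = np.zeros_like(t)
    for h in range(1, sr // (2 * f0)):      # harmonics below Nyquist
        f = h * f0
        # amplitude: sum of Gaussian resonance peaks at the formants
        amp = sum(np.exp(-0.5 * ((f - fm) / 100.0) ** 2) for fm in formants)
        wave += amp * np.sin(2 * np.pi * f * t)
    return wave / np.max(np.abs(wave))      # normalize to [-1, 1]

samples = vowel()
```

Written to a sound device or WAV file, this produces a buzzy but recognizably vowel-like tone.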
Speech Synthesis
• Problems:
– humans find it difficult to adjust to monotonic, non-prosodic speech
– computer needs to understand natural language and the domain
– speech is transient (hard to review or browse)
– produces noise in the workplace, or requires headphones (intrusive)
Speech Synthesis
• Potential:
– screen readers:
– read a textual display to a visually impaired person
– warning signals:
– spoken information, especially for aircraft pilots whose visual and haptic channels are busy
Uninterpreted Speech
• What:
– fixed, recorded speech
– e.g., played back in airport announcements
– e.g., attached as voice annotations to files
Uninterpreted Speech
• Digital processing:
– change playback speed without changing pitch:
– to quickly scan phone messages
– to manually transcribe voice to text
– to figure out the lyrics and chords of a song
– spatialization and environmental effects
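Changing playback speed without changing pitch can be sketched with a naive overlap-add (OLA) time stretch: windowed grains are read at one rate and written at another, so duration changes while pitch stays roughly fixed. Production systems use WSOLA or a phase vocoder to avoid the artifacts plain OLA introduces; this is a minimal sketch.

```python
import numpy as np

def time_stretch(x, factor, frame=1024, hop=256):
    """Naive overlap-add time stretch: slow down (factor > 1) or speed up
    (factor < 1) audio without resampling, roughly preserving pitch."""
    window = np.hanning(frame)
    out_len = int(len(x) * factor) + frame
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    t = 0.0
    while t + frame < len(x):
        in_pos, out_pos = int(t), int(t * factor)
        # copy a windowed grain from the input rate to the output rate
        out[out_pos:out_pos + frame] += x[in_pos:in_pos + frame] * window
        norm[out_pos:out_pos + frame] += window
        t += hop
    norm[norm == 0] = 1.0          # avoid dividing silence by zero
    return out / norm              # compensate for window overlap

# 0.5 s of a 440 Hz tone at 8 kHz, stretched to twice the duration
sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
slow = time_stretch(tone, 2.0)
```

This is the operation behind "scan phone messages quickly": play back with factor < 1 while the voice remains intelligible.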
Non-Speech Sound
• What:
– boings, bangs, squeaks, clicks, etc.
– commonly used in user interfaces to provide warnings and alarms
Non-Speech Sound
• Dual-mode displays:
– information presented along two different sensory channels
– e.g., sight and sound
– allows for redundant presentation
– the user uses whichever channel they find easiest
– allows for resolution of ambiguity in one mode through information in the other
Non-Speech Sound
• Dual-mode displays:
– humans can react faster to auditory than to visual stimuli
– sound is especially good for transient information that would otherwise clutter a visual display
– non-speech sound is more language- and culture-independent than speech
Non-Speech Sound
• Auditory icons:
– use natural sounds to represent different types of objects and actions in the user interface
– e.g., the sound of breaking glass when deleting a file
– direction and volume of sounds can indicate position and importance/size
– e.g., SonicFinder
– not all actions have an intuitive sound
Non-Speech Sound
• Earcons:
– synthetic sounds used to convey information
– structured combinations of motives (musical notes) to provide rich information
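An earcon can be sketched as a short sequence of pure-tone notes; structured families of motives (rising vs. falling, say) then encode different events. The frequencies and event mappings below are illustrative choices, not from any standard.

```python
import numpy as np

def earcon(motive, sr=8000, note_dur=0.15):
    """Build a simple earcon by concatenating pure-tone notes.
    `motive` is a list of note frequencies in Hz."""
    n = int(sr * note_dur)
    t = np.arange(n) / sr
    envelope = np.hanning(n)   # fade each note in and out to avoid clicks
    notes = [envelope * np.sin(2 * np.pi * f * t) for f in motive]
    return np.concatenate(notes)

rising = earcon([440, 554, 659])    # e.g., "file opened" (hypothetical mapping)
falling = earcon([659, 554, 440])   # e.g., "operation failed" (hypothetical mapping)
```

Because the structure (contour, rhythm, timbre) carries the meaning, related events can share a family of motives, which is what distinguishes earcons from one-off beeps.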
Handwriting Recognition
• Handwriting:
– text and graphic input
– complex strokes and spaces
– natural
Handwriting Recognition
• Problems:
– variation in handwriting between users
– variation from day to day and over years for a single user
– variation of letters depending on nearby letters
Handwriting Recognition
• Currently:
– limited success with systems trained on a few users, with separated letters
– generic, multi-user, cursive text recognition systems are not accurate enough to be commercially successful
• Current applications:
– e.g., pre-sorting of mail (but a human has to assist with failures)
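Recognition of separated letters can be sketched as nearest-neighbour matching on a crude stroke feature, such as a histogram of pen-segment directions. Both the feature and the templates below are illustrative assumptions, not taken from any real recognizer.

```python
import numpy as np

def direction_histogram(stroke, bins=8):
    """Summarize a pen stroke (a list of (x, y) points) as a normalized
    histogram of segment directions: a crude, writer-tolerant feature."""
    pts = np.asarray(stroke, dtype=float)
    d = np.diff(pts, axis=0)
    angles = np.arctan2(d[:, 1], d[:, 0])     # direction of each pen segment
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

def classify(stroke, templates):
    """Nearest-neighbour match of a stroke against labelled templates."""
    feat = direction_histogram(stroke)
    return min(templates, key=lambda label:
               np.linalg.norm(feat - direction_histogram(templates[label])))

# toy templates: a horizontal dash and a vertical bar
templates = {"dash": [(0, 0), (1, 0), (2, 0)],
             "bar":  [(0, 0), (0, 1), (0, 2)]}
guess = classify([(0, 0), (1, 0.1), (2, 0.05)], templates)   # nearly horizontal
```

The slide's caveats show up immediately in such a scheme: templates from one writer match another writer's strokes poorly, which is why trained, per-user systems fare better.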
Handwriting Recognition
• Newton:
– printing or cursive writing recognition
– dictionary of words
– contextual recognition
– fine-tune spacing and letter shapes
– fine-tune recognition speed
– learn handwriting over time
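The dictionary step can be sketched as snapping a noisy letter-by-letter result to the closest known word by edit distance. This is a generic stand-in for how a word list constrains recognition, not the Newton's actual algorithm; the word list is illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete
                           cur[j - 1] + 1,              # insert
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def snap_to_dictionary(raw, dictionary):
    """Replace a noisy recognition result with the closest dictionary word."""
    return min(dictionary, key=lambda w: edit_distance(raw, w))

words = ["meeting", "morning", "message"]               # illustrative word list
fixed = snap_to_dictionary("nceting", words)            # misrecognized "meeting"
```

A confusable character ('n' for 'm', 'c' for 'e') costs only one substitution each, so the intended word still wins, which is the point of dictionary-constrained recognition.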