CMPUT 301: Lecture 31
Out of the Glass Box
Martin Jagersand
Department of Computing Science, University of Alberta
Overview
• Idea:
– why use only the sense of vision in user interfaces?
– increase the bandwidth of the interaction by using multiple sensory channels, instead of overloading the visual channel
Overview
• Multi-sensory systems:
– use more than one sensory channel in interaction
– e.g., sound, video, gestures, physical actions, etc.
Overview
• Usable senses:
– sight, sound, touch, taste, smell
– haptics, proprioception, and acceleration
– each is important on its own
– together, they provide a fuller interaction with the natural world
Overview
• Usable senses:
– computers rarely offer such a rich interaction
– we can use sight, sound, and sometimes touch
– flight simulators and some games use accelerations to create a multimodal immersive experience
– we cannot (yet) use taste or smell
Overview
• Multi-modal systems:
– use more than one sense in the interaction
– e.g., sight and sound: a word processor that speaks the words as well as rendering them on the screen
Overview
• Multi-media systems:
– use a number of different media to communicate information
– e.g., a computer-based teaching system with video, animation, text, and still images
Speech
• Human speech:
– natural mastery of language
– instinctive, taken for granted
– difficult to appreciate the complexities
– potentially a useful way to extend human-computer interaction
Speech
• Structure:
– phonemes (English):
– 40 (24 consonant and 16 vowel sounds)
– basic atomic units of speech
– sound slightly different depending on context
Speech
• Structure:
– allophones:
– 120 to 130
– all the sounds in the language
– count depends on accents
Speech
• Structure:
– morphemes:
– basic atomic units of language
– part or whole words
– formed into sentences using the rules of grammar
Speech
• Prosody:
– variations in emphasis, stress, pauses, and pitch to impart more meaning to sentences
• Co-articulation:
– the effect of context on the sound
– transforms phonemes into allophones
Speech Recognition
• Problems:
– different people speak differently (e.g., accent, stress, volume, etc.)
– background noise
– disfluencies: “ummm …” and “errr …”
– speech may conflict with complex cognition
Speech Recognition
• Issues:
– recognizing words is not enough
– need to extract meaning
– understanding a sentence requires context, such as information about the subject and the speaker
Speech Recognition
• Phonetic typewriter:
– developed for Finnish (a phonetic language)
– trained on one speaker, tries to generalize to others
– uses a neural network that clusters similar sounds together, one cluster per character
– poor performance on speakers it has not been trained on
– requires a large dictionary of minor variations
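The clustering idea behind the phonetic typewriter can be sketched with a naive k-means loop: group similar sound-feature vectors so each cluster maps to one output character. This is a stand-in for illustration only; the actual system used a self-organizing map, and the 2-D "features" below are synthetic.

```python
import numpy as np

def cluster_sounds(features, k, iters=20):
    """Naive k-means: group similar sound-feature vectors into k clusters,
    one cluster per output character (illustrative stand-in for the
    self-organizing map used by the phonetic typewriter)."""
    step = len(features) // k
    centers = features[::step][:k].astype(float).copy()  # spread-out initial centres
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # assign each feature vector to its nearest centre
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centre to the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers, labels

# toy data: two tight groups of 2-D "spectral" features
rng = np.random.default_rng(0)
sounds = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
                    rng.normal(5.0, 0.1, (10, 2))])
centers, labels = cluster_sounds(sounds, k=2)
```

The slide's point about unseen speakers follows directly: a new speaker's sounds may fall between the clusters learned from the training speaker.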
Speech Recognition
• Currently:
– single-user, limited-vocabulary systems can work satisfactorily
– no general-user, general-vocabulary system is commercially successful, yet
• Current commercial examples:
– simple telephone-based UIs, such as train schedule information systems
Speech Recognition
• Potential:
– for users with physical disabilities
– for lightweight, mobile devices
– for when the user’s hands are already occupied with a manual task (auto mechanic, surgeon)
Speech Synthesis
• What:
– computer-generated speech
– natural and familiar way of receiving information
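A toy sketch of one classic synthesis idea: build a vowel-like sound by summing the harmonics of a glottal pitch and boosting those near the vowel's formant frequencies. The formant values approximate /a/; real synthesizers model the vocal tract as a chain of filters, so treat this as illustration only.

```python
import numpy as np

def vowel(f0=120, formants=(730, 1090, 2440), sr=16000, dur=0.3):
    """Toy formant-style synthesis: sum harmonics of pitch f0, weighting
    each harmonic by its closeness to the vowel's formants (values here
    roughly match the vowel /a/; a sketch, not a real TTS system)."""
    t = np.arange(int(sr * dur)) / sr
    wave = np.zeros_like(t)
    for h in range(1, sr // (2 * f0)):      # harmonics below Nyquist
        f = h * f0
        # amplitude: sum of Gaussian resonance peaks at the formants
        amp = sum(np.exp(-0.5 * ((f - fm) / 100.0) ** 2) for fm in formants)
        wave += amp * np.sin(2 * np.pi * f * t)
    return wave / np.max(np.abs(wave))      # normalize to [-1, 1]

samples = vowel()
```

Written to a sound device or WAV file, this produces a buzzy but recognizably vowel-like tone.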
Speech Synthesis
• Problems:
– humans find it difficult to adjust to monotonic, non-prosodic speech
– computer needs to understand natural language and the domain
– speech is transient (hard to review or browse)
– produces noise in the workplace, or requires headphones (intrusive)
Speech Synthesis
• Potential:
– screen readers:
– read a textual display to a visually impaired person
– warning signals:
– spoken information, especially for aircraft pilots whose visual and haptic channels are busy
Uninterpreted Speech
• What:
– fixed, recorded speech
– e.g., played back in airport announcements
– e.g., attached as voice annotations to files
Uninterpreted Speech
• Digital processing:
– change playback speed without changing pitch:
– to quickly scan phone messages
– to manually transcribe voice to text
– to figure out the lyrics and chords of a song
– spatialization and environmental effects
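Changing playback speed without changing pitch can be sketched with a naive overlap-add (OLA) time stretch: windowed grains are read at one rate and written at another, so duration changes while pitch stays roughly fixed. Production systems use WSOLA or a phase vocoder to avoid the artifacts plain OLA introduces; this is a minimal sketch.

```python
import numpy as np

def time_stretch(x, factor, frame=1024, hop=256):
    """Naive overlap-add time stretch: slow down (factor > 1) or speed up
    (factor < 1) audio without resampling, roughly preserving pitch."""
    window = np.hanning(frame)
    out_len = int(len(x) * factor) + frame
    out = np.zeros(out_len)
    norm = np.zeros(out_len)
    t = 0.0
    while t + frame < len(x):
        in_pos, out_pos = int(t), int(t * factor)
        # copy a windowed grain from the input rate to the output rate
        out[out_pos:out_pos + frame] += x[in_pos:in_pos + frame] * window
        norm[out_pos:out_pos + frame] += window
        t += hop
    norm[norm == 0] = 1.0          # avoid dividing silence by zero
    return out / norm              # compensate for window overlap

# 0.5 s of a 440 Hz tone at 8 kHz, stretched to twice the duration
sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr // 2) / sr)
slow = time_stretch(tone, 2.0)
```

This is the operation behind "scan phone messages quickly": play back with factor < 1 while the voice remains intelligible.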
Non-Speech Sound
• What:
– boings, bangs, squeaks, clicks, etc.
– commonly used in user interfaces to provide warnings and alarms
Non-Speech Sound
• Dual-mode displays:
– information presented along two different sensory channels
– e.g., sight and sound
– allows for redundant presentation
– the user uses whichever channel they find easiest
– allows for resolution of ambiguity in one mode through information in the other
Non-Speech Sound
• Dual-mode displays:
– humans can react faster to auditory than to visual stimuli
– sound is especially good for transient information that would otherwise clutter a visual display
– non-speech sound is more language- and culture-independent than speech
Non-Speech Sound
• Auditory icons:
– use natural sounds to represent different types of objects and actions in the user interface
– e.g., the sound of breaking glass when deleting a file
– direction and volume of sounds can indicate position and importance/size
– e.g., SonicFinder
– not all actions have an intuitive sound
Non-Speech Sound
• Earcons:
– synthetic sounds used to convey information
– structured combinations of motives (musical notes) to provide rich information
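An earcon can be sketched as a short sequence of pure-tone notes; structured families of motives (rising vs. falling, say) then encode different events. The frequencies and event mappings below are illustrative choices, not from any standard.

```python
import numpy as np

def earcon(motive, sr=8000, note_dur=0.15):
    """Build a simple earcon by concatenating pure-tone notes.
    `motive` is a list of note frequencies in Hz."""
    n = int(sr * note_dur)
    t = np.arange(n) / sr
    envelope = np.hanning(n)   # fade each note in and out to avoid clicks
    notes = [envelope * np.sin(2 * np.pi * f * t) for f in motive]
    return np.concatenate(notes)

rising = earcon([440, 554, 659])    # e.g., "file opened" (hypothetical mapping)
falling = earcon([659, 554, 440])   # e.g., "operation failed" (hypothetical mapping)
```

Because the structure (contour, rhythm, timbre) carries the meaning, related events can share a family of motives, which is what distinguishes earcons from one-off beeps.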
Handwriting Recognition
• Handwriting:
– text and graphic input
– complex strokes and spaces
– natural
Handwriting Recognition
• Problems:
– variation in handwriting between users
– variation from day to day and over years for a single user
– variation of letters depending on nearby letters
Handwriting Recognition
• Currently:
– limited success with systems trained on a few users, with separated letters
– generic, multi-user, cursive text recognition systems are not accurate enough to be commercially successful
• Current applications:
– e.g., pre-sorting of mail (but a human has to assist with failures)
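Recognition of separated letters can be sketched as nearest-neighbour matching on a crude stroke feature, such as a histogram of pen-segment directions. Both the feature and the templates below are illustrative assumptions, not taken from any real recognizer.

```python
import numpy as np

def direction_histogram(stroke, bins=8):
    """Summarize a pen stroke (a list of (x, y) points) as a normalized
    histogram of segment directions: a crude, writer-tolerant feature."""
    pts = np.asarray(stroke, dtype=float)
    d = np.diff(pts, axis=0)
    angles = np.arctan2(d[:, 1], d[:, 0])     # direction of each pen segment
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

def classify(stroke, templates):
    """Nearest-neighbour match of a stroke against labelled templates."""
    feat = direction_histogram(stroke)
    return min(templates, key=lambda label:
               np.linalg.norm(feat - direction_histogram(templates[label])))

# toy templates: a horizontal dash and a vertical bar
templates = {"dash": [(0, 0), (1, 0), (2, 0)],
             "bar":  [(0, 0), (0, 1), (0, 2)]}
guess = classify([(0, 0), (1, 0.1), (2, 0.05)], templates)   # nearly horizontal
```

The slide's caveats show up immediately in such a scheme: templates from one writer match another writer's strokes poorly, which is why trained, per-user systems fare better.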
Handwriting Recognition
• Newton:
– printing or cursive writing recognition
– dictionary of words
– contextual recognition
– fine-tune spacing and letter shapes
– fine-tune recognition speed
– learn handwriting over time
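The dictionary step can be sketched as snapping a noisy letter-by-letter result to the closest known word by edit distance. This is a generic stand-in for how a word list constrains recognition, not the Newton's actual algorithm; the word list is illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete
                           cur[j - 1] + 1,              # insert
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def snap_to_dictionary(raw, dictionary):
    """Replace a noisy recognition result with the closest dictionary word."""
    return min(dictionary, key=lambda w: edit_distance(raw, w))

words = ["meeting", "morning", "message"]               # illustrative word list
fixed = snap_to_dictionary("nceting", words)            # misrecognized "meeting"
```

A confusable character ('n' for 'm', 'c' for 'e') costs only one substitution each, so the intended word still wins, which is the point of dictionary-constrained recognition.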