
Guidelines for Multimodal User Interface Design



In today’s pursuit of more transparent, flexible, and efficient human-computer interaction, a growing interest in multimodal interface design has emerged [5]. The goals are twofold: to achieve an interaction closer to natural human-human communication, and to increase the robustness of the interaction by using redundant or complementary information. New interaction paradigms and guidelines are necessary to facilitate the design of multimodal systems from the ground up (see articles by Cohen and McGee and Pieraccini et al. in this section). This article discusses six main categories of guidelines and represents a preliminary effort to establish principles for multimodal interaction design. A more detailed discussion of these guidelines will be available in [6].

Requirements Specification. Critical to the design of any application are the user requirements and system capabilities for the given domain. Here, we provide some general considerations for multimodal system requirements specification.

Design for broadest range of users and contexts of use. Designers should become familiar with users’ psychological characteristics (for example, cognitive abilities, motivation), level of experience, domain and task characteristics, cultural background, as well as their physical attributes (for example, age, vision, hearing). An application will be valued and accepted if it can be used by a wide population and in more than one manner. Thus, multimodal designs can aid in extending the range of potential users and uses, such as when redundancy of speech and keypad input enables an application to be used in dark and/or noisy environments. Designers should support the best modality or combination of modalities anticipated in changing environments (for example, private office vs. driving a car).

Address privacy and security issues. Users should be recognized by an interface only according to their explicit preference and not be remembered by default. In situations where users wish to maintain privacy by avoiding speech input or output, multimodal interfaces that use speech should also provide a non-speech mode to prevent others from overhearing private conversations.



Non-speech alternatives should also be provided when users enter personal identification numbers or passwords (for example, at an automatic bank teller), or when they might be uncomfortable if certain private information is overheard by others. For example, to reduce the likelihood of others being aware of a user’s mistakes, it may be preferable to provide error messages in a visual form instead of audible speech.
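
To make this privacy guidance concrete, the following sketch shows one way an application might route error messages to a visual channel when bystanders could overhear. The names (InteractionContext, renderError) and the context fields are hypothetical illustrations, not part of any published system.

// Hypothetical context describing where and how the user is interacting.
interface InteractionContext {
  inPublicSpace: boolean;         // e.g., shared office, bank lobby
  handlingSensitiveData: boolean; // e.g., PIN or password entry
  hasDisplay: boolean;
}

type OutputModality = "speech" | "visual";

// Prefer a visual (non-speech) channel whenever the message could expose
// private information or the user's mistakes to bystanders.
function chooseErrorModality(ctx: InteractionContext): OutputModality {
  if (ctx.hasDisplay && (ctx.inPublicSpace || ctx.handlingSensitiveData)) {
    return "visual";
  }
  return "speech";
}

function renderError(message: string, ctx: InteractionContext): void {
  if (chooseErrorModality(ctx) === "visual") {
    console.log(`[on-screen] ${message}`); // stand-in for a GUI dialog or toast
  } else {
    console.log(`[spoken] ${message}`);    // stand-in for TTS output
  }
}

// Example: an ATM-style session in a public space falls back to visual errors.
renderError("Incorrect PIN, please try again.",
            { inPublicSpace: true, handlingSensitiveData: true, hasDisplay: true });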

Designing Multimodal Input and Output. The cognitive science literature on intersensory perception and intermodal coordination has provided a foundation for determining multimodal design principles [2, 5, 7]. To optimize human performance in multimodal systems, such principles can be used to direct the design of information presented to users, specifically regarding how to integrate multiple modalities or how to support multiple user inputs (for example, voice and gesture). Here, we provide a brief summary of some general guiding principles essential to the design of effective multimodal interaction.

Maximize human cognitive and physical abilities. Designers need to determine how to support intuitive, streamlined interactions based on users’ human information processing abilities (including attention, working memory, and decision making), for example:

• Avoid unnecessarily presenting information in two different modalities in cases where the user must simultaneously attend to both sources to comprehend the material being presented [1, 3]; such redundancy can increase cognitive load at the cost of learning the material [1].

• Maximize the advantages of each modality to reduce the user’s memory load in certain tasks and situations, as illustrated by these modality combinations [7, 8] and by the brief sketch following this list:

˚ System visual presentation coupled with user manual input for spatial information and parallel processing;

˚ System auditory presentation coupled with user speech input for state information, serial processing, attention alerting, or issuing commands.
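
These pairings can be read as a simple lookup from information type to modality combination. The sketch below is illustrative only; the InfoKind categories and the table itself are assumptions drawn from the list above, not a published taxonomy.

// Hypothetical pairing of information characteristics with output/input
// modalities, following the combinations suggested above [7, 8].
type InfoKind = "spatial" | "state" | "alert" | "command";

interface ModalityPairing {
  systemOutput: "visual" | "auditory";
  userInput: "manual" | "speech";
}

const pairings: Record<InfoKind, ModalityPairing> = {
  spatial: { systemOutput: "visual",   userInput: "manual" }, // parallel processing
  state:   { systemOutput: "auditory", userInput: "speech" }, // serial processing
  alert:   { systemOutput: "auditory", userInput: "speech" }, // attention alerting
  command: { systemOutput: "auditory", userInput: "speech" }, // issuing commands
};

// Example: a map-panning task would be presented visually and manipulated manually.
console.log(pairings["spatial"]);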

Integrate modalities in a manner compatible with user preferences, context, and system functionality. Additional modalities should be added to the system only if they improve satisfaction, efficiency, or other aspects of performance for a given user and context. When using multiple modalities:

• Match output to acceptable user input style (for example, if the user is constrained by a set grammar, do not design a virtual agent to use unconstrained natural language);

• Use multimodal cues to improve collaborative speech (for example, a virtual agent’s gaze direction or gesture can guide user turn-taking);

• Ensure system output modalities are well synchronized temporally (for example, map-based display and spoken directions, or virtual display and non-speech audio);

• Ensure the current system interaction state is shared across modalities and that appropriate information is displayed (a brief sketch of such shared state follows this list) in order to support:

˚ Users in choosing alternative interaction modalities;

˚ Multidevice and distributed interaction;

˚ System capture of a user’s interaction history.
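
One way to realize such state sharing is a single interaction-state object that every modality and device reads and writes. The sketch below is a minimal illustration with hypothetical names (InteractionState, InteractionEvent); a real system would persist and synchronize this state across devices.

// Hypothetical shared interaction state, visible to every modality and device.
type Modality = "speech" | "pen" | "keyboard" | "touch";

interface InteractionEvent {
  modality: Modality;
  payload: string;
  timestamp: number;
}

class InteractionState {
  private history: InteractionEvent[] = [];
  availableModalities: Set<Modality> = new Set(["speech", "keyboard"]);
  activeModality: Modality = "keyboard";

  record(event: InteractionEvent): void {
    this.history.push(event);           // supports capture of interaction history
    this.activeModality = event.modality;
  }

  // Lets any device or modality show the user what else is available.
  alternativesTo(current: Modality): Modality[] {
    return [...this.availableModalities].filter(m => m !== current);
  }
}

const state = new InteractionState();
state.record({ modality: "speech", payload: "zoom in", timestamp: Date.now() });
console.log(state.alternativesTo(state.activeModality)); // e.g., ["keyboard"]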

Adaptivity. Multimodal interfaces should adapt to the needs and abilities of different users, as well as different contexts of use. Dynamic adaptivity enables the interface to degrade gracefully by leveraging complementary and supplementary modalities according to changes in task and context. Individual differences (for example, age, preferences, skill, sensory or motor impairment) can be captured in a user profile and used to determine interface settings (a brief sketch follows the list below) such as:

• Allowing gestures to augment or replace speech input in noisy environments, or for users with speech impairments;

• Overcoming bandwidth constraints (for example, local direct manipulation replaces gaze input that must be analyzed remotely);

• Adapting the quantity and method of information presentation to both the user and display device.
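
A minimal sketch of how such a profile might drive interface settings follows. The UserProfile and UsageContext fields and the noise threshold are assumed for illustration, not drawn from a particular system.

// Hypothetical user profile and usage context; field names are illustrative only.
interface UserProfile {
  speechImpairment: boolean;
  prefersGestures: boolean;
}

interface UsageContext {
  ambientNoiseDb: number;  // measured or estimated noise level
  lowBandwidth: boolean;   // e.g., remote gaze analysis not feasible
  smallDisplay: boolean;
}

interface InterfaceSettings {
  speechInputEnabled: boolean;
  gestureInputEnabled: boolean;
  gazeInputEnabled: boolean;
  verbosePresentation: boolean;
}

// Derive settings from profile and context, degrading gracefully: gestures
// augment or replace speech in noise or for impaired users, local direct
// manipulation replaces remotely analyzed gaze, and presentation is trimmed
// for small displays.
function adapt(profile: UserProfile, ctx: UsageContext): InterfaceSettings {
  const noisy = ctx.ambientNoiseDb > 70; // illustrative threshold
  return {
    speechInputEnabled: !profile.speechImpairment && !noisy,
    gestureInputEnabled: profile.prefersGestures || noisy || profile.speechImpairment,
    gazeInputEnabled: !ctx.lowBandwidth,
    verbosePresentation: !ctx.smallDisplay,
  };
}

console.log(adapt({ speechImpairment: false, prefersGestures: false },
                  { ambientNoiseDb: 85, lowBandwidth: true, smallDisplay: true }));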

Consistency. Presentation and prompts should share common features as much as possible and should refer to a common task, including using the same terminology across modalities. Additional guidelines include providing consistent:

• System output independent of varying input modalities (for example, the same keyword provides identical results whether the user searches by typing or speaking; see the brief sketch after this list);

• Interactions of combined modalities across applications (for example, consistently enable shortcuts);

• System-initiated or user-initiated state switching (for example, mode changing), by ensuring the user’s interaction choices are seamlessly detected and that the system appropriately provides feedback when it initiates a modality change.
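
The first bullet, consistent output regardless of input modality, can be achieved by funneling every modality into one normalized handler, as in the sketch below. The names (normalize, search, handleInput) and the toy index are hypothetical.

// Hypothetical search pipeline in which every input modality funnels into the
// same handler, so identical keywords give identical results.
type InputModality = "typed" | "spoken";

function normalize(raw: string): string {
  return raw.trim().toLowerCase();
}

function search(keyword: string): string[] {
  // Stand-in for the real query engine, assumed identical for all modalities.
  const index: Record<string, string[]> = {
    "directions": ["Route A", "Route B"],
  };
  return index[keyword] ?? [];
}

function handleInput(raw: string, modality: InputModality): string[] {
  const keyword = normalize(raw); // same normalization regardless of modality
  console.log(`${modality} query: "${keyword}"`);
  return search(keyword);
}

// Typed and spoken requests for the same keyword return the same results.
console.log(handleInput("Directions ", "typed"));
console.log(handleInput("directions", "spoken"));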



Feedback. Users should be aware of their current connectivity and know which modalities are available to them. They should be made aware of alternative interaction options without being overloaded by lengthy instructions that distract from the task. Specific examples include using descriptive icons (for example, microphone and speech bubbles to denote click-to-talk buttons), and notifying users to begin speaking if speech recognition starts automatically. Also, confirm system interpretations of whole user input after fusion has taken place [4], rather than for each modality in isolation.
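
A minimal sketch of this fuse-then-confirm ordering follows; the fusion step is deliberately naive, and the names (Hypothesis, fuse, confirm) and confidence values are hypothetical rather than taken from the system described in [4].

// Hypothetical recognition result from a single modality.
interface Hypothesis { text: string; confidence: number; }

// Naive late fusion: combine a spoken command with a pen/touch selection
// into a single interpretation before asking the user anything.
function fuse(speech: Hypothesis, pointedObject: string): Hypothesis {
  return {
    text: `${speech.text} ${pointedObject}`,
    confidence: speech.confidence, // simplistic; a real system would reweight
  };
}

function confirm(interpretation: Hypothesis): void {
  // One confirmation for the fused whole, not one per modality.
  if (interpretation.confidence < 0.8) {
    console.log(`Did you mean: "${interpretation.text}"?`);
  } else {
    console.log(`Executing: ${interpretation.text}`);
  }
}

confirm(fuse({ text: "delete", confidence: 0.6 }, "the red building"));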

Error Prevention/Handling. User errors can be minimized and error handling improved by providing clearly marked exits from a task, modality, or the entire system, and by easily allowing users to undo a previous action or command. To further prevent users from guessing at functionality and making mistakes, designers should provide concise and effective help in the form of task-relevant and easily accessible assistance. Some specific examples include [5] (a brief sketch follows the list):

• Integrate complementary modalities in order to improve overall robustness during multimodal fusion, thereby enabling the strengths of each to overcome weaknesses in others;

• Give users control over modality selection, so they can use a less error-prone modality for given lexical content;

• If an error occurs, permit users to switch to a different modality;

• Incorporate modalities capable of conveying rich semantic information, rather than just pointing or selection;

• Fuse information from multiple heterogeneous sources of information (that is, cast a broad “information net”);

• Develop multimodal processing techniques that target brief or otherwise ambiguous information, and are designed to retain information.
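
The modality-switching bullets above can be combined into a simple fallback loop: try the preferred modality and, on low confidence or failure, offer a less error-prone one instead of forcing a retry. The recognizers, confidence values, and threshold in the sketch below are assumed purely for illustration.

// Hypothetical per-modality recognizers; each may fail or return low confidence.
type InputMode = "speech" | "handwriting" | "keyboard";

interface RecognitionResult { text: string | null; confidence: number; }

const recognizers: Record<InputMode, (raw: string) => RecognitionResult> = {
  speech:      raw => ({ text: raw, confidence: 0.4 }), // noisy room, low confidence
  handwriting: raw => ({ text: raw, confidence: 0.7 }),
  keyboard:    raw => ({ text: raw, confidence: 1.0 }), // unambiguous fallback
};

// Try the user's preferred modality first; on error or low confidence,
// switch to the next, less error-prone modality rather than forcing a retry.
function captureInput(raw: string, preferred: InputMode[]): string | null {
  for (const mode of preferred) {
    const result = recognizers[mode](raw);
    if (result.text !== null && result.confidence >= 0.6) {
      return result.text;
    }
    console.log(`${mode} input uncertain; switching modality...`);
  }
  return null; // clearly marked exit: the user can abandon the task
}

console.log(captureInput("call home", ["speech", "handwriting", "keyboard"]));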

Conclusion
The guiding principles presented here represent initial strategies to aid in the development of principle-driven multimodal interface guidelines. In order to develop both innovative and optimal future multimodal interfaces, additional empirical studies are needed to determine the most intuitive and effective combinations of input and output modalities for different users, applications, and usage contexts, as well as how and when to best integrate those modalities. To fully capitalize on the robustness and flexibility of multimodal interfaces, further work also needs to explore new techniques for error handling and adaptive processing, and then to translate these findings into viable and increasingly specific multimodal interface guidelines for the broader community.

References
1. Cooper, G. Research into Cognitive Load Theory and Instructional Design at UNSW (1997); www.arts.unsw.edu.au/education/CLT_NET_Aug_97.HTML.
2. European Telecommunications Standards Institute. Human factors: Guidelines on the multimodality of icons, symbols, and pictograms. Report No. ETSI EG 202 048 v1.1.1 (2002-08). ETSI, Sophia Antipolis, France.
3. Kalyuga, S., Chandler, P., and Sweller, J. Managing split-attention and redundancy in multimedia instruction. Applied Cognitive Psychology 13 (1999), 351–371.
4. McGee, D.R., Cohen, P.R., and Oviatt, S. Confirmation in multimodal systems. In Proceedings of the International Joint Conference of the Association for Computational Linguistics and the International Committee on Computational Linguistics (Montreal, Quebec, Canada, 1998).
5. Oviatt, S.L. Multimodal interfaces. The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies and Emerging Applications. J. Jacko and A. Sears, Eds. Lawrence Erlbaum, Mahwah, NJ, 2003, 286–304.
6. Reeves, L.M., Lai, J., et al. Multimodal interaction design: From a lessons-learned approach to developing principle-driven guidelines. International J. of HCI. Forthcoming.
7. Stanney, K.M., Samman, S., Reeves, L.M., Hale, K., Buff, W., Bowers, C., Goldiez, B., Lyons-Nicholson, D., and Lackey, S. A paradigm shift in interactive computing: Deriving multimodal design principles from behavioral and neurological foundations. In review.
8. Wickens, C.D. Engineering Psychology and Human Performance (2nd ed.). Harper Collins, NY, 1992.

Leah M. Reeves, Jennifer Lai, James A. Larson ([email protected]), Sharon Oviatt ([email protected]), T.S. Balaji, Stéphanie Buisine, Penny Collings, Phil Cohen, Ben Kraal, Jean-Claude Martin, Michael McTear, TV Raman, Kay M. Stanney, Hui Su, and QianYing Wang.

This article was originally drafted among the co-authors as part of a CHI’03 workshop on multimodal interaction and interface design principles organized by James Larson and Sharon Oviatt.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

© 2004 ACM 0002-0782/04/0100 $5.00
