
Designing a reconfigurable Multimodal and Collaborative Supervisor for Virtual Environment

Pierre Martin∗ Patrick Bourdot†

V&AR VENISE team, CNRS/LIMSI, B.P 133, 91403 Orsay (France)

ABSTRACT

Virtual Reality (VR) systems cannot be promoted for complex applications (involving the interpretation of massive and intricate databases) without creating natural and "transparent" user interfaces: intuitive interfaces are required to bring non-expert users to use VR technologies. Many studies have been carried out on multimodal and collaborative systems in VR. Although these two aspects are usually studied separately, they share interesting similarities. Our work focuses on the way to manage multimodal and collaborative interactions within a single process. We present here the similarities between these two processes and the main features of a reconfigurable multimodal and collaborative supervisor for Virtual Environments (VEs). The aim of such a system is to merge pieces of information coming from VR devices (tracking, gestures, speech, haptics, etc.) in order to control immersive multi-user applications through the main communication and sensorimotor channels of humans. The framework architecture of this supervisor is designed to be generic, modular and reconfigurable (via an XML configuration file), so that it can be applied to many different contexts.

Index Terms: H.1.2 [Models and Principles]: User/Machine Systems—Human information processing; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces—User interface management systems; I.3.6 [Computer Graphics]: Methodology and Techniques—Interaction techniques

1 INTRODUCTION

Efforts are still needed to study the use of Virtual Environments (VEs) for complex applications (product design, data exploration, etc.). But VEs cannot be promoted for such applications without creating natural and "transparent" user interfaces: intuitive interfaces could bring non-expert users to use Virtual Reality (VR) technologies by exploiting human modalities. Moreover, these modalities have to be simultaneously perceived and interpreted, hence the need for multimodal interfaces in VEs. Multimodality has been a growing field of Computer Science since the eighties, and important concepts have been addressed, especially by Bolt [2], Oviatt et al. [7], Martin [5], Bellik [1] and Latoschik [3]. Additionally, managing groups of users involved in collaborative tasks in Collaborative VEs (CVEs) is increasingly required (e.g. Margery et al. [4] and Salzmann et al. [8]). Many studies focus on multimodal or collaborative interactions in Virtual Environments, but to our knowledge none covers these topics in conjunction. This is what motivated the present work, further described in [6].

∗e-mail: [email protected]
†e-mail: [email protected]

2 SIMILARITIES BETWEEN MULTIMODAL AND COLLABORATIVE INTERACTIONS

Several critical points of the multimodal process can be inferred from studies on multimodal interaction: disambiguation (comparing a user's inputs), decision (generating the actions desired by a user) and dialog (decision management, identifying additional treatments to be applied to incoming actions). Previous multimodal approaches have been carried out on single-user, non-cooperative scenarios. But the improvements of VR technology now allow, for instance, the cooperation of co-located users within a shared virtual scene. Therefore the multi-user context is a key point in the multimodal process. Such a system must be able to handle interactions from several users, and thus one must know when data fusion may occur, but also which data can be merged. What happens if pieces of information coming from several users have to be combined? At which level do the pieces of information have to be merged? These important new questions about multi-user multimodal systems reveal that there are similarities between collaborative and multimodal processes. We should not restrict the notion of collaboration to simultaneous activities only, but we need a broader view including the notions of dialog and continuity. It is not solely a question of solving some constraints at a given time, but rather of an ongoing management over time. Thus, to bring these two processes closer together, we have chosen to extend the classification of collaborative interactions proposed by [4]. The previous classification of Margery et al. was mainly focused on users' collaborative manipulations. Our extended classification is task-oriented, which is more appropriate to a multimodal approach:

• Basic cooperation (level 1) : users can perceive each other and can communicate. This level is composed of two general situations:

(a) co-located interaction : users are immersed in the same place or in the same display. Depending on the technology used (full or limited cohabitation), they can have natural communication. For instance, in a multi-user CAVE with a multi-stereoscopy visualization system, users have full cohabitation. They can see or touch each other and can have natural dialog. In the case of HMD systems, users have limited cohabitation. They are able to have natural conversations, but they do not see each other physically.

(b) remote interactions : users are immersed in a virtual world but are distant (virtual presence). Therefore, multimedia communication and avatars are required.

• Parallel tasks (level 2) : the notion of task includes the manipulation of scene objects (cf. Margery's classification) but also command aspects used to control the application. Here, users act on the scene individually: a given task is performed by only one user. This level is divided into two subdivisions, (a) constrained tasks (by scene design) and (b) free tasks. Users' interactions are completely independent of each other, whatever processes are involved.

• Cooperative tasks (level 3) : users can cooperate on the same object or within the same task. This level is divided into three sub-divisions:

(a) independent tasks : users generate tasks with a similar target (but independent properties) which can be performed independently of each other.

(b) synergistic task : users' interactions can be combined, at any level of treatment, to generate one single task. This concept includes the previous co-manipulations and is close to the complementarity concept of Martin [5].

(c) co-dependent task : users generate dependent tasks (i.e. similar tasks or competing tasks on a same target). These co-dependent tasks cannot be performed immediately because ambiguities have to be solved. With the "similar tasks" comes a concept of redundancy, also close to the one of Martin [5].

It is obvious that level 1 of collaboration does not depend on the system used but rather on the display technology. But we have also noticed that level 2 and level 3 can be introduced into a multimodal process: merging the events of each user separately provides level 2, while combining the events of all users provides level 3. This is one of the key points of our system.
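To make this distinction concrete, the following minimal Python sketch (illustrative only, not part of the MCS implementation described in this paper) groups hypothetical input events per user for level 2 fusion and pools them across users within a time window for level 3 fusion; the InputEvent record, the time window and the fuse placeholder are assumptions introduced for the example.

from dataclasses import dataclass
from itertools import groupby
from typing import List

@dataclass
class InputEvent:
    """Hypothetical record for a unimodal user input (assumption for this sketch)."""
    user_id: int       # which user produced the event
    modality: str      # e.g. "speech", "gesture", "tracking"
    content: str       # simplified payload
    timestamp: float   # seconds

def fuse(events: List[InputEvent]) -> str:
    # Stand-in for the real fusion step: here we only concatenate labels.
    return " + ".join(f"user{e.user_id}/{e.modality}:{e.content}" for e in events)

def level2_fusion(events: List[InputEvent]) -> List[str]:
    # Level 2 (parallel tasks): merge the events of each user separately.
    by_user = sorted(events, key=lambda e: e.user_id)
    return [fuse(list(group)) for _, group in groupby(by_user, key=lambda e: e.user_id)]

def level3_fusion(events: List[InputEvent], window: float = 1.0) -> List[str]:
    # Level 3 (cooperative tasks): combine events of all users that fall
    # within a common time window, whoever produced them.
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e.timestamp)
    merged, current = [], [ordered[0]]
    for e in ordered[1:]:
        if e.timestamp - current[0].timestamp <= window:
            current.append(e)
        else:
            merged.append(fuse(current))
            current = [e]
    merged.append(fuse(current))
    return merged

if __name__ == "__main__":
    events = [
        InputEvent(1, "speech", "put that", 0.1),
        InputEvent(1, "gesture", "point(table)", 0.3),
        InputEvent(2, "speech", "there", 0.4),
        InputEvent(2, "gesture", "point(shelf)", 0.6),
    ]
    print(level2_fusion(events))  # one fused command per user (level 2)
    print(level3_fusion(events))  # one fused command across users (level 3)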

3 OVERVIEW OF THE MAIN FEATURES OF THE MCS ARCHITECTURE

Figure 1: The framework's architecture. The Multimodal and Collaborative Supervisor (MCS) is configured and initialized using XML files. The MCS processes users' inputs, taking into account rules and context, and transmits results to the application.

Our work focuses on the design of a reconfigurable Multimodal and Collaborative Supervisor (MCS) for VE applications (see Fig. 1). It performs late fusion [7] to integrate, at a semantic level, information coming from several users. The MCS provides several components required for its complete integration with a VR application/platform: an "input interface" (interpreters), a processing core (interpretation fusion, argument fusion, command manager) and an "output interface" (command manager). The "input interface" processes users' inputs, the MCS core handles multimodal and collaborative treatments, and the "output interface" packages the final commands. Thanks to this division into four stages, the MCS covers the disambiguation, decision and dialog phases (three critical points of collaborative and multimodal processes) and addresses equivalence, redundancy and complementarity [5].
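As a rough illustration of this four-stage organization, the sketch below chains hypothetical Interpreter, InterpretationFusion, ArgumentFusion and CommandManager classes and instantiates the interpreters from a minimal XML configuration string; the class names, the XML schema and the pass-through logic are assumptions made for the example and do not reflect the actual MCS implementation.

import xml.etree.ElementTree as ET
from typing import Any, Dict, List

class Interpreter:
    # "Input interface": turns raw device data into a typed interpretation.
    def __init__(self, modality: str):
        self.modality = modality
    def interpret(self, raw: Any) -> Dict[str, Any]:
        return {"modality": self.modality, "value": raw}

class InterpretationFusion:
    # First core stage: would compare users' inputs to resolve ambiguities.
    def fuse(self, interpretations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return interpretations  # placeholder: real disambiguation rules go here

class ArgumentFusion:
    # Second core stage: would bind fused interpretations to command arguments.
    def fuse(self, interpretations: List[Dict[str, Any]]) -> Dict[str, Any]:
        return {"action": "select", "args": interpretations}  # placeholder

class CommandManager:
    # "Output interface": packages the final command for the application.
    def emit(self, command: Dict[str, Any]) -> None:
        print("command sent to application:", command)

def build_supervisor(xml_config: str):
    # Instantiate interpreters from a hypothetical XML configuration string.
    root = ET.fromstring(xml_config)
    interpreters = [Interpreter(node.get("modality", "unknown"))
                    for node in root.iter("interpreter")]
    return interpreters, InterpretationFusion(), ArgumentFusion(), CommandManager()

if __name__ == "__main__":
    config = """
    <mcs>
        <interpreter modality="speech"/>
        <interpreter modality="gesture"/>
    </mcs>
    """
    interpreters, ifusion, afusion, manager = build_supervisor(config)
    raw_inputs = ["put that there", "point(obj42)"]
    interpretations = [i.interpret(raw) for i, raw in zip(interpreters, raw_inputs)]
    manager.emit(afusion.fuse(ifusion.fuse(interpretations)))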

We are conducting research on VR interaction with industrial partners in order to evaluate the potential use of the proposed approach. We applied our supervisor to a collaborative situation, entitled MalCoMIICs, for multimodal and co-localized multi-user interactions for immersive collaborations [6]. This application uses the EVE system (http://www.limsi.fr/venise/EVEsystem), a new multi-user and multi-sensorimotor CAVE-like set-up.

REFERENCES

[1] Y. Bellik. Media integration in multimodal interfaces. In Proc. of the IEEE Workshop on Multimedia Signal Processing, pages 31–36, 1997.

[2] R. A. Bolt. "Put-that-there": Voice and gesture at the graphics interface. In SIGGRAPH '80: Proc. of the 7th Annual Conference on Computer Graphics and Interactive Techniques, pages 262–270, New York, NY, USA, 1980. ACM.

[3] M. E. Latoschik. A user interface framework for multimodal VR interactions. In ICMI '05: Proc. of the 7th International Conference on Multimodal Interfaces, pages 76–83, New York, NY, USA, 2005. ACM.

[4] D. Margery, B. Arnaldi, and N. Plouzeau. A general framework for cooperative manipulation in virtual environments. In Virtual Environments, volume 99, pages 169–178, 1999.

[5] J.-C. Martin. Tycoon: Theoretical framework and software tools for multimodal interfaces. In J. Lee (Ed.), Intelligence and Multimodality in Multimedia Interfaces. AAAI Press, 1998.

[6] P. Martin, P. Bourdot, and D. Touraine. A reconfigurable architecture for multimodal and collaborative interactions in virtual environments (technote). In 3DUI '11: Proc. of the IEEE Symposium on 3D User Interfaces (3DUI). IEEE Computer Society, in press.

[7] S. Oviatt, P. Cohen, L. Wu, J. Vergo, L. Duncan, B. Suhm, J. Bers, T. Holzman, T. Winograd, J. Landay, J. Larson, and D. Ferro. Designing the user interface for multimodal speech and pen-based gesture applications: state-of-the-art systems and future research directions. Hum.-Comput. Interact., 15(4):263–322, 2000.

[8] H. Salzmann, J. Jacobs, and B. Frohlich. Collaborative interaction in co-located two-user scenarios. In JVRC '09: Proc. of the Joint Virtual Reality Conference (15th Eurographics Symposium on Virtual Environments), pages 85–92, 2009.

