
Designing a Reconfigurable Multimodal and Collaborative Supervisor for Virtual Environment

Pierre Martin∗ Patrick Bourdot†

V&AR VENISE team, CNRS/LIMSI, B.P 133, 91403 Orsay (France)

ABSTRACT

Virtual Reality (VR) systems cannot be promoted for complex applications (involving the interpretation of massive and intricate databases) without natural and "transparent" user interfaces: intuitive interfaces are required to bring non-expert users to VR technologies. Many studies have been carried out on multimodal and collaborative systems in VR. Although these two aspects are usually studied separately, they share interesting similarities. Our work focuses on how to manage multimodal and collaborative interactions within a single process. We present here the similarities between these two processes and the main features of a reconfigurable multimodal and collaborative supervisor for Virtual Environments (VEs). The aim of such a system is to merge pieces of information coming from VR devices (tracking, gestures, speech, haptics, etc.) in order to control immersive multi-user applications through the main human communication and sensorimotor channels. The framework's architecture of this supervisor is intended to be generic, modular and reconfigurable (via an XML configuration file), so that it can be applied to many different contexts.

Index Terms: H.1.2 [Models and Principles]: User/Machine Systems—Human information processing; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems—Artificial, augmented, and virtual realities; H.5.2 [Information Interfaces and Presentation]: User Interfaces—User interface management systems; I.3.6 [Computer Graphics]: Methodology and Techniques—Interaction techniques

1 INTRODUCTION

Efforts are still needed to study the use of Virtual Environments (VEs) for complex applications (product design, data exploration, etc.). But VEs cannot be promoted for such applications without natural and "transparent" user interfaces: intuitive interfaces could bring non-expert users to Virtual Reality (VR) technologies by exploiting human modalities. Moreover, these modalities have to be perceived and interpreted simultaneously, hence the need for multimodal interfaces in VEs. Multimodality has been a growing field of Computer Science since the eighties, and important concepts have been addressed, especially by Bolt [2], Oviatt et al. [7], Martin [5], Bellik [1] and Latoschik [3]. Additionally, managing groups of users involved in collaborative tasks in Collaborative VEs (CVEs) is more and more required (e.g. Margery et al. [4] and Salzmann et al. [8]). Many studies focus on multimodal or collaborative interactions in Virtual Environments, but to our knowledge none covers these topics in conjunction. This is what motivated the present work, further described in [6].

∗e-mail: [email protected]
†e-mail: [email protected]

2 SIMILARITIES BETWEEN MULTIMODAL AND COLLABORATIVE INTERACTIONS

Several critical points of the multimodal process can be inferred from studies on multimodal interaction: disambiguation (comparing the user's inputs), decision (generating the actions desired by a user) and dialog (decision management, i.e. identifying additional treatments to be applied to incoming actions). Previous multimodal approaches have been carried out on single-user and non-cooperative scenarios. But the improvements of VR technology now allow, for instance, the cooperation of co-located users in a shared virtual scene. The multi-user context is therefore a key point in the multimodal process. Such a system must be able to handle interactions from several users, and thus one must know when data fusion may occur, but also what data can be merged. What happens if pieces of information coming from several users have to be combined? At which level do these pieces of information have to be merged? These new questions about multi-user multimodal systems reveal similarities between collaborative and multimodal processes. We should not restrict the notion of collaboration to simultaneous activities only; we need a broader view that includes the notions of dialog and continuity. It is not solely a question of solving some constraints at a given time, but rather one of ongoing management over time. Thus, to bring these two processes closer together, we have chosen to extend the classification of collaborative interactions proposed by [4]. The previous classification of Margery et al. mainly focused on users' collaborative manipulations. Our extended classification is task-oriented, which is more appropriate to a multimodal approach:

• Basic cooperation (level 1): users can perceive each other and can communicate. This level is composed of two general situations:

(a) co-located interaction: users are immersed in the same place or in the same display. Depending on the technology used (full or limited cohabitation), they can have natural communication. For instance, in a multi-user CAVE with a multi-stereoscopy visualization system, users have full cohabitation: they can see or touch each other and can have a natural dialog. With HMD systems, users have limited cohabitation: they are able to have natural conversations, but they do not see the others physically.

(b) remote interaction: users are immersed in a virtual world but are distant (virtual presence). Therefore, multimedia communication and avatars are required.

• Parallel tasks (level 2): the notion of task includes the manipulation of the scene's objects (cf. Margery's classification) but also command aspects to control the application. Here, users can act on the scene individually: a given task is performed by only one user. This level is divided into two subdivisions: (a) tasks constrained by the scene design and (b) free tasks. Users' interactions are completely independent of each other, whatever the underlying processes.

Figure 1: The framework's architecture. The Multimodal and Collaborative Supervisor (MCS) is configured and initialized using XML files. The MCS processes users' inputs, taking into account rules and context, and transmits the results to the application.

• Cooperative tasks (level 3): users can cooperate on the same object or within the same task. This level is divided into three sub-divisions:

(a) independent tasks: users generate tasks with a similar target (but independent properties) which can be performed independently of each other.

(b) synergistic task: users' interactions can be combined, at any level of treatment, to generate one single task. This concept includes the previous co-manipulations and is close to the complementarity concept of Martin [5].

(c) co-dependent task: users generate dependent tasks (i.e. similar tasks or competitive tasks on the same target). These co-dependent tasks cannot be performed immediately because ambiguities have to be solved. With "similar tasks" comes a concept of redundancy, also close to that of Martin [5].

Level 1 of collaboration obviously does not depend on the software system but rather on the technology used. We have also noticed that levels 2 and 3 can be introduced into a multimodal process: merging the events of each user separately ensures level 2, while combining the events of all users ensures level 3. This is one of the key points of our system.
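
To make this distinction concrete, the following is a minimal sketch (not taken from the paper; the Event and Task types and the fusion functions are hypothetical names introduced only for illustration) of level-2 versus level-3 fusion, written in Python:

    # Hypothetical sketch: level-2 vs. level-3 fusion. Types and function
    # names are illustrative, not the actual MCS data structures.
    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Event:
        user_id: str    # which user produced the event
        modality: str   # e.g. "speech", "gesture", "tracking"
        content: dict   # semantic payload after interpretation

    @dataclass
    class Task:
        users: List[str]
        action: str
        arguments: dict

    def fuse_single_user(events: List[Event]) -> Task:
        """Merge the multimodal events of one user into a single task."""
        args: dict = {}
        for e in events:
            args.update(e.content)  # redundancy/complementarity handling omitted
        return Task(users=[events[0].user_id],
                    action=args.pop("action", "unknown"),
                    arguments=args)

    def fuse_level2(events: List[Event]) -> List[Task]:
        """Level 2: fuse each user's events separately -> independent parallel tasks."""
        per_user: Dict[str, List[Event]] = {}
        for e in events:
            per_user.setdefault(e.user_id, []).append(e)
        return [fuse_single_user(user_events) for user_events in per_user.values()]

    def fuse_level3(events: List[Event]) -> Task:
        """Level 3: combine the events of all users -> one cooperative task."""
        args: dict = {}
        for e in events:
            args.update(e.content)
        return Task(users=sorted({e.user_id for e in events}),
                    action=args.pop("action", "unknown"),
                    arguments=args)

Under these assumptions, level 2 keeps each user's command independent, whereas level 3 produces a single cooperative task whose arguments may come from different users.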

3 OVERVIEW OF THE MAIN FEATURES OF THE MCS ARCHITECTURE

Our work focuses on the design of a reconfigurable multimodal and collaborative supervisor (MCS) for VE applications (see Fig. 1). It performs late fusion [7] to integrate, at a semantic level, information coming from several users. The MCS provides several components required for its complete integration with a VR application/platform: an "input interface" (interpreters), a processing core (interpretation fusion, argument fusion, command manager) and an "output interface" (command manager). The "input interface" processes users' inputs, the MCS core handles the multimodal and collaborative treatments, and the "output interface" packages the final commands. Thanks to this splitting into four stages, the MCS covers the disambiguation, decision and dialog phases (the three critical points of collaborative and multimodal processes) and addresses equivalence, redundancy and complementarity [5].
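
As a hypothetical illustration of this four-stage organisation (the stage names follow the paper, but the classes, method signatures and XML layout below are assumptions, not the actual MCS API), the supervisor could be sketched in Python as:

    # Hypothetical pipeline sketch; class/method names and the XML layout are illustrative.
    import xml.etree.ElementTree as ET

    class Interpreter:
        """'Input interface' stage: one interpreter per device/modality (hypothetical)."""
        def __init__(self, modality: str):
            self.modality = modality
        def interpret(self, raw_event) -> dict:
            # Turn a raw device event into a semantic interpretation.
            return {"modality": self.modality, "data": raw_event}

    class MCS:
        """Four-stage supervisor sketch: interpreters, interpretation fusion,
        argument fusion, command manager."""
        def __init__(self, config_path: str):
            # The supervisor is configured and initialized from an XML file.
            root = ET.parse(config_path).getroot()
            self.interpreters = [Interpreter(node.get("modality", "unknown"))
                                 for node in root.findall("interpreter")]

        # Stage 2: interpretation fusion (disambiguation between modalities/users).
        def interpretation_fusion(self, interpretations):
            return interpretations

        # Stage 3: argument fusion (decision: build the commands the users asked for).
        def argument_fusion(self, interpretations):
            return [{"command": "noop", "args": i} for i in interpretations]

        # Stage 4: command manager (dialog: package and order the final commands).
        def command_manager(self, commands):
            return commands

        def process(self, raw_events):
            # Stage 1: each raw event is handled by an interpreter (paired
            # positionally here only to keep the sketch short).
            interpretations = [itp.interpret(ev)
                               for itp, ev in zip(self.interpreters, raw_events)]
            fused = self.interpretation_fusion(interpretations)
            commands = self.argument_fusion(fused)
            return self.command_manager(commands)  # results sent to the application

Under the same assumptions, the XML configuration would at minimum list the interpreters to instantiate (e.g. one <interpreter modality="speech"/> element per input device); the actual schema used by the MCS is not detailed here.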

We are conducting research on VR interaction with industrial partners in order to evaluate the potential use of the proposed approach. We applied our supervisor to a collaborative situation, entitled MalCoMIICs, for multimodal and co-localized multi-user interactions for immersive collaborations [6]. This application uses the EVE system¹, a new multi-user and multi-sensorimotor CAVE-like set-up.

REFERENCES

[1] Y. Bellik. Media integration in multimodal interfaces. In Proc. of the IEEE Workshop on Multimedia Signal Processing, pages 31–36, 1997.

[2] R. A. Bolt. "Put-that-there": Voice and gesture at the graphics interface. In SIGGRAPH '80: Proc. of the 7th Annual Conference on Computer Graphics and Interactive Techniques, pages 262–270, New York, NY, USA, 1980. ACM.

[3] M. E. Latoschik. A user interface framework for multimodal VR interactions. In ICMI '05: Proc. of the 7th International Conference on Multimodal Interfaces, pages 76–83, New York, NY, USA, 2005. ACM.

[4] D. Margery, B. Arnaldi, and N. Plouzeau. A general framework for cooperative manipulation in virtual environments. In Virtual Environments, volume 99, pages 169–178, 1999.

[5] J.-C. Martin. Tycoon: Theoretical framework and software tools for multimodal interfaces. In John Lee (Ed.), Intelligence and Multimodality in Multimedia Interfaces. AAAI Press, 1998.

[6] P. Martin, P. Bourdot, and D. Touraine. A reconfigurable architecture for multimodal and collaborative interactions in virtual environments (technote). In 3DUI '11: Proc. of the IEEE Symposium on 3D User Interfaces (3DUI). IEEE Computer Society, in press.

[7] S. Oviatt, P. Cohen, L. Wu, J. Vergo, L. Duncan, B. Suhm, J. Bers, T. Holzman, T. Winograd, J. Landay, J. Larson, and D. Ferro. Designing the user interface for multimodal speech and pen-based gesture applications: state-of-the-art systems and future research directions. Hum.-Comput. Interact., 15(4):263–322, 2000.

[8] H. Salzmann, J. Jacobs, and B. Fröhlich. Collaborative interaction in co-located two-user scenarios. In JVRC '09: Proc. of the Joint Virtual Reality Conference - the 15th Eurographics Symposium on Virtual Environments, pages 85–92, 2009.

¹ http://www.limsi.fr/venise/EVEsystem
