

ORIGINAL ARTICLE

Mixing real and virtual conferencing: lessons learned

Geetika Sharma • Ralph Schroeder

Received: 20 August 2012 / Accepted: 1 August 2013 / Published online: 14 August 2013

© Springer-Verlag London 2013

Abstract This paper describes a conference which linked several remote location sites via a virtual environment so that the virtual audience could follow the presentations and interact with real presenters. The aim was to assess the feasibility of linking distributed virtual audiences to an ongoing conference event. The conference consisted of an annual gathering of researchers and developers of a global information technology consultancy firm based in India. This firm developed a virtual environment specifically for distributed collaboration across sites. During the conference, researchers gathered various types of data, including participant observations, interviews, capture of the virtual environment and a survey of the audience. These data are analysed in the paper. The main finding is that a number of ‘low tech’ improvements could be made to the operation of the system that could greatly enhance this type of virtual conferencing. A related finding is that the visual fidelity of the environment and of the avatars plays a lesser role than other factors such as audio quality. Given the paucity of research on how virtual conferencing can substitute for travel, plus the urgency of this topic for environmental reasons, a number of suggestions are made for the implementation of remote virtual conference participation.

Keywords Collaborative virtual environments · Real and virtual conferences

1 Introduction

In this paper, we describe a technical conference which involved the participation of both a real audience and a virtual one from a dozen different locations. We present various types of data, including participant observation, informal interviews and a survey of the remote virtual audience participants. To date, there is relatively little work on conferencing using virtual environments; existing research is mainly in related areas such as the use of virtual environments in education, videoconferencing to maintain personal relationships, and computer-supported co-operative work for small group tasks. After reviewing this work and how it bears on the current study, we describe the system features and the conference event analysed here. We then present our findings, starting with participant observations and interviews and then turning to results from the survey. Combining these data allows us to derive a number of lessons for future research and for improving the experience of future conferences.

2 Previous research and motivation

A conference shares some features with the use of collaborative virtual environments (CVEs), like those used in education. There are a number of studies of CVEs (Churchill et al. 2001), though these have not specifically focused on remote conference participation. Conferences also share some features with distributed work meetings, which have been far less studied (Sharma et al. 2011). Finally, there are similarities with virtual exhibitions (Penumarthy and Boerner 2006). Conferences using CVEs nevertheless differ from these three uses of CVEs in that they involve speakers and audiences and bring people together for the exchange of ideas. Yet, virtual conferences have rarely been investigated or evaluated (Damer 2000).

G. Sharma (✉)
Tata Consultancy Services, 249 D & E Udyog Vihar, Phase IV, Gurgaon, Haryana, India
e-mail: [email protected]; [email protected]

R. Schroeder
Oxford Internet Institute, University of Oxford, Oxford, UK
e-mail: [email protected]

Virtual Reality (2013) 17:193–204
DOI 10.1007/s10055-013-0225-x

What is known about virtual conferencing is anecdotal or tacit knowledge, not captured in evaluations or publications. The main exception is a study by the IEEE VR conference programme committee (Lindeman et al. 2009), which held its 2-day meeting to review papers for its annual conference in Second Life. This involved participants from around the world, and a high proportion of its 55 members and reviewers attended the meeting (42 responded to a survey, and these respondents attended for 5.5 h on average). The main findings of that study, as for the study presented here, are that organizational factors are more important than technical ones. The study authors make six recommendations: (1) pre-meeting experience is important; (2) anonymity in text channels needs to be ensured; (3) two meeting chairs should be present at all times; (4) a clear means of showing the paper being discussed is important; (5) ‘canned’ avatar expressions would be useful; and (6) those who need to be present for discussing a paper should be scheduled carefully. On the whole, the meeting was successful: it enabled more people to attend than would otherwise have been possible, and it saved time and money, though it was regarded as inferior to a face-to-face meeting. As we shall see, these findings and recommendations have echoes in the study described here, although the settings (virtual versus mixed virtual and video, and review meeting versus conference presentations) are quite different, and so the lessons learned here are also different.

In a recent experiment (Shirmohammadi et al. 2012) which combined physical and virtual participation in a small-scale conference (maximum of eight participants), there were a number of findings similar to those of the current study: for example, it was noted that the scale should be kept small for easier management of the networked virtual environment. In that study, there was also a pre-conference orientation for participants to ensure that there were no technical difficulties and that users were comfortable with the set-up. The study also recommends this kind of training as essential, though in the business setting of the current study this is unlikely to be practical. A number of other features of that experiment differ from the current one. For example, the virtual audience, consisting of several remote participants, was represented by a single avatar navigated by the operator of the virtual environment (who would also raise an arm on instructions from the virtual participants to indicate a question; this is interesting in the light of our findings, but also unrealistic). Further, there was no bird’s-eye view of the environment. The study (Shirmohammadi et al. 2012) recommends that there should be such a bird’s-eye view and that participants should have their own avatars, both of which were features of the current study that are described below.

The virtual/real conference that we describe here is novel in at least five ways. The first is the mix of the virtual and real audience. The second is that the conference was carried out in a CVE which was restricted to researchers from one organization, rather than in an open-to-all environment like Second Life. Third, the conference was sustained over the course of two and a half days. Fourth, the conference brought together 12 locations where participants were co-located but from where they could take part in a single virtual lecture theatre with avatars. Fifth, the virtual world was developed specifically for work inside the company. Some of the features of the environment are described later.

The conference also included a live video feed of the conference presentations inside the virtual lecture theatre. Previous work on videoconferencing has been about meetings (Hirsh et al. 2005), scientific collaboration (Sonnenwald 2006), collaboration in small groups (Vander Kleij et al. 2005), home uses of video systems (Kirk et al. 2010), and comparisons of video and avatar representations and interaction (Schroeder 2011). Work on conferences in virtual environments (VEs) has so far been about completely in-world events (Damer 2000). The conference described here is a hybrid of the two.

Much of the previous related work has been concerned with distributed work (Hinds and Kiesler 2002), shared workspaces (Harrison 2009), or task performance in small groups in virtual conferencing (Anderson et al. 2001). Video conferencing has also been studied in terms of the performance of small groups (three participants) compared with doing the same task face to face (Vander Kleij et al. 2005). Similar studies of two and three participants have been carried out for immersive and desktop virtual environments, finding an asymmetry between participants in the immersive system (the leader in the task) and the desktop participant, without any such asymmetry when doing the task face to face (Slater et al. 2000). Another study, in which two participants solved a Rubik’s cube-type puzzle in a distributed immersive virtual environment, found that this is just as good, in terms of time taken, as doing the task face to face (though performance with one immersive system and one desktop, or two desktop systems, is much poorer) (Schroeder et al. 2001). This goes against Olson and Olson’s (2000) well-known finding that ‘distance matters’ in collaboration.

Other studies have focused on shared workspace collaboration. For example, Gutwin and Greenberg (2001) provide a fine-grained understanding of workspace awareness and develop a model for how people collaborate in small groups around shared objects and tasks (see also Rittenbruch and McEwan (2007) for an overview of this work and a typology of analyses, and the collection of studies in Churchill et al. (2001), and Vertegaal (1998)). This type of work can be used in combination with usability studies of collaborative virtual environments (Schroeder et al. 2006), where gaze, reference to objects, and turn-taking can be analysed in detail. It is possible to analyse collaboration in groups in multi-user virtual environments both quantitatively and qualitatively, categorizing different types of acts and sequences of acts in terms of who is interacting with whom, where the focus of the interaction is, and whether there are obstacles to turn-taking and fluid interaction (Schroeder 2010: 217–227). Research on virtual environments and distributed collaboration in small and large groups, covering object-focused and spatial tasks as well as verbal tasks, has been reviewed by Schroeder (2010: 95–140). There are few systematic comparisons between video- and virtual-mediated communication and collaboration (but see Schroeder 2010: 249–74), and mixed systems elude systematic comparison because they are likely to provide quite different admixtures of virtual, video and real elements.

Labhart et al. (2012), in an evaluation of using a combination of video, collaborative virtual environments and Web-based resources, also gained a number of insights that are similar to those presented here. For example, they note that simple solutions such as a ‘clicker system’, whereby students might, with a simple click, ask the lecturer predefined questions (‘more explanation please’), could enhance the interactivity between a large group of learners and the lecturer. This is similar to a solution that will be put forward here about how conference attendees could gain the attention of the presenter.

As for teaching and learning in virtual worlds, Tay (2012) evaluated a number of groups in Second Life in depth, reporting findings about in-world dynamics between teacher and learners and within groups over the course of a number of sessions over two years, related to philosophy, science and education. This detailed and long-term study found that learning needs to be sustained by a complex combination of group cohesion, task orientation and learning as a group.

Work on videoconferencing has a long history (see the early introduction by Finn (1997) and early studies in Finn et al. (1997)), but has recently focused on long-distance personal relationships (Rintel 2013, forthcoming), which raises different issues, such as intimacy, that do not apply to a professional conference setting such as this one.

In our study, users had a choice of two avatars (standard, or customized with a photograph of their own face). Avatar appearance has been extensively discussed in the literature on virtual environments [Bailenson and Beall 2006; see also Bente et al. (2008) and Garau (2006)], though it did not play a significant role in this study. The idea of full-body video meetings has recently been explored for a four-person meeting (Beck et al. 2013), and it will be interesting in future studies to see whether the addition of full bodies can be applied to conferences with large numbers of attendees.

As will become evident below, while this research has developed and refined concepts that are applicable to this study (presence and co-presence, awareness, measures of task performance), the mixed reality setting and the conference setting in our study (not so much educational as a mix between an academic and a professional conference) also mean that the study surfaced different challenges. Damer (2000) reported some descriptive findings of conferences inside virtual worlds. Apart from the studies discussed here, which were conducted in different settings, this is the only other study of virtual conferencing as such that we are aware of.

3 Description of the system and event

The virtual environment (VE) that was used was VirtualOffice (VO), a system designed to enable workplace collaboration (Sharma et al. 2011). The conference event, and how it was presented to conference attendees, can be seen in Fig. 1. The view of the conference as seen by the remote participants can be seen in Fig. 2. Apart from the VE of the office, or in this case the virtual lecture theatre, the system includes a separate window that displays the user’s aura in a grid containing the people within his/her presence, awareness and text chat range. A text chat window below the grid shows the history of the user’s past conversations, and there is a chat bar for inputting text chat (see Fig. 3, where the top left window contains the aura grid).
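The paper does not specify how the aura grid decides who falls within each range. As a minimal sketch, assuming the three ranges are circular distances around the user's avatar on the floor plane, the grid could be populated as follows. The `Avatar` class, the `aura_grid` function and the radii are hypothetical, not VirtualOffice's actual values or API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Avatar:
    name: str
    x: float  # position on the floor plane (hypothetical units)
    y: float


# Hypothetical radii, innermost first; the paper does not state the real values.
DEFAULT_RANGES = (("text_chat", 5.0), ("presence", 10.0), ("awareness", 20.0))


def aura_grid(me, others, ranges=DEFAULT_RANGES):
    """Map each named range to the sorted names of avatars within that radius of `me`."""
    grid = {zone: [] for zone, _ in ranges}
    for other in others:
        if other.name == me.name:
            continue  # the user is not listed in his/her own aura
        distance = math.hypot(other.x - me.x, other.y - me.y)
        for zone, radius in ranges:
            if distance <= radius:
                grid[zone].append(other.name)
    return {zone: sorted(names) for zone, names in grid.items()}


if __name__ == "__main__":
    me = Avatar("Delhi operator", 0.0, 0.0)
    others = [Avatar("A", 5.0, 0.0), Avatar("B", 0.0, 10.0), Avatar("C", 30.0, 0.0)]
    print(aura_grid(me, others))
```

With these assumed radii, A (distance 5) appears in all three ranges, B (distance 10) only in the presence and awareness ranges, and C (distance 30) in none.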

Initiation of a presentation inside the VE causes the 3D view to split further to show a video stream of the presenter and the presentation document (right window in Fig. 3). Live video streaming is implemented using the open-source VLC¹ library. The presenter’s video stream is encoded in H.264 format with a variable bit rate and a frame size of 320 × 240 pixels. It is transmitted using the UDP protocol to a VLC streaming server hosted on a large instance on the Amazon EC2 cloud. VirtualOffice clients connect to this server to receive the video feed over HTTP.
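Because each VirtualOffice client pulls its own copy of the feed from the streaming server over HTTP, the server's outbound bandwidth grows roughly linearly with the number of connected clients. The sketch below makes that arithmetic explicit. The average bit rate (the stream was variable bit rate and no figure is given), the 25 fps frame rate and the assumption of one client per remote conference room are all hypothetical; container and TCP/IP overhead are ignored.

```python
def server_egress_kbps(num_clients: int, avg_bitrate_kbps: float) -> float:
    """Approximate outbound bandwidth of a unicast streaming server.

    Every client receives its own copy of the stream, so egress is simply
    clients x average bit rate (protocol overhead is ignored).
    """
    if num_clients < 0 or avg_bitrate_kbps < 0:
        raise ValueError("inputs must be non-negative")
    return num_clients * avg_bitrate_kbps


def bits_per_pixel(avg_bitrate_kbps: float, width: int = 320,
                   height: int = 240, fps: float = 25.0) -> float:
    """Average encoded bits spent per pixel per frame at the given bit rate."""
    return avg_bitrate_kbps * 1000 / (width * height * fps)


if __name__ == "__main__":
    # Hypothetical: 12 remote rooms, one client each, 300 kbit/s average.
    print(server_egress_kbps(12, 300))  # 3600
    print(bits_per_pixel(300))          # 0.15625
```

Under this model, doubling the number of connected rooms doubles the server's egress, while each client's download stays at the stream's own bit rate.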

Fig. 1 View of real auditorium with virtual auditorium projected

Fig. 2 View seen by remote audience

The presentation slides update automatically on an ‘in-world’ screen as the speaker advances through the presentation. A user may browse through the presentation in the document window independently of the speaker, or return to the speaker’s page. The VO system also includes voice communication implemented using Mumble,² an open-source VoIP chat system. Mumble uses the Speex codec for audio data encoding and the UDP protocol for transmission, set at 40 kbit/s. VO clients connect to a Mumble server hosted on another large instance on the Amazon EC2 cloud.

¹ VideoLan http://www.videolan.org.
² Mumble, an open-source voice chat software. URL: http://www.mumble.com.
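The slide behaviour described above (the in-world screen tracks the speaker, while each user may browse away and later return to the speaker's page) amounts to a small "follow mode" state machine. The following is a minimal sketch of that idea; the `SlideViewer` class and its method names are hypothetical and not taken from VirtualOffice.

```python
class SlideViewer:
    """Hypothetical model of one user's document window (slides are 0-indexed)."""

    def __init__(self, num_slides: int) -> None:
        if num_slides < 1:
            raise ValueError("a presentation needs at least one slide")
        self.num_slides = num_slides
        self.speaker_page = 0   # page currently shown by the presenter
        self.local_page = 0     # page shown in this user's document window
        self.following = True   # True while the window tracks the speaker

    def _check(self, page: int) -> None:
        if not 0 <= page < self.num_slides:
            raise IndexError(f"slide {page} out of range")

    def on_speaker_advance(self, page: int) -> None:
        """Presenter changed slide: record it, and follow only in follow mode."""
        self._check(page)
        self.speaker_page = page
        if self.following:
            self.local_page = page

    def browse(self, page: int) -> None:
        """User browses independently, which leaves follow mode."""
        self._check(page)
        self.local_page = page
        self.following = False

    def return_to_speaker(self) -> None:
        """Jump back to the speaker's page and resume following."""
        self.local_page = self.speaker_page
        self.following = True
```

For example, a user who browses to slide 7 stays there while the speaker moves on to slide 4, and `return_to_speaker()` brings the window back to slide 4.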

All in-world updates, such as avatars’ positions, orientations and animation states, and text chat, are exchanged between clients through a virtual world server built using the Google App Engine and hosted on the appspot domain. More details can be found in Sharma et al. (2011).
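The paper lists the kinds of state the world server relays but not the message format. As a hedged illustration only, an update could be serialised as JSON and fanned out to every other connected client, as in the in-memory stand-in below. The field names, the JSON encoding and the `RelayServer` class are assumptions; this does not use or reflect the Google App Engine APIs on which the real server was built.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AvatarUpdate:
    client_id: str
    position: Tuple[float, float, float]  # x, y, z in world units (assumed)
    yaw_degrees: float                    # orientation around the vertical axis (assumed)
    animation: str                        # e.g. "sit", "stand", "raise_hand", "speak"
    chat: Optional[str] = None            # optional text chat line


def encode(update: AvatarUpdate) -> str:
    return json.dumps(asdict(update))


def decode(payload: str) -> AvatarUpdate:
    data = json.loads(payload)
    data["position"] = tuple(data["position"])  # JSON arrays come back as lists
    return AvatarUpdate(**data)


class RelayServer:
    """In-memory stand-in: forwards each update to every client except its sender."""

    def __init__(self) -> None:
        self.outboxes: Dict[str, List[str]] = {}

    def connect(self, client_id: str) -> None:
        self.outboxes.setdefault(client_id, [])

    def disconnect(self, client_id: str) -> None:
        self.outboxes.pop(client_id, None)

    def publish(self, payload: str) -> None:
        sender = json.loads(payload)["client_id"]
        for client_id, outbox in self.outboxes.items():
            if client_id != sender:
                outbox.append(payload)
```

A relay of this kind keeps clients thin: each client sends its own avatar's state once and receives everyone else's, which matches the paper's description of updates being "exchanged between clients through" the server.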

The system was tested extensively beforehand. It has been shown that videoconferencing needs a lot of testing, especially when a number of remote locations are involved (Sonnenwald 2006). To anticipate our findings, the main failures during our event were the failure of voice when avatars asked questions and the poor quality of voice communication. In this respect, our event was like many others. It can also be noted here that poor voice communication and audio quality result in poor turn-taking between speakers and avatar questioners. These basic failures have implications for design. Finally, this virtual conferencing event was very labour intensive, not just in pre-testing, but also in requiring a number of operators to be present throughout the event. We make some suggestions in the concluding sections about how these problems can be avoided.

Fig. 3 VirtualOffice window layout

The study was carried out in the course of a two-and-a-half-day conference. The format was that of a typical research conference: keynotes, presentations, and two tracks for parallel presentations during part of the conference (only one of these tracks implemented virtual audience participation). The conference was attended by approximately 80 researchers at the real conference location and approximately 55 researchers in 12 remote locations, though we obtained data only from 37 participants from 8 locations in India. On average, there were 5–6 attendees at each location for a presentation. All these remote participants, though at different physical sites, entered the same virtual conference room (that is, the different physical sites were not represented as 12 different rooms, but as one room containing all avatar conference participants). The real conference had seats arranged in theatre style (with steeply rising rows of seats; see Fig. 4), while in the rooms for virtual participants all were seated on the same level (Fig. 2). To manage the VE, there were two operators at the physical location (one primarily to handle the camera view, the other to handle audio and questions from the virtual audience). The virtual audience was displayed on a 5 × 7 feet screen in the real auditorium. An avatar located at the centre of the screen, standing and asking a question, was displayed at an approximate size of 8 inches. Other avatars and objects in the VE appeared progressively smaller due to perspective foreshortening.
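The shrinking of more distant avatars follows from perspective projection: for a pinhole camera, an object's on-screen height is inversely proportional to its depth from the camera, so h(d) = h_ref · d_ref / d. Taking the roughly 8-inch standing questioner at the centre of the screen as the reference, a sketch of this relationship is below. The depths are hypothetical, and the formula assumes avatars of equal height, all standing.

```python
def apparent_height_inches(depth: float, ref_depth: float,
                           ref_height_inches: float = 8.0) -> float:
    """On-screen height of an avatar at `depth`, given that an equally tall
    avatar at `ref_depth` appears `ref_height_inches` tall (pinhole model)."""
    if depth <= 0 or ref_depth <= 0:
        raise ValueError("depths must be positive")
    return ref_height_inches * ref_depth / depth


if __name__ == "__main__":
    # Hypothetical depths in virtual metres from the camera; reference at 4 m.
    for depth in (4.0, 8.0, 16.0):
        print(depth, apparent_height_inches(depth, ref_depth=4.0))
```

Under these assumptions, an avatar twice as far from the virtual camera as the reference appears about 4 inches tall, and one four times as far about 2 inches.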

At each remote location, a conference room was set up with one operator to broadcast the proceedings of the physical conference. The VO window was projected in the conference room, showing a live video feed from the physical conference, the presentation document of the speaker and a graphical view of the virtual auditorium on a single 4 × 6 feet projection screen (see Fig. 3). The virtual auditorium was designed to look like a typical conference venue. Multiple cameras were pre-configured in VO to show views of the VE from different vantage points. The speaker’s audio was streamed into VO and played on the sound system installed in the conference room, and there was a microphone to talk to the physical auditorium and other remote conference rooms. The VO window is designed to display the speaker’s video in a resizable window whose default size is one fifth of the width by one fifth of the height of the total display.
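The default size of the speaker's video window is simple arithmetic: one fifth of each display dimension, and hence one twenty-fifth (4 %) of the display area. The display resolutions below are hypothetical examples; integer division keeps the result in whole pixels.

```python
def default_video_window(display_width: int, display_height: int,
                         divisor: int = 5) -> tuple:
    """Default speaker-video size: 1/divisor of each display dimension, in whole pixels."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return display_width // divisor, display_height // divisor


if __name__ == "__main__":
    print(default_video_window(1920, 1080))  # (384, 216)
    print(default_video_window(1024, 768))   # (204, 153)
```

Assuming the VO window filled the 4 × 6 feet projection screen, the default video window would measure about 0.8 × 1.2 feet (roughly 10 × 14 inches).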

To attend the virtual conference, an employee of the organisation could register at the conference website and submit a photograph of himself or herself. A look-alike avatar was created for each registered attendee and added to the virtual lecture theatre when he or she was physically present in a conference room. Look-alike avatars were also created for operators and speakers who submitted their photographs. Employees could also walk in to attend the virtual conference without registration, and generic male/female avatars were added for them. Avatars were animated with appropriate actions such as sitting, standing, raising a hand and speech animation, depending on their real-world activity. When attendees joined from any of the remote locations, their avatar would enter the virtual auditorium and go to a free chair. When an attendee left the conference, his/her avatar was deleted from the virtual auditorium.

With this set-up, remote participants (a) did not have to arrange the requisite hardware or install the VO software themselves, (b) were physically co-located with others at their location and (c) could come and go depending on their interest and availability. On the other hand, they had less control over their own avatars. Also, although the virtual auditorium was designed to accommodate 70 avatars in order to ensure real-time interactivity and rendering, the number of remote attendees was in practice limited only by the size and number of remote conference rooms that could be set up.
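Seating avatars as attendees arrive and removing them when they leave, within the auditorium's 70-avatar design capacity, can be sketched as a pool of free seats. Assigning the lowest-numbered free chair via a min-heap is a hypothetical policy, as are the `Auditorium` class and its method names; the paper says only that an arriving avatar went to a free chair and was deleted when the attendee left.

```python
import heapq
from typing import Dict, List


class Auditorium:
    """Hypothetical seat manager for the virtual lecture theatre."""

    def __init__(self, capacity: int = 70) -> None:
        self.free: List[int] = list(range(capacity))  # a sorted list is a valid min-heap
        self.seat_of: Dict[str, int] = {}

    def join(self, attendee: str) -> int:
        """Seat an arriving attendee's avatar in the lowest-numbered free chair."""
        if attendee in self.seat_of:
            return self.seat_of[attendee]  # already seated
        if not self.free:
            raise RuntimeError("virtual auditorium is full")
        seat = heapq.heappop(self.free)
        self.seat_of[attendee] = seat
        return seat

    def leave(self, attendee: str) -> None:
        """Delete the attendee's avatar and free the chair."""
        seat = self.seat_of.pop(attendee, None)
        if seat is not None:
            heapq.heappush(self.free, seat)
```

A heap keeps both operations at O(log n), and freed chairs are reused before higher-numbered ones, which tends to keep the seated audience compact.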

3.1 Method

Our methods included participant observation, a survey of remote participants and informal interviews with the real and virtual audiences. As regards participant observation, one of the authors was a speaker and audience member at the real conference, and the other managed the conference remotely.

3.2 Results

We begin with observations based on informal interviews with the real audience and on participant observation at the real conference (see Fig. 4). We cover a number of areas: how the virtual lecture theatre was displayed, how the speaker was represented inside the virtual world, how remote participant avatars could be identified, how avatars could navigate, their appearance and gestures, and finally how the virtual audience asked questions and interacted with the real conference.

3.2.1 Camera view of the virtual lecture theatre

During the conference, the virtual lecture theatre was often displayed statically to the real and virtual audience, apart from when avatars entered and left the room and raised their hands for questions. When the view was not static, the operator co-located with the real conference switched between camera views: looking from the back of the lecture theatre towards the front with the presentation display, looking from the front of the theatre towards the audience, and generally moving around among the avatars. At the remote locations, the operator would keep the view static when a lecture was going on. The system would automatically focus on the avatar asking a question whenever a remote attendee did so. During breaks, the operator would toggle through different camera views or focus on one based on requests from the audience.

Fig. 4 Participants at the conference

Although this is not an issue for the speaker in the real lecture theatre, who is generally facing the real audience except during the Q&A, from the point of view of the real audience the balance between a static virtual lecture theatre and switching the view was important. A perfectly static camera view of the virtual theatre would have seemed rather boring and would have made the real audience feel as though the virtual one was not there (avatars entering and leaving the virtual audience were rather infrequent). Moving the camera around too much, on the other hand, can be distracting for the real audience. This balance could be investigated in future work, where both the real audience and remote participants could be asked in a survey whether they prefer more or less camera movement. The point here is that having only one or the other of these options exclusively would not have provided a rich experience for the real and the virtual audience.
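The camera behaviour described in this subsection (a static view during a lecture, an automatic cut to an avatar asking a question, and toggling between pre-configured views during breaks or jumping to one on request) can be summarised as a small controller. This is a sketch under stated assumptions: the `Phase` names, the preset names and the round-robin toggling order are hypothetical, not VirtualOffice's implementation.

```python
from enum import Enum, auto
from typing import List, Optional


class Phase(Enum):
    LECTURE = auto()
    QUESTION = auto()
    BREAK = auto()


class CameraController:
    def __init__(self, presets: List[str]) -> None:
        if not presets:
            raise ValueError("need at least one pre-configured camera")
        self.presets = list(presets)
        self.index = 0  # currently selected preset

    def view(self, phase: Phase, asking_avatar: Optional[str] = None,
             requested: Optional[str] = None) -> str:
        if phase is Phase.QUESTION and asking_avatar is not None:
            return f"focus:{asking_avatar}"  # automatic focus on the questioner
        if phase is Phase.BREAK:
            if requested in self.presets:
                self.index = self.presets.index(requested)  # audience request
            else:
                self.index = (self.index + 1) % len(self.presets)  # toggle views
        return self.presets[self.index]  # during a lecture the view stays static
```

Keeping the lecture view fixed while letting breaks and questions move the camera is one way to encode the balance discussed above; how often to move during breaks is left as a tunable choice.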

3.2.2 Virtual speaker

As with the virtual audience, speakers were represented in the virtual lecture theatre by look-alike or generic avatars. Although this may add to the conference experience of the virtual audience, seeing both the real speaker and his or her avatar representation neither distracted the real audience nor added to its experience. One reason for highlighting this is that in a mixed real/virtual setting, having both real and avatar speakers simultaneously is somewhat artificial, since people cannot be in two places at the same time. On the other hand, there were a few occasions when the video stream became unavailable due to low bandwidth at either the streaming or the receiving location. In such cases, having the speaker’s avatar in view helped preserve a sense of continuity of the conference.

3.2.3 Avatar identification

The names of operators and their locations (e.g. Ahmedabad, Delhi) were visible in the aura grid and would appear above their avatars’ heads in the VO environment when the mouse hovered over them. This was a useful feature, as it gave the real audience a sense of the remote participation. On the other hand, it was not clear to the real audience whether look-alike or generic avatars, whose names did not appear in the grid, were participating from a remote location or had simply been added as ‘filler’. Making this clear to the real audience (for example, by only allowing ‘named’ avatars to participate) might add to the realism of the virtual audience.
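The identification issue suggests a simple rule: show a "Name (Location)" label only for avatars whose name and location are known, and optionally admit only such named avatars, as the text proposes. The sketch below is hypothetical; the `Participant` fields and the function names are illustrative and not VirtualOffice's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    avatar_id: str
    name: Optional[str] = None       # None for generic walk-in avatars
    location: Optional[str] = None   # e.g. "Ahmedabad"


def hover_label(p: Participant) -> Optional[str]:
    """Label shown above the avatar on mouse hover, or None if it is anonymous."""
    if p.name and p.location:
        return f"{p.name} ({p.location})"
    return None


def admitted(participants: List[Participant],
             named_only: bool = True) -> List[Participant]:
    """Optionally keep only 'named' avatars, so none can be mistaken for filler."""
    if not named_only:
        return list(participants)
    return [p for p in participants if hover_label(p) is not None]
```

With `named_only=True`, every avatar in the room carries a visible identity, at the cost of excluding anonymous walk-ins unless they supply a name and location.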

3.2.4 Navigation, avatar features and gestures

Avatars could only walk to their seats, sit, raise their hands, stand, and show speech gestures when speaking. Unlike in systems like Second Life, there was no flying around or running, which suited the workmanlike and serious atmosphere of a research conference. Apart from this, automatic gestures and movements (shifting one’s body) were generated for the avatars, which added considerably to their ‘presence’.

Some employees had their look-alike avatars customized by means of a photograph of their head rendered in 3D. Out of the 70 registrants for the virtual conference, 33 submitted photographs, of which 26 met the requirements for creating a look-alike avatar. This was highly effective both for enhancing the ‘presence’ of the virtual audience and for allowing the real audience and the speakers to recognize colleagues. One drawback, on the other hand, was the absence of facial expressions.
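The photograph figures above translate into the following rates, as a small worked calculation on the paper's own counts:

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


registrants, submitted, usable = 70, 33, 26

print(pct(submitted, registrants))  # 47.1  share of registrants who sent a photo
print(pct(usable, submitted))       # 78.8  share of photos usable for an avatar
print(pct(usable, registrants))     # 37.1  share of registrants whose photo yielded an avatar
```

In other words, nearly two thirds of registrants (44 of 70) did not end up with a photo-based look-alike avatar.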

Apart from the avatars with photorealistic heads, there were also some ‘generic’ avatars which looked identical (though to increase the variety somewhat, there were outfits with two different colour schemes for both male and female avatars). This limited variety, again in contrast to Second Life and the ability of each person to customize their avatar there, seemed appropriate for the context of a research conference and perhaps preferable to some of the outlandish avatars in Second Life.

3.2.5 Recognition of questions from the virtual audience and turn-taking

Speakers prioritized questions from the real audience, recognizing perhaps ten real questions for every avatar question. Since avatars raised their hands for many more questions, perhaps twice the number that were recognized, many questions from the virtual audience went unanswered. It is true that there were far fewer questions from the virtual audience. Yet it was also very difficult for the speaker to scan the virtual audience for the raised hands of avatars while simultaneously scanning the real audience.
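The two rough ratios above can be combined into a back-of-the-envelope estimate. If speakers recognized about ten real questions per avatar question, and avatars raised their hands about twice as often as they were recognized, then about half of the virtual audience's questions went unanswered, regardless of the absolute numbers. The count of 40 recognized real questions below is purely hypothetical.

```python
def virtual_question_estimate(real_recognized: float,
                              real_per_virtual: float = 10.0,
                              raised_per_recognized: float = 2.0) -> dict:
    """Estimate virtual-audience question counts from the paper's rough ratios."""
    if real_per_virtual <= 0 or raised_per_recognized < 1:
        raise ValueError("ratios out of range")
    recognized = real_recognized / real_per_virtual
    raised = recognized * raised_per_recognized
    unanswered = raised - recognized
    share_unanswered = 1 - 1 / raised_per_recognized  # independent of the counts
    return {"recognized": recognized, "raised": raised,
            "unanswered": unanswered, "share_unanswered": share_unanswered}


if __name__ == "__main__":
    print(virtual_question_estimate(40))
```

With 40 recognized real questions, this gives 4 recognized avatar questions out of about 8 raised, leaving about 4 (50 %) unanswered.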

Partly because of poor audio quality, and partly because a question and answer session requires non-verbal communication (nodding, body posture to indicate disagreement, etc.), the virtual audience was at a disadvantage during question and answer sessions. This disadvantage, together with that of not having questions recognized, is a key challenge for the conference scenario presented here. In this respect, it can be noted that although a completely in-world conference shares this disadvantage, at least all participants there are disadvantaged equally.

Turn-taking also involves following up on a question with a second comment or question, which proved almost impossible for the virtual audience but is easy for the real audience. Finally, turn-taking when avatars asked questions was also less than smooth because of out-of-sync stops and starts and overlapping speech between the real speaker and the virtual questioner. This problem is well known in several forms of mediated communication (including synchronous chat in VEs, though in chat it does not necessarily reduce effectiveness, since all the text remains on the screen for review). In addition, to hear questions from the real audience, the virtual audience depended on a microphone being passed to each questioner; this did not happen consistently, which was frustrating and put the virtual audience at a disadvantage.

4 Survey results

We now turn to the results of a survey completed by remote conference participants at eight remote locations (Delhi, Kolkata, Bangalore, Ahmedabad, Mumbai, Pune, Hyderabad and Chennai), which yielded 37 responses to 15 multiple-choice and open-ended questions. Most of the remote participants (29) filled out the survey in the first session of the virtual conference they attended; 8 filled it out on returning to the virtual conference for another session. When asked if they had used video conferencing tools such as WebEx³ before, 30 said they had and 4 had not (3 did not answer). We also asked whether they preferred WebEx or the virtual world and why; the 12 responses were roughly evenly divided (some pointed to advantages and disadvantages of both). Those who preferred the virtual world mentioned its greater multimodality, independent control over presentation documents, interactivity and intuitiveness. The drawbacks of the virtual world were poor audio and video. Among those who preferred WebEx-type videoconferencing, the reasons included greater reliability and better video and audio quality.

4.1 Problems with technology and suggestions for improvement

When asked whether they would have preferred a different layout of the projected screen (see Fig. 3), 16 answered no (1 did not answer). Of the 20 who said yes, 7 would have preferred more video (i.e. a larger proportion of the screen allocated to video), 6 more of the virtual world, and 7 more of the presentation document. We also asked an open-ended question about how the virtual conference could be improved, and the vast majority of the 30 responses were about how audio and video quality could be improved. Several responses pointed to other shortcomings of the presentations (for example, that speakers should have laser pointers). There were also several suggestions that remote participants should have more control over their point of view when engaging with the real conference audience (i.e. control of the camera in the conference room). Only one respondent made suggestions about better control of avatar movements, gestures and expressions (including those of the avatar speaker).

4.2 Interacting with the real conference and the virtual audience

To elicit whether remote participants had a sense of being there at the conference, we gave respondents four choices on a scale: very strong (none), strong (5), weak (19) and very weak (6) (no answer 7). We then asked whether, also considering time and cost, they would have preferred to attend the conference in person: 10 said they would not and 21 would have preferred to be there in person (no answer 6). Only 2 respondents said that they had wanted to ask the conference presenter a question and did not have a chance to do so, whereas for 32 this had not been a problem (3 did not answer).

Asked whether they had enjoyed being in the room more with remote or with physically present people, 13 answered ‘with remote people’ and 14 ‘with physically present people’ (no answer 10). When asked if they recognized any of the other remote avatar participants, 14 respondents said that they did and 19 that they did not (4 did not answer). When we asked those who did recognize one of the avatars whether this recognition made them want to establish another kind of contact with that person, 10 answered positively (1 said they would do this via phone, 8 via email and 1 via ‘other’ means).

We also asked whether they would have preferred to control their own avatar (i.e. not to have it controlled by an operator), and 21 said yes and 10 said no. In a separate question, we asked if they thought that their avatar was a good way to represent them at the conference, and 17 replied positively. Of the 9 who disagreed (11 did not answer), 2 said there should be greater fidelity and 2 said that they did not get a chance to experience their avatar.

4.3 Experience of the conference

Asked if they found the experience of remote conference participation useful for learning about the conference content, 5 answered ‘very useful’, 14 ‘quite useful’, 9 ‘a little bit useful’ and 7 ‘not useful’ (4 did not answer). There were 22 replies to an open-ended question: ‘what did you like most about attending the conference virtually?’ These replies mentioned a wide range of reasons: from the practicalities of avoiding travel and cost and being able to devote one’s attention as one wished, to being there while not being there physically, being able to roam around the virtual conference room, and having a number of modalities on the screen.

³ WebEx, Web conferencing. URL: http://www.webex.com.

We also asked ‘what did you like least?’, and of the 21 replies, over half were about poor audio quality, six complained about the video quality (four of these mentioned both audio and video) and only one mentioned that the avatar experience could be improved (the remainder were other quality complaints). In a separate forced-choice question, 18 thought that the audio quality was poor and 17 thought it was good.

Before we discuss these results further, one striking feature of the survey responses should be mentioned: how mixed and nuanced the answers were. This could be taken as a validation of the usefulness of the questions but, more importantly, it suggests that remote participants recognized the benefits as well as the drawbacks of virtual conference participation.

5 Discussion, implications for design and future work

Many points made in the previous sections address mundane issues of mixed reality conferencing. Such practical details are nonetheless worth recording: in our experience, much previous research on distributed collaboration in MUVEs (in education, meetings and exhibitions as well as conferences) has consisted of documenting trials and demonstrator systems without evaluating them or suggesting how they could be improved (Sonnenwald 2006).

Having learned a number of lessons from this conference, we now describe ways in which the problems that occurred during the event could be overcome, and how future events of this type could be socio-technically managed to enhance the experience, both in terms of the effectiveness of the task and in terms of the enjoyment of the social interaction.

5.1 Ways to enhance future virtual conferences

One feature that could easily be implemented in future conference events is allowing people to move between presentations in different parallel tracks. As long as all of them are captured on video and the presentations are made available in different virtual lecture theatres, avatars could move freely between them. It is a well-known problem at physical conferences that moving between sessions causes disruption, whereas avatars moving between sessions would create only minimal disruption. Further, with responses to speakers at conferences nowadays increasingly being tweeted, moving around in the light of how highly the audience rates a speaker in real time could be highly effective, though it might also be discourteous.

A useful feature at mixed real/virtual conferences could be to allow real audience members (apart from the speaker) to go up to the screen during breaks and speak to avatars whom they recognize. This informal getting together to discuss the presentation, meet new people or chat with those one has not seen for some time is a well-known benefit of physical conferences, and it could easily be implemented with a virtual audience. The number of people who can ‘mill around’ and find and talk to each other may be limited with virtual audiences, but a virtual environment also offers possibilities for locating and queuing to speak to each other that are more restricted in a physical space.

5.2 Benefits of real/virtual conferencing

One advantage of virtual conferencing is, of course, that virtual participants could in principle be anywhere (such as lying on the beach) as long as they had a laptop. On the other hand, our co-located remote groups clearly had useful discussions and enjoyed social interaction in their respective physical locations, as the almost even split in the survey shows between remote participants who enjoyed being with other remote people (13) and those who enjoyed being with physically co-present participants (14). The trade-off between the benefits and drawbacks of these two options deserves further investigation.

A clear benefit is that the virtual audience can take breaks, eat food and the like, all the while still keeping most of their attention (depending on the other activity) on the presentation. This is not a trivial advantage at conferences, which are often long and have few breaks.

Another advantage of a virtual audience is that it is scalable. Again, there is a trade-off: in our case, the virtual lecture theatre was limited to 70 or so chairs, enough to accommodate those who had registered to participate remotely. In principle, however, thousands or more could participate, which raises the question of whether a virtual lecture theatre with an audience of thousands would detract from their presence vis-à-vis the real audience. Note, however, that full participation (asking a question) is in any case limited in practice to a manageable number of people, at least during a ‘live’ conference.

A related point concerns one trade-off between ‘realism’ and the ‘artificial’ nature of this conference link-up: remote participants were placed in the same virtual lecture theatre even though they were physically in different remote locations. Realistically, they should have been shown in different rooms. However, with the names of their locations displayed above their heads, this non-realistic feature did not seem to detract from a ‘natural’ remote participation experience (it was not mentioned in any of our informal interviews or survey responses).

5.3 Communication and interaction problems and potential solutions

Voice communication problems and audio quality have already been mentioned. Apart from fixing these technically, there are a number of socio-technical management solutions. A critical problem for participation by the virtual audience was getting their questions recognized by the speaker. This was because the cognitive load on the speaker of scanning both the real and the virtual audience for questions simultaneously was too large. Further, it is easier to scan the real audience than the virtual one, since it is difficult to distinguish avatars with their hands raised in a virtual lecture theatre displayed on a 2D screen. However, this problem can easily be overcome: for example, there could be a small flashing red light, either physically in the real lecture theatre (perhaps on the speaker’s podium) or in the virtual lecture theatre (perhaps in the lower corner closest to the speaker). This would make it very easy to recognize questions from avatars, perhaps even giving avatars an advantage over questioners in the real audience (it is easy to ignore a raised hand in a real lecture theatre, but more difficult to ignore a blinking red light). This solution is bound to enhance participation by the virtual audience.

Another problem with the virtual lecture theatre is that the virtual audience lacked co-presence for the real audience. This is because the avatars appeared pale and indistinct on the 2D screen in the real auditorium, especially with both artificial and natural light in the real lecture theatre. Again, there is a solution that is easy to implement: the real lecture theatre could be darkened, with the lights dimmed or switched off, so that the screens (showing both the virtual audience and the PowerPoint presentation) could be seen much better. If the lecture theatre were darkened almost to black, the virtual audience would become highly co-present. This might make the room seem spooky, though whether it would actually be experienced that way would need to be investigated. A darkened real lecture theatre could be combined with a spotlight on the real speaker (as on a stage in a darkened theatre or at a TV awards show). Speakers need only see the front row of real faces to obtain feedback on what they are saying during the talk. During the question and answer sessions, in contrast, the real lecture theatre needs to be better lit so that the speaker can see raised hands even in the rows furthest away. Note, however, that during the question and answer session, a lighter room, with its reduced co-presence of the virtual audience, does not pose a problem for questions from the virtual audience, given the suggestion just made for recognizing questioners via a blinking red light.

A key question for the future concerns the advantages and disadvantages of virtual conferencing as against participation via video conferences. One area of potentially great improvement in how avatars could participate would be to show their faces in video, perhaps by means of a Skype video link that is activated when the avatar asks a question, with the questioner inserted as a small video window in the virtual lecture theatre. Such a live video ‘talking head’ was requested by a number of speakers and members of the real audience, and a number of responses in the survey of remote participants also indicated a preference for video over virtual. Video would obviously enhance the expressivity of the avatar/talking head (facial expressions could be seen) and thus also improve realism and turn-taking.

5.4 Virtual audience participation

The current problem with videoconferencing participation by a large number of participants is bandwidth. Note, however, that this problem does not apply to the scenario just described, since only one video ‘talking head’ would participate at any one time. Such a video talking head would enhance the co-presence of the virtual audience. Against this, it could be argued that such a set-up would weaken the realism of the virtual audience and the virtual lecture theatre (avatars and video combined in an artificial way). It would be useful to investigate whether such a mix is problematic. One hypothesis, based on the fact that people have no problems interacting with avatars that have only minimally realistic human expressiveness (Bailenson et al. 2006), is that it would not be. Clearly, video would have the advantage of making asking a question more like having a conversation (put the other way around: it would reduce turn-taking problems considerably).
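The bandwidth argument can be made explicit with a back-of-the-envelope relation. Assume, hypothetically, that each of n remote participants would otherwise send a video stream of bitrate b to the conference venue, and ignore the feed from the venue to each site, which is the same in both set-ups. Full video participation then scales with the audience, whereas the talking-head scheme does not:

```latex
\[
  B_{\text{all}} = n\,b, \qquad
  B_{\text{head}} = b, \qquad
  \frac{B_{\text{all}}}{B_{\text{head}}} = n .
\]
```

For example, with roughly 70 remote seats and an assumed b = 1 Mbit/s, full video participation would require about 70 Mbit/s of incoming video at the venue, against 1 Mbit/s for a single talking head.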

One of the advantages of the VO environment over Second Life is that only people within a certain aural range can hear each other. But whereas this is an advantage in the setting of office collaboration (Sharma et al. 2011), in the virtual conference setting described here it is only necessary to hear one questioner at a time. Indeed, it is not clear that spatial audio adds anything to virtual conference participation, though, again, one question for further research is which conference situations are enhanced by spatial audio.
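The contrast between the two audio policies can be sketched in a few lines. This is a minimal illustration, not the VO environment’s actual audio model: it assumes avatars have 3D positions and a single hypothetical aural-range threshold, and the names `Avatar`, `audible_speakers` and `conference_floor` are invented for the example. The proximity rule suits office collaboration; the ‘floor’ rule makes only the presenter and the one recognized questioner audible, wherever they sit.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Avatar:
    name: str
    x: float
    y: float
    z: float = 0.0


def within_aural_range(listener: Avatar, speaker: Avatar, aural_range: float) -> bool:
    """Proximity rule: a speaker is audible only within aural_range (hypothetical units)."""
    distance = math.dist((listener.x, listener.y, listener.z),
                         (speaker.x, speaker.y, speaker.z))
    return distance <= aural_range


def audible_speakers(listener: Avatar,
                     talking: Iterable[Avatar],
                     aural_range: float,
                     conference_floor: Optional[Set[str]] = None) -> List[str]:
    """Return the sorted names of talking avatars that the listener can hear.

    Without conference_floor, office-style spatial audio applies.
    With it, only avatars holding the floor are heard, regardless of distance.
    """
    others = [a for a in talking if a.name != listener.name]
    if conference_floor is not None:
        return sorted(a.name for a in others if a.name in conference_floor)
    return sorted(a.name for a in others
                  if within_aural_range(listener, a, aural_range))


if __name__ == "__main__":
    me = Avatar("listener", 0.0, 0.0)
    talking = [Avatar("neighbour", 2.0, 0.0), Avatar("questioner", 30.0, 0.0)]
    print(audible_speakers(me, talking, aural_range=5.0))  # ['neighbour']
    print(audible_speakers(me, talking, aural_range=5.0,
                           conference_floor={"questioner"}))  # ['questioner']
```

Under the proximity rule a listener seated far from the questioner would miss the question entirely, which is the situation the text argues a conference does not need.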

We can also note that although the survey responses highlighted audio problems, the remote participants did not differentiate between the fidelity of the audio and issues such as the microphone not being passed around to questioners in the real audience (which made it impossible to hear the question). Since audio is such a key problem, future research should elicit precisely which aspect of audio causes the greatest problems (it may be not only the quality of the equipment or technical issues, but also how the equipment is used, that causes problems).

As mentioned, 12 remote locations participated in the conference setting described here, and operators at each site managed when questions could be asked. Further, at the real conference location, another operator managed this question and answer process. This ‘participation management’ went smoothly, and it is essential in the context of a highly distributed meeting. However, an even more effective process can be envisaged, one that would require less management effort and would permit a much larger number of locations and participants to take part. This would involve two new processes:

1. If avatars want to ask a question, they raise their hand; when they do so, the operator at the real location (not the speaker) places the avatar in a queue at the front of the virtual lecture theatre. (This can be implemented in conjunction with the red light mentioned earlier.)
2. The operator at the real conference location could vet the avatars/remote participants in the queue beforehand, checking whether their audio is working properly before they ask their question.

In this way, the remote locations would not need any operators; there could be just one operator at the real conference location. Thus, when new remote participants enter the virtual lecture theatre through its door, they could simply be assigned (randomly) to an available seat, so that the only limit on the number of remote participants is the number of available seats.

In this scenario, when the speaker wants to answer an avatar’s question, they simply indicate that they want to take the next person in the queue, which could be a very effective and orderly solution. First, this would require less effort (only one operator, rather than operators at each location), which would permit any number of remote locations and participants (but see the earlier point about the size of a very large virtual audience). Second, it would ensure smooth and effective management of questions and answers.
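The two processes listed above, together with random seat assignment and the speaker taking ‘the next person in the queue’, can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the system used at the event: the class and method names are hypothetical, `audio_check` stands in for the operator’s manual audio test, the red light is modelled as a flag that is on whenever the queue is non-empty, and participants who fail the audio check are simply skipped (the text does not say what should happen to them).

```python
import random
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


class VirtualLectureTheatre:
    """Hypothetical single-operator question process for a mixed real/virtual talk."""

    def __init__(self, num_seats: int, audio_check: Callable[[str], bool],
                 seed: Optional[int] = None) -> None:
        self.free_seats: List[int] = list(range(num_seats))
        self.seat_of: Dict[str, int] = {}
        self.queue: Deque[str] = deque()   # avatars waiting at the front of the theatre
        self.audio_check = audio_check     # stands in for the operator's audio test
        self.rng = random.Random(seed)

    def enter(self, participant: str) -> Optional[int]:
        """Assign a random free seat; None means the theatre is full."""
        if not self.free_seats:
            return None
        seat = self.free_seats.pop(self.rng.randrange(len(self.free_seats)))
        self.seat_of[participant] = seat
        return seat

    def raise_hand(self, participant: str) -> None:
        """The operator (not the speaker) moves a seated avatar into the queue once."""
        if participant in self.seat_of and participant not in self.queue:
            self.queue.append(participant)

    @property
    def red_light_on(self) -> bool:
        """Cue for the speaker: lit while at least one avatar is waiting."""
        return len(self.queue) > 0

    def next_question(self) -> Optional[str]:
        """Speaker takes the next person; anyone failing the audio check is skipped."""
        while self.queue:
            participant = self.queue.popleft()
            if self.audio_check(participant):
                return participant
        return None


if __name__ == "__main__":
    theatre = VirtualLectureTheatre(num_seats=2,
                                    audio_check=lambda p: p != "Pune-2", seed=1)
    for name in ("Delhi-1", "Pune-2", "Chennai-3"):
        print(name, theatre.enter(name))   # Chennai-3 gets None: no seats left
    theatre.raise_hand("Pune-2")
    theatre.raise_hand("Delhi-1")
    print(theatre.red_light_on)            # True
    print(theatre.next_question())         # Delhi-1 (Pune-2 fails the audio check)
    print(theatre.red_light_on)            # False
```

A first-in, first-out queue preserves the order in which hands were raised, which matches the orderly handling envisaged above; an actual deployment might instead let the operator reorder the queue or re-test participants whose audio failed.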

This highly managed process may seem to introduce considerable artificiality into virtual conference participation. However, it can be anticipated that a very smooth question and answer process would greatly enhance the naturalness and realism of remote participation. Put differently, these recommendations may seem to detract from the direct co-presence of remote participants and from their natural participation in the conference (via an unstructured virtual lecture theatre). However, based on our observations and on the survey responses, we would hypothesize that it is technical failures and turn-taking failures, as well as a lack of smoothness in the interaction between real and avatar participants, that detract most from co-presence and natural remote participation. This hypothesis should apply both to task performance (participating remotely, asking questions and receiving answers) and to the enjoyment of the interaction (the social aspect of the two audiences and the speaker all engaging with each other).

6 Conclusion

In the main previous study that systematically evaluated virtual conferencing, in Second Life (Lindeman et al. 2009), the question of realism did not arise as it did in our study, for three reasons: first, the attendees of the Second Life conference were themselves researchers in this area; second, the issue of scale did not arise (only a few members attended the Second Life meeting at any one time); and third, and most importantly, the Second Life conference took place entirely in-world, unlike the mixed reality setting discussed here.

The main conclusion of this paper is that more socio-technical management, and innovative (if artificial) solutions within this management, are needed. We have identified a number of problems of mixed/virtual conferencing and proposed a number of such solutions for future implementation and further research. Much of the debate about VEs has been about ‘realism’. As we have seen, however, this was not a concern for most of the remote participants or for those at the real conference. The survey, the informal interviews and participant observation all point to the conclusion that, rather than focusing on ‘realism’, the design of remote conference participation via a VE should focus on how to create a rich, lively and engaging experience at both the virtual and the real sites, even if the resulting ‘artificialities’ depart from realism. Users might benefit from introducing artificialities into the VE, and it is likely, though this is a topic for further research, that these artificialities would not detract from the experience of remote or real participation.

References

Anderson J, Ashraf N, Douther C, Jack M (2001) Presence and usability in shared space virtual conferencing. Cyberpsychol Behav 4(2):287–305

Bailenson JN, Beall AC (2006) Transformed social interaction: exploring the digital plasticity of avatars. In: Schroeder R, Axelsson AS (eds) Avatars at work and play: collaboration and interaction in shared virtual environments. Springer, London, pp 1–16

Bailenson J, Yee N, Merget D, Schroeder R (2006) The effect of behavioral realism and form realism of real-time avatar faces on verbal disclosure, nonverbal disclosure, emotion recognition, and copresence in dyadic interaction. Presence Teleoper Virtual Environ 15(4):359–372

Beck S, Kunert A, Kulik A, Froehlich B (2013) Immersive group-to-group telepresence. IEEE Trans Vis Comput Graph 19(4):616–625

Bente G, Ruggenberg S, Kramer N, Eschenburg F (2008) Avatar-mediated networking: increasing social presence and interpersonal trust in net-based collaboration. Hum Commun Res 34:287–318

Churchill E, Snowdon D, Munro A (eds) (2001) Collaborative virtual environments: digital spaces and places for interaction. Springer, London

Damer B et al (2000) Conferences and trade shows in inhabited virtual worlds: a case study of Avatars 98 and 99. In: Heudin J-C (ed) Virtual worlds. Lecture notes in computer science. Springer, Berlin, pp 1–11

Finn K (1997) Introduction: an overview of video-mediated communication literature. In: Finn K, Sellen A, Wilbur S (eds) Video-mediated communication. Lawrence Erlbaum, Mahwah, NJ, pp 3–21

Finn K, Sellen A, Wilbur S (eds) (1997) Video-mediated communication. Lawrence Erlbaum, Mahwah, NJ

Garau M (2006) Selective fidelity: investigating priorities for the creation of expressive avatars. In: Schroeder R, Axelsson AS (eds) Avatars at work and play: collaboration and interaction in shared virtual environments. Springer, London, pp 17–38

Gutwin C, Greenberg S (2001) A descriptive framework of workspace awareness for real-time groupware. Computer supported cooperative work. Kluwer Academic Press

Harrison S (ed) (2009) Media space: 20+ years of mediated life. Springer, London

Hinds P, Kiesler S (eds) (2002) Distributed work. MIT Press, Cambridge, MA

Hirsh S, Sellen A, Brokopp N (2005) Why HP people do and don’t use videoconferencing systems. Technical report HPL-2004-140R1, Hewlett-Packard Laboratories, Bristol, UK. http://www.hpl.hp.com/research/mmsl/publications/bristol.html

Kirk D, Sellen A, Cao X (2010) Home video communication: mediating closeness. In: Proceedings of CSCW 2010

Labhart N, Hasler B, Zbinden A, Schmeil A (2012) The ShanghAI lectures: a global education project on artificial intelligence. J Univ Comput Sci 18(18):2542–2555

Lindeman RW, Reiners D, Steed A (2009) Practicing what we preach: IEEE VR 2009 virtual program committee meeting. IEEE Comput Graph Appl 29(2):80–83

Olson G, Olson J (2000) Distance matters. Hum Comput Interact 15:139–179

Penumarthy S, Boerner K (2006) Analysis and visualization of social diffusion patterns in three-dimensional virtual worlds. In: Schroeder R, Axelsson AS (eds) Avatars at work and play: collaboration and interaction in shared virtual environments. Springer, London, pp 39–61

Rintel S (2013 forthcoming) Video calling in long-distance relationships: the opportunistic use of audio/video distortions as a relational resource. Forthcoming in Electronic Journal of Communication/La Revue Electronique de Communication, available at http://seanrintel.files.wordpress.com/2010/06/rintel-2013-ejc-videocallingdistortionsasrelationalresource.pdf. Last accessed 26.3.2013

Rittenbruch M, McEwan G (2007) Awareness survey: a historical reflection of awareness in collaboration. HxI technical report, 21 March, available from http://www.hxi.org.au/index.php?option=com_content&task=blogsection&id=12&Itemid=55. Last accessed 16.2.2009

Schroeder R (2010) Being there together: social interaction in virtual environments. Oxford University Press, Oxford

Schroeder R (2011) Comparing video and avatar representations. In: Peachey A, Childs M (eds) Reinventing ourselves: contemporary concepts of identity in virtual worlds. Springer, London

Schroeder R, Steed A, Axelsson AS, Heldal I, Abelin A, Wideström J, Nilsson A, Slater M (2001) Collaborating in networked immersive spaces: as good as being there together? Comput Graph 25(5):781–788

Schroeder R, Heldal I, Tromp J (2006) The usability of collaborative virtual environments and methods for the analysis of interaction. Presence Teleoper Virtual Environ 15(6):655–667

Sharma G, Shroff G, Dewan P (2011) Workplace collaboration in a 3D virtual office. International symposium on VR innovation

Shirmohammadi S, Hu S-Y, Ooi WT, Schiele G, Wacker A (2012) Mixing virtual and physical participation: the future of conference attendance? IEEE international workshop on haptic audio visual environments and games (HAVE), Oct 2012

Slater M, Sadagic A, Usoh M, Schroeder R (2000) Small group behaviour in a virtual and real environment: a comparative study. Presence Teleoper Virtual Environ 9(1):37–51

Sonnenwald D (2006) Collaborative virtual environments for scientific collaboration: technical and organizational design frameworks. In: Schroeder R, Axelsson AS (eds) Avatars at work and play: collaboration and interaction in shared virtual environments. Springer, London, pp 63–96

Tay WY (2012) Conceptualizing learning in social virtual worlds: an ethnography of three groups in Second Life. DPhil thesis, Department of Education, Oxford University

Van der Kleij R, Paashuis RM, Schraagen JMC (2005) On the passage of time: temporal differences in video-mediated and face-to-face interaction. Int J Hum Comput Stud 62:521–542

Vertegaal R (1998) Look who’s talking to whom: mediating joint attention in multiparty communication and collaboration. PhD thesis, Cognitive Ergonomics Department, University of Twente, Netherlands
