ORIGINAL ARTICLE
Mixing real and virtual conferencing: lessons learned
Geetika Sharma • Ralph Schroeder
Received: 20 August 2012 / Accepted: 1 August 2013 / Published online: 14 August 2013
© Springer-Verlag London 2013
Abstract This paper describes a conference which linked
several remote location sites via a virtual environment so
that the virtual audience could follow the presentations and
interact with real presenters. The aim was to assess the
feasibility of linking distributed virtual audiences to an
ongoing conference event. The conference consisted of an
annual gathering of researchers and developers of a global
information technology consultancy firm based in India.
This firm developed a virtual environment specifically for
distributed collaboration across sites. During the confer-
ence, researchers gathered various types of data, including
participant observations, interviews, capture of the virtual
environment and a survey of the audience. These data are
analysed in the paper. The main finding is that a number of
‘low tech’ improvements could be made to the operation of
the system that could greatly enhance this type of virtual
conferencing. A related finding is that the visual fidelity of
the environment and of the avatars plays a lesser role than
other factors such as audio quality. Given the paucity of
research on how virtual conferencing can substitute for
travel, plus the urgency of this topic for environmental
reasons, a number of suggestions are made for the imple-
mentation of remote virtual conference participation.
Keywords Collaborative virtual environments · Real and virtual conferences
1 Introduction
In this paper, we describe a technical conference which
involved the participation of both a real audience and a
virtual one from a dozen different locations. We present
various types of data, including from participant observa-
tion, informal interviews and a survey of the remote virtual
audience participants. To date, there is relatively little work
on conferencing using virtual environments, and existing
research is mainly in related areas such as using virtual
environments in education, videoconferencing to maintain
personal relationships, and research on computer-supported
co-operative work for small group tasks. After reviewing
this work and how it bears on the current study, we then
describe the system features and the conference event that
is analysed here. We then go on to describe our findings,
starting with participant observations and interviews and
then turning to results from the survey. Combining these
data allows us to derive a number of lessons for future
research and for improving the experience of future
conferences.
2 Previous research and motivation
A conference shares some of the features of the use of
collaborative virtual environments (CVEs), like those used
in education. There are a number of studies of CVEs
(Churchill et al. 2001), though these have not specifically
focused on remote conference participation. Conferences
also share some features of distributed work meetings,
which have been far less studied (Sharma et al. 2011).
Finally, there are similarities with virtual exhibitions
(Penumarthy and Boerner 2006). Conferences using CVEs
are nevertheless different from these three uses of CVEs in
that they involve speakers and audiences and bring people
together for the exchange of ideas. Yet, virtual conferences
have rarely been investigated or evaluated (Damer 2000).

G. Sharma (corresponding author)
Tata Consultancy Services, 249 D & E Udyog Vihar,
Phase IV, Gurgaon, Haryana, India
e-mail: [email protected]; [email protected]

R. Schroeder
Oxford Internet Institute, University of Oxford, Oxford, UK
e-mail: [email protected]

Virtual Reality (2013) 17:193–204
DOI 10.1007/s10055-013-0225-x

What is known about virtual conferencing is anecdotal
or tacit knowledge, not captured in evaluations or publi-
cations. The main exception is a study by the IEEE VR
conference programme committee (Lindeman et al. 2009),
which carried out its 2-day meeting to review papers for its
annual conference in Second Life. This involved partici-
pants from around the world, and a high proportion of its
55 members and reviewers attended the meeting (42
responded to a survey, and these respondents attended for
5.5 h on average). The main findings of that study, as for
the study presented here, are that organizational factors are
more important than technical ones. The study authors make
six recommendations: (1) pre-meeting experience is important;
(2) anonymity in text channels needs to be ensured; (3) two
meeting chairs should be present at all times; (4) a clear
means of showing the paper being discussed is important;
(5) ‘canned’ avatar expressions would be useful; and (6) those
who need to be present for discussing a paper should be
scheduled well. On
the whole, the conference meeting was successful: it
enabled more to attend than otherwise, and it saved time
and money, though it was regarded as not as good as a
face-to-face meeting. As we shall see, these findings and
recommendations have echoes in the study described here,
although the settings (virtual versus mixed virtual and
video, and review meeting versus conference presenta-
tions) are quite different, and so the lessons learned here
are also different.
In a recent experiment (Shirmohammadi et al. 2012)
which combined physical and virtual participation in a
small-scale conference (maximum of eight participants),
there were a number of similar findings to the current
study: for example, it was mentioned that the scale should
be kept small for easier management of the networked
virtual environment. In that study, there was also
pre-conference orientation for participants to ensure that there
were no technical difficulties and that users were
comfortable with the set-up. The authors regard this kind of
training as essential, though in a business setting such as that
of the current study it is unlikely to be practical. There
were a number of other features of this experiment which
are different from the current one, for example that the
virtual audience consisting of several remote participants
was represented by a single avatar navigated by the oper-
ator of the virtual environment (who would also raise an
arm on instructions from the virtual participants to indicate
a question: this is interesting in the light of our findings, but
also unrealistic). Further, there was no bird’s eye view of the
environment. The study (Shirmohammadi et al. 2012)
recommends that there should be such a bird’s eye view
and that participants should have their own avatars; both
were features of the current study and are described below.
The virtual/real conference that we describe here is
novel in at least five ways: one is the mix of the virtual and
real audience. The second is that the conference was car-
ried out in a CVE which was restricted to researchers from
one organization, rather than in an open-to-all environment
like Second Life. Third, the conference was sustained over
the course of two and a half days. Fourth, the conference
brought together 12 locations where participants were co-
located but from where they could take part in a single
virtual lecture theatre with avatars. Fifth, the virtual world
was developed specifically for work inside the company.
Some of the features of the environment will be described
later.
The conference also included a live video feed of the
conference presentations inside the virtual lecture theatre.
Previous work on videoconferencing has been about
meetings (Hirsh et al. 2005), scientific collaboration
(Sonnenwald 2006), collaboration in small groups (Vander
Kleij et al. 2005), home uses of video systems (Kirk et al.
2010), and comparisons of video and avatar representations
and interaction (Schroeder 2011). Work on conferences in
virtual environments (VEs) has so far been about com-
pletely in-world events (Damer 2000). The conference
described here is a hybrid of the two.
Much of the previous related work has been concerned
with distributed work (Hinds and Kiesler 2002),
shared workspaces (Harrison 2009) or task performance
in small groups in virtual conferencing (Anderson et al.
2001). Video conferencing has also been studied in terms
of the performance of small groups (three participants)
compared with doing the same task face to face (Vander
Kleij et al. 2005). Similar studies of two and three partic-
ipants have been carried out for immersive and desktop
virtual environments, finding an asymmetry between par-
ticipants in the immersive system (the leader in the task) as
against the desktop participant, without any such asym-
metry in doing the task face to face (Slater et al. 2000).
Another study of doing a Rubik’s cube-type puzzle in a
distributed immersive virtual environment by two partici-
pants found that it is just as good, in terms of time taken, as
doing the task face to face (though performance with one
immersive system and one desktop, or two desktop sys-
tems, is much poorer) (Schroeder et al. 2001). This goes
against Olson and Olson’s (2000) well-known finding that
‘distance matters’ in collaboration.
Other studies have focused on shared workspace col-
laboration. For example, Gutwin and Greenberg (2001)
provide a fine-grained understanding of workspace
awareness and develop a model for how people collaborate
in small groups around shared objects and tasks (see also
Rittenbruch and McEwan (2007) for an overview of this
work and a typology of analyses, and the collection of
studies in Churchill et al. (2001), and Vertegaal (1998)). This
type of work can be used in combination with usability
studies of collaborative virtual environments (Schroeder
et al. 2006), where gaze, reference to objects, and turn-
taking can be analysed in detail. It is possible to analyse
collaboration in groups in multi-user virtual environments
both quantitatively and qualitatively, categorizing different
types of acts and sequences of acts in terms of who is
interacting with whom, where the focus of the interaction
is, and whether there are obstacles to turn-taking and fluid
interaction (Schroeder 2010: 217–227). For virtual envi-
ronments and distributed collaboration in small and large
groups, object-focused or focused on spatial tasks and
verbal tasks, this research has been reviewed by Schroeder
(2010: 95–140). There are few systematic comparisons
between video- and virtual-mediated communication and
collaboration (but see Schroeder 2010: 249–74), and mixed
systems elude systematic comparison because they are
likely to provide quite different admixtures of virtual,
video and real elements.
Labhart et al. (2012), in an evaluation of using a com-
bination of video, collaborative virtual environments and
Web-based resources, also gained a number of insights that
are similar to those that will be presented here: for exam-
ple, they note that simple solutions such as a ‘clicker
system’ whereby students might, with a simple click, ask
the lecturer predefined questions (‘more explanation
please’) could enhance the interactivity between a large
group of learners and the lecturer. This is similar to a
solution that will be put forward here about how confer-
ence attendees could gain the attention of the presenter.
As for teaching and learning in virtual worlds, Tay
(2012) evaluated a number of groups in Second Life in
depth, reporting findings about in-world dynamics
among teachers and learners and within groups over the
course of a number of sessions, held over two years, related
to philosophy, science and education. This detailed and long-term study
found that learning needs to be sustained by a complex
combination of group cohesion, task orientation and
learning as a group.
Work on videoconferencing has a long history (see the
early introduction by Finn (1997) and the early studies in Finn
et al. (1997)), but has recently been focused on long-distance
personal relationships (Rintel 2013, forthcoming),
which raises different issues, such as intimacy, that do not
apply to a professional conference setting, as here.
In our study, users had a choice of two avatars (standard,
or customized with a photograph of their own face). Avatar
appearance has been extensively discussed in the literature
on virtual environments (Bailenson and Beall 2006; see
also Bente et al. 2008 and Garau 2006), though this did
not play a significant role in this study. The idea of full-
body video meetings has recently been explored for a four-
person meeting (Beck et al. 2013), and it will be interesting
in future studies to see if the addition of full bodies can be
applied to conferences with large numbers of attendees.
As will become evident below, while this research has
developed and refined concepts that are applicable to this
study (presence and co-presence, awareness, measures of
task performance), the mixed reality setting and conference
setting in our study—which was not so much educational,
as it was a mix between an academic and a professional
conference—also means that the study surfaced different
challenges. Damer (2000) reported some descriptive findings
of conferences inside virtual worlds. Apart from the
studies discussed above, which were carried out in different
settings, this is the only other study of virtual conferencing
as such that we are aware of.
3 Description of the system and event
The virtual environment (VE) that was used was Virtual-
Office (VO), a system designed to enable workplace col-
laboration (Sharma et al. 2011). The conference event, and
how it was presented to conference attendees, can be seen
in Fig. 1. The view of the conference as seen by the remote
participants can be seen in Fig. 2. Apart from the VE of the
office or in this case virtual lecture theatre, the system
includes a separate window that displays the user’s aura in
a grid which contains the people within his/her presence,
awareness and text chat range. A text chat window below
the grid shows the history of the user’s past conversations
and there is a chat bar for inputting text chat (see Fig. 3,
where the top left window contains the aura grid).
Initiation of a presentation inside the VE causes the 3D
view to split further to show a video stream of the presenter
and the presentation document (right window in Fig. 3).
Live video streaming is implemented using the open-source
VLC¹ library. The presenter’s video stream is encoded in
H.264 format with a variable bit rate and a frame size of
320 × 240 pixels. It is transmitted over UDP to a VLC
streaming server hosted on a large instance on the Amazon
EC2 cloud. VirtualOffice clients connect to this server to
receive the video feed over HTTP.
The presentation slides update automatically on an ‘in-
world’ screen as the speaker advances through the pre-
sentation. A user may browse through the presentation
independent of the speaker in the document window or
return to the speaker’s page. The VO system also includes
voice communication implemented using Mumble,² an
open-source VoIP chat system. Mumble uses the Speex
codec for audio data encoding and the UDP protocol for
transmission, set at 40 kbit/s. VO clients connect to a
Mumble server hosted on another large instance on the
Amazon EC2 cloud.

¹ VideoLan, http://www.videolan.org.
² Mumble, an open-source voice chat software. URL: http://www.mumble.com.

Fig. 1 View of real auditorium with virtual auditorium projected
Fig. 2 View seen by remote audience

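To put the audio setting quoted above in perspective, the following Python sketch converts a constant 40 kbit/s stream into data volume. It is a rough illustration only: it counts payload and ignores packet overhead, silence suppression and the video feed, and the function name and the assumption of one continuous stream per remote room are ours, not part of VO or Mumble.

```python
# Rough payload arithmetic for the audio bitrate reported in the text.
# Assumption (ours): each stream runs continuously at the nominal rate.
AUDIO_KBITS_PER_SEC = 40  # transmission rate stated for the Mumble audio


def audio_megabytes(seconds: int, streams: int = 1) -> float:
    """Return the audio payload in megabytes (1 MB = 1000 kB)."""
    kilobits = AUDIO_KBITS_PER_SEC * seconds * streams
    kilobytes = kilobits / 8
    return kilobytes / 1000


# One hour of audio for a single room, and for 12 remote rooms.
one_room = audio_megabytes(3600)
all_rooms = audio_megabytes(3600, streams=12)
print(one_room, all_rooms)
```

On these assumptions, audio alone is a modest load, so the bitrate setting by itself would not explain the audio problems reported later in the paper.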
All in-world updates, such as avatar positions, orientations,
animation states and text chat, are exchanged between
clients through a virtual world server built using the
Google App Engine and hosted on the appspot domain.
More details can be found in Sharma et al. (2011).
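The kind of update exchanged through the virtual world server can be illustrated with a small Python sketch. The paper does not specify the message format, so everything here is hypothetical: the `AvatarUpdate` class, its field names, the animation labels and the use of JSON are assumptions made for illustration only.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class AvatarUpdate:
    """One hypothetical in-world update for a single avatar."""
    avatar_id: str                         # illustrative identifier
    position: Tuple[float, float, float]   # x, y, z in the virtual room
    orientation_deg: float                 # heading in degrees
    animation: str                         # e.g. "sitting", "raising_hand"
    chat_text: Optional[str] = None        # text chat, if any


def encode(update: AvatarUpdate) -> str:
    """Serialise an update to a JSON string for transmission."""
    return json.dumps(asdict(update))


def decode(payload: str) -> AvatarUpdate:
    """Rebuild an update from JSON; JSON arrays come back as lists."""
    data = json.loads(payload)
    data["position"] = tuple(data["position"])
    return AvatarUpdate(**data)


update = AvatarUpdate("delhi-07", (1.0, 0.0, 2.5), 90.0, "raising_hand",
                      "May I ask a question?")
restored = decode(encode(update))
print(restored == update)
```

Keeping position, orientation, animation state and chat in one small record reflects the list of update types given above; a real system might well send them as separate messages.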
The system was tested extensively beforehand. It has
been shown that videoconferencing needs a lot of testing,
especially when there are a number of remote locations
involved (Sonnenwald 2006). To anticipate our findings, the
main failure during our event was the failure of voice when
avatars asked questions, together with the poor quality of voice
communication. In this respect, our event was like many
others. Poor voice communication and audio quality also
led to poor turn-taking between speakers and avatar
questioners. These basic failures have implications for
design. Finally, it can be noted that this virtual confer-
encing event was very labour intensive, not just in pre-
testing, but also in having a number of operators present
throughout the event. We will make some suggestions in
the concluding sections about how these problems can be
avoided.
The study was carried out in the course of a two-and-a-
half-day conference. The format was that of a typical
research conference: keynotes, presentations, two tracks for
parallel presentations during part of the conference (only
one of these tracks implemented the virtual audience par-
ticipation). The conference was attended by approximately
80 researchers at the real conference location and approx-
imately 55 researchers in 12 remote locations, though we
obtained data only from 37 participants from 8 locations in
India. On average, there were 5–6 attendees at each loca-
tion for a presentation. All these remote participants,
though at different physical sites, entered into the same
virtual conference room (that is, the different physical sites
were not represented as 12 different rooms, but as one
room containing all avatar conference participants). The
real conference had seats that were arranged in a theatre
style (with steeply rising rows of seats, see Fig. 4), while in
the rooms for virtual participants all were seated on the
same level (see Fig. 2). To manage the VE, there were two
operators at the physical location (one primarily to handle
the camera view, the other to handle audio and questions
from the virtual audience). The virtual audience was displayed
on a 5 × 7 feet screen in the real auditorium. An avatar
located at the centre of the screen, standing and asking a
question, was displayed at an approximate size of 8 inches.
The sizes of other avatars and objects in the VE were
progressively reduced by perspective foreshortening.

Fig. 3 VirtualOffice window layout

At each remote location, a conference room was set up
to broadcast the proceedings of the physical conference
with one operator. The VO window was projected in the
conference room showing a live video feed from the
physical conference, the presentation document of the
speaker and a graphical view of the virtual auditorium on a
single projection screen (see Fig. 3) of size 4 × 6 feet. The
virtual auditorium was designed to look like a typical
conference venue. Multiple cameras were pre-configured in
VO to show views of the VE from different vantage points.
The speaker’s audio was streamed into VO and played on
the sound system installed in the conference room, and
there was a microphone to talk to the physical auditorium
and other remote conference rooms. The VO window is
designed to display the speaker’s video in a resizable
window whose default size is a fifth of the width by a fifth
of the height of the total display.
To attend the virtual conference, an employee of the
organisation could register at the conference website and
submit a photograph of himself/herself. A look-alike avatar was
created for each registered attendee, which was added to
the virtual lecture theatre when he/she was physically at a
conference room. Look-alike avatars were also created for
operators and speakers who submitted their photographs.
Employees could also walk in to attend the virtual conference
without registration, and generic male/female
avatars were added for them. Avatars were animated with
appropriate actions such as sitting, standing, raising a hand
and speech animation depending on their real-world
activity. When attendees joined from any of the
remote locations, their avatar would enter the virtual
auditorium and go to a free chair. When an attendee left the
conference his/her avatar was also deleted from the virtual
auditorium.
With this set-up, remote participants (a) did not have to
arrange the requisite hardware to set up and install the VO
software themselves, (b) were physically collocated with
others at their location and (c) could come and go
depending on their interest and availability. On the other
hand, they had less control over their own avatars. Also,
although the virtual auditorium was designed to accom-
modate 70 avatars in order to ensure real-time interactivity
and rendering, the number of remote attendees was only
limited by the size and number of remote conference rooms
that could be set up.
3.1 Method
Our methods included participant observation, a survey of
remote participants and informal interviews with the real
and virtual audiences. As regards participant observation,
one of the authors of this paper was a speaker and in the
audience at the real conference, and the other managed the
conference remotely.
3.2 Results
We begin with observations based on informal interviews
with the real audience and on participant observation at the
real conference (see Fig. 4). We will cover a number of
areas: how the virtual lecture theatre was displayed, how
the speaker was represented inside the virtual world, how
remote participant avatars could be identified, how avatars
could navigate, how they appeared and gestured, and
finally how the virtual audience asked questions and
interacted with the real conference.
3.2.1 Camera view of the virtual lecture theatre
During the conference, the virtual lecture theatre was often
statically displayed for the real and virtual audience, apart
from when avatars entered and left the room and raised
their hands for questions. When it was not static, the
operator co-located with the real conference continued to
switch the camera view: looking from the back of the
lecture theatre towards the front with the presentation
display, looking from the front of the theatre towards the
audience and generally moving around between the avatars.
At the remote locations, the operator would keep the
view static when a lecture was going on. The system would
automatically focus on the avatar asking a question
whenever a remote attendee did so. During breaks,
the operator would toggle through different camera views
or focus on one based on requests from the audience.

Fig. 4 Participants at the conference

Although it is not an issue for the speaker in the real
lecture theatre, as he or she is generally facing the real
audience in the lecture theatre except during the Q&A,
from the point of view of the real audience, the balance
between a static virtual lecture theatre and switching the
view was important: a perfectly static camera view of the
virtual theatre would have seemed rather boring and would
have made the real audience feel as though the virtual one
was not there (avatars entering and leaving the virtual
audience were rather infrequent). Moving the camera
around too much, on the other hand, can be distracting for
the real audience. This balance could be investigated in
future work, where both the real audience and remote
participants could be asked in a survey
whether they prefer more or less camera movement.
The point here is that having only one or other of these
options exclusively would not have provided a rich expe-
rience for the real and the virtual audience.
3.2.2 Virtual speaker
There were look-alike or generic avatars of the speaker in
the virtual lecture theatre as for the virtual audience.
Although this may add to the conference experience of the
virtual audience, it was neither a distraction nor did it add
to the experience of the real audience to see both the real
speaker and his or her avatar representation. One reason for
highlighting this is that in a mixed real/virtual setting,
having both real and avatar speakers simultaneously is
somewhat artificial since people cannot be in two places at
the same time. On the other hand, there were a few occa-
sions when the video stream became unavailable due to
low bandwidth at either the streaming out or the receiving
location. In such cases, having the speaker’s avatar in view
helped preserve a sense of continuity of the conference.
3.2.3 Avatar identification
The names of operators and their locations (e.g. Ahmedabad,
Delhi) were visible in the aura grid and would appear
above their avatars’ heads in the VO environment when the
mouse hovered on them. This was a useful feature as it
gave the real audience a sense of the remote participation.
On the other hand, it was not clear to the real audience
whether look-alike avatars or generic avatars, whose names
did not appear in the grid, were participating via the remote
location or simply added as ‘filler’. Making this clear to the
real audience (for example, by only allowing ‘named’
avatars to participate) might add to the realism of the vir-
tual audience.
3.2.4 Navigation, avatar features and gestures
Avatars could only walk to their seats, sit, raise their hands,
stand and show speech gestures when speaking. Unlike in
systems like Second Life, there was no flying around or
running, which was suitable for the workmanlike and
serious atmosphere of a research conference. Apart from
this, automatic gestures and movements (shifting one’s
body) were generated for the avatars, which added con-
siderably to their ‘presence’.
Some employees had their look-alike avatars custom-
ized by means of a photograph of their head rendered in
3D. Out of the 70 registrants for the virtual conference, 33
submitted photographs, of which 26 met the requirements
for creating a look-alike avatar. This was highly effective
both for enhancing the ‘presence’ of the virtual audience
and allowing the real audience and the speakers to recog-
nize colleagues. One drawback, on the other hand, was the
absence of facial expressions.
There were also, apart from the avatars with photo-
realistic heads, some ‘generic’ avatars which looked
identical (though to increase the variety somewhat, there
were outfits with two different colour schemes for both
male and female avatars). This limited variety—again,
contrasting with Second Life and the ability for each per-
son to customize their avatar—seemed appropriate for the
context of a research conference and perhaps preferable to
some of the outlandish avatars in Second Life.
3.2.5 Recognition of questions from the virtual audience and turn-taking
Speakers prioritized recognizing questions from the real
audience by a ratio of perhaps ten real to one avatar
question. Since avatars had their hands raised for many
more questions, perhaps twice the number that were rec-
ognized, this means that many questions from the virtual
audience were not answered. Now it is true that there were
far fewer questions from the virtual audience. Yet, it was
also very difficult for the speaker to simultaneously scan
the virtual audience in addition to scanning the real
audience to identify the raised hands of avatars.
It can be added that partly because of poor audio quality,
and also partly because a question and answer session
requires non-verbal communication (nodding, body posture
to indicate disagreement, etc.), this disadvantage—plus the
disadvantage of not having questions recognized—is a key
challenge to the conference scenario presented here. In this
respect, it can be noted that although a completely in-world
conference shares this disadvantage, at least all participants
are disadvantaged equally.
Turn-taking also relates to following up on questions with
a second comment or question, and this proved almost
impossible for the virtual audience—whereas it is easy for
the real audience. Finally, turn-taking with the virtual ava-
tars asking questions was also less than smooth because of
the out-of-sync stops and starts and overlapping speech for
both the real speaker and the virtual questioner. This prob-
lem is well known in several forms of mediated communi-
cation (including synchronous chat in VEs, though in chat
this does not necessarily reduce effectiveness—as all the text
is available on the screen for review). In addition, the virtual
audience depended for hearing questions from the real
audience on a microphone being passed to the questioner in
the real audience, which did not happen consistently in this
case and was thus frustrating and disadvantaging.
4 Survey results
At this point, the results of a survey can be discussed, which
was completed by remote conference participants at eight
remote locations—Delhi, Kolkata, Bangalore, Ahmedabad,
Mumbai, Pune, Hyderabad and Chennai—with 37 respon-
ses to 15 multiple choice and open-ended questions. Most of
the remote participants (29) filled out the survey in the first
session of the virtual conference they attended; 8 filled out
the survey on returning to the virtual conference for another
session. When asked if they had used video conferencing
tools such as WebEx³ before, 30 said they had and 4 had not
(3 did not answer). We also asked whether they preferred
WebEx or the virtual world and why, and there were 12
roughly evenly divided responses (some pointed to advan-
tages and disadvantages of both): those that preferred the
virtual world mentioned the greater multimodality, inde-
pendent control over presentation documents, interactivity
and intuitiveness. The drawbacks of the virtual world were
poor audio and video. Among those who preferred WebEx-
type videoconferencing, the reasons included greater reli-
ability, and better video and audio quality.
4.1 Problems with technology and suggestions for improvement
When asked whether they would have preferred a different
layout of the projected screen (see Fig. 3), 16 answered no.
Of the 20 who said yes, 7 would have preferred more video
(i.e. a larger proportion of the screen allocated to video), 6
more of the virtual world, and 7 more of the presentation
document (no answer 1). We also asked an open-ended
question about how the virtual conference could be
improved, and the vast majority of the 30 responses were
about how audio and video quality could be improved.
Several responses pointed to other faults to do with the
presentations (for example, speakers should have laser
pointers). There were also several suggestions about
remote participants having more control over their point of
view in engaging with the real conference audience (i.e.
controlling the camera in the conference room). Only one
respondent made suggestions about better control of avatar
movements, gestures and expressions (also of the avatar
speaker).
4.2 Interacting with the real conference and the virtual audience
To elicit whether remote participants had a sense of being
there at the conference, we gave respondents four choices
on a scale: very strong (none), strong (5), weak (19) and
very weak (6) (no answer 7). We then asked whether, also
considering time and cost, they would have preferred to
attend the conference in person: 10 said they would not and 21
would have preferred to be there in person (no answer 6).
Only 2 respondents said that they wanted to ask a question
of the conference presenter and did not have a chance to do
so, whereas for 32 this had not been a problem (3 did not
answer).
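Because several questions were left unanswered by some respondents, it can help to express the counts as shares of those who answered. The Python sketch below does this for the sense-of-presence question, using only the counts reported above; the helper name `summarise` and the choice to exclude non-responses from the denominator are our assumptions, not part of the original analysis.

```python
RESPONDENTS = 37  # survey responses reported in the text


def summarise(counts: dict, total: int = RESPONDENTS):
    """Return (number who answered, {option: percent of those who answered})."""
    if sum(counts.values()) != total:
        raise ValueError("counts do not add up to the number of respondents")
    answered = total - counts.get("no answer", 0)
    shares = {
        option: round(100 * n / answered, 1)
        for option, n in counts.items()
        if option != "no answer"
    }
    return answered, shares


# Sense of "being there" at the conference, as reported in the survey.
presence = {"very strong": 0, "strong": 5, "weak": 19,
            "very weak": 6, "no answer": 7}
answered, shares = summarise(presence)
print(answered, shares)
```

The consistency check in `summarise` also guards against transcription slips when the counts for a question are copied from the text.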
Asked if they enjoyed more being in the room together
with remote or with physically present people, 13 answered
‘with remote people’ and 14 ‘with physically present
people’ (no answer 10). When asked if they recognized any
of the other remote avatar participants, 14 respondents said
that they did and 19 did not (4 did not answer). For those
who did recognize one of the avatars, we asked whether
this recognition made them want to establish another kind
of contact with the avatar: 10 answered positively (1 said
they would do this via phone, 8 said via email and 1 said
‘other’).
We also asked whether they would have preferred to
control their own avatar (i.e. not to have it controlled by an
operator); 21 said yes and 10 said no. In a separate
question, we asked if they thought that their avatar was a
good way to represent them at the conference, and 17
replied positively. Of the 9 who disagreed (11 did not
answer), 2 said there should be greater fidelity and 2 said
that they did not get a chance to experience their avatar.
4.3 Experience of the conference
Asked if they found the experience of remote conference
participation useful for learning about the conference
content, 5 answered ‘very useful’, 14 ‘quite useful’, 9 ‘a
little bit useful’ and 7 ‘not useful’ (4 did not answer). There
were 22 replies to an open-ended question: ‘what did you
like most about attending the conference virtually?’ These
replies mentioned a wide range of reasons: from the
practicalities of avoiding travel and cost and being able to
devote one’s attention as one wished, to being there while
not being there physically, being able to roam around the
virtual conference room, and having a number of modalities
on the screen.
3 WebEx, Web conferencing. URL: http://www.webex.com.
We also asked ‘what did you like least?’, and of the 21
replies, over half were about poor audio quality, six com-
plained about the video quality (four of these mentioned
both audio and video) and only one mentioned that the
avatar experience could be improved (the remainder were
other quality complaints). In a separate forced choice
question, 18 thought that the audio quality was poor and 17
thought it was good.
Before we discuss these results further, we note that
a striking feature of the survey responses is how
mixed and nuanced the answers were. This may indicate
that the questions were useful, but more
importantly it suggests that remote participants recognized
the benefits as well as the drawbacks of virtual conference
participation.
5 Discussion, implications for design and future work
Many points made in the previous sections address mun-
dane issues about mixed reality conferencing. However, in
our experience, much previous research on distributed
collaboration in MUVEs (in education, meetings, exhibi-
tions as well as conferences) has consisted of documenting
trials and demonstrator systems without evaluating them or
suggesting how they could be improved (Sonnenwald
2006).
Having learned a number of lessons from this confer-
ence, we now describe a number of ways in which the
problems that occurred during the event could be over-
come, and how future events of this type could be socio-
technically managed to enhance the experience—both in
terms of the effectiveness of the task and in terms of the
enjoyment of the social interaction.
5.1 Ways to enhance future virtual conferences
One feature which could easily be implemented in future
conference events is allowing people to move between
different parallel track presentations. As long as all of them
are captured on video and the presentations are made
available in different virtual lecture theatres, avatars could
freely move between them. It is a well-known problem in
physical conferences that moving around between sessions
entails disruptions, but avatars moving between sessions
would create only minimal disruption. Further, with
responses to speakers at conferences nowadays increas-
ingly being tweeted, moving around in the light of how
highly the audience rates a speaker in real time could be
highly effective—though it might also be discourteous.
A useful feature at mixed real/virtual conferences could
be to allow real audience members (apart from the speaker)
to go up to the screen and speak to avatars whom they
recognize during breaks. This informal getting together to
discuss the presentation, meet new people or chat with
those whom one has not seen for some time is a well-
known benefit of physical conferences, and could easily be
implemented with a virtual audience (the number of people who
can ‘mill around’ and find and talk to each other may be
limited with virtual audiences, but locating and queuing to
speak to each other also has possibilities in a virtual
environment that are more restricted in a physical space).
5.2 Benefits of real/virtual conferencing
One advantage of virtual conferencing is of course that
virtual participants could, in principle, be anywhere (such
as lying on the beach) as long as they had a laptop. On the
other hand, it is clear that our co-located remote groups had
useful discussions and enjoyed social interaction in their
respective physical locations, if we recall the almost even
split between those remote participants who enjoyed being
with other remote people (13) and those who enjoyed being
with physically co-present participants (14). The trade-off
between the benefits and drawbacks of these two options
deserves further investigation.
A clear benefit is that the virtual audience can take
breaks, eat food and the like—all the while still keeping
(most of) their attention (depending on the other activity)
on the presentation. This is not a trivial advantage during
conferences that are often long and have few breaks.
One advantage of a virtual audience is that it is scalable.
Again, there is a trade-off: in our case, the virtual lecture
theatre was limited to 70 or so chairs, enough to accom-
modate those who had registered to participate remotely. In
principle, however, thousands or more could participate:
would a virtual lecture theatre with an audience of 1,000s
detract from their presence vis-a-vis the real audience?
Note, however, that full participation (asking a question)
is in any case limited in practice to a manageable number
of people, at least during a ‘live’ conference.
A related point here is that one trade-off between
‘realism’ and the ‘artificial’ nature of this conference link-
up was that remote participants were in the same virtual
lecture theatre—even though they were physically in dif-
ferent remote locations. Realistically, they should have
been shown in different rooms. However, with names of
their locations above their heads, this is a non-realistic
feature that did not seem to detract from a ‘natural’ remote
participation experience (it was not mentioned in any of
our informal interviews or survey responses).
5.3 Communication and interaction problems
and potential solutions
Voice communication problems and audio quality have
already been mentioned. Apart from fixing this technically,
there are a number of socio-technical management solu-
tions. A critical problem for participation by the virtual
audience was to have their questions recognized by the
speaker. This was because the cognitive load on the
speaker to pay attention to and scan both the real and the
virtual audience for questions simultaneously was too
large. Further, it is easier to scan the real audience than the
virtual one since it is difficult to distinguish avatars with
their hands raised in a virtual lecture theatre displayed on a
2D screen. However, this problem can easily be overcome:
for example, there could be a small flashing red light, either
physically in the real lecture theatre (perhaps on the
speaker podium) or in the virtual lecture theatre (perhaps in
the lower corner closest to the speaker). This would make it
very easy to recognize the questions put by avatars, perhaps
even giving avatars an advantage over questions from the
real audience (it is easy to ignore a raised hand in a real
lecture theatre, but it is more difficult to ignore a blinking
red light). This solution is bound to enhance participation
by the virtual audience.
Another problem with the virtual lecture theatre is that
the virtual audience had only a low degree of co-presence for
the real audience. This is because the avatars appeared pale
and indistinct on the 2D screen in the real auditorium,
especially with both artificial and natural light in the real
lecture theatre. Again, there is a solution that is easy to
implement: the real lecture theatre could be darkened, with
the lights dimmed or switched off, so that the screens (both
the virtual audience and the PowerPoint) could be seen
much better. If the lecture theatre were darkened,
making it almost black inside, the virtual audience would
become highly co-present. This could make the room
seem spooky, though whether it is experienced as such would
need to be investigated. This solution of a darkened
real lecture theatre could be implemented by also having a
spotlight on the real speaker (as on a stage in a darkened
theatre performance or TV awards show). Speakers only
need to see the front row of real faces to obtain feedback on
what they are saying during the talk. During the question
and answer sessions, in contrast, the real lecture theatre
needs to be better lit so that the speaker can see raised hands
even in the rows furthest away. Note, however, that during
the question and answer session, a lighter room with the
problem of lesser co-presence of the virtual audience does
not pose a problem for questions from the virtual audience
in view of the suggestion that was just made with questioner
recognition via a red blinking light.
A key question for the future concerns the advantages
and disadvantages of virtual conferencing as against par-
ticipation via video conferences. One area of potentially
great improvement of how avatars could participate would
be to have their faces in video—perhaps by means of a
Skype video link that is activated when the avatar asks a
question, and whereby the questioner is inserted in video as
a small window in the virtual lecture theatre. Such a live
video ‘talking head’ was requested by a number of
speakers and members of the real audience, and a number
of responses in the survey of remote participants also indicated
a preference for video over virtual. Video would
obviously enhance the expressivity of the avatar/talking
head (facial expressions could be seen) and thus also
enhance the realism and turn-taking.
5.4 Virtual audience participation
The current problem with videoconferencing participation
by a large number of participants is bandwidth. Note, however,
that this problem does not apply to the scenario that
has just been described—since only one video ‘talking
head’ would participate at any one point in time. Such a
video talking head would enhance the co-presence of the
virtual audience. Against this, it could be argued that such a
set-up would weaken the realism of the virtual audience
and virtual lecture theatre (avatars and video combined in
an artificial way). It would be useful to investigate whether
such a mix is problematic. One hypothesis, based on the
fact that people do not have problems interacting with
avatars with the most minimally human of realistic
expressiveness (Bailenson et al. 2006), is that this would
not be problematic. Clearly, video would have the advantage
of making asking questions more like having a conversation
(put the other way around: it would reduce turn-taking
problems considerably).
One of the advantages of the VO environment over
Second Life is that only people within a certain aural range
can hear others. But whereas this is an advantage in the
setting of office collaboration (Sharma et al. 2011), in the
virtual conference setting described here, it is only neces-
sary to hear one questioner at a time. Indeed, it is not clear
that spatial audio adds anything to virtual conference par-
ticipation, though again, one question for further research
could be about which conference situations are enhanced
by spatial audio.
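The contrast between range-limited audio and a single audible questioner can be illustrated with a minimal sketch. It assumes avatar positions are 2D coordinates and that audibility is a simple distance cut-off; how the VO environment actually computes aural range is not described here, so all names and parameters below are hypothetical.

```python
import math


def can_hear(listener_pos, speaker_pos, aural_range):
    """True if the listener's avatar is within aural_range of the speaker's avatar."""
    return math.dist(listener_pos, speaker_pos) <= aural_range


def audible_speakers(listener_pos, speakers, aural_range, active_questioner=None):
    """Return the names of the speakers a listener hears.

    speakers maps a name to an (x, y) avatar position. If active_questioner
    is set (conference mode), only that avatar is heard, whatever the distance.
    """
    if active_questioner is not None:
        return [active_questioner] if active_questioner in speakers else []
    return [name for name, pos in speakers.items()
            if can_hear(listener_pos, pos, aural_range)]
```

In office-style use, `audible_speakers` filters by distance; in the conference setting, passing `active_questioner` bypasses the distance check so that everyone hears the one person asking a question.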
We can also note that although the survey responses
highlighted audio problems, the remote participants did not
differentiate between the fidelity of the audio and issues
such as that the microphone was not necessarily passed
around to questioners in the real audience (which made it
impossible to hear the question). Since audio is such a key
problem, future research should elicit precisely which
aspect of audio causes the greatest problems (it may be not only
the quality of the equipment or technical issues, but also
how the equipment is used, that causes problems).
As mentioned, for the conference setting described here,
12 remote locations participated, and operators at each site
managed when questions could be asked. Further, at the
real conference location, there was another operator man-
aging this question and answer process. This ‘participation
management’ went smoothly, and it is essential in the
context of a highly distributed meeting. However, an even
more effective process can be envisaged, which would
require a lighter management effort as well as permitting a
much larger number of locations and participants to take
part. This would involve two new processes:
1. If avatars want to ask a question, they should raise
their hand; when they do so, the operator at the real
location (not the speaker) puts the avatar in a queue at
the front of the virtual lecture theatre. (This can be
implemented in conjunction with the red light mentioned
earlier.)
2. Before avatars/remote participants in the queue ask
their question, the operator at the real conference location
could check whether their audio is working properly.
In this way, the remote locations would not need any
operators; there could just be one operator at the real
conference location. Thus, when a new remote participant
enters through a door to the virtual lecture theatre, they could
simply (randomly) be assigned to an available seat (so that
the limitation on remote participants is the number of
available seats).
In this scenario, when the speaker wants to answer the
question of an avatar, they simply indicate that they want to
take the next person in the queue, which could be a very
effective and orderly solution. First, this would require
less effort (only one operator, rather than operators at each
location), which would permit any number of
remote locations and of participants (but see the earlier
point about size of a very large virtual audience). Second, it
would ensure smooth and effective management of questions
and answers.
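The single-operator process proposed above can be sketched in a few lines. This is a hypothetical model, not the system used at the conference: the class and method names are invented for illustration, and the rule that the red light stays lit while a vetted question is waiting is our assumption about how the two suggestions could be combined.

```python
import random
from collections import deque


class VirtualLectureTheatre:
    """Hypothetical model of the single-operator process described above."""

    def __init__(self, seat_count, rng=None):
        self.free_seats = list(range(seat_count))
        self.seat_of = {}      # participant -> seat number
        self.raised = []       # hands raised, awaiting the operator
        self.queue = deque()   # vetted questioners, first come first served
        self.rng = rng or random.Random()

    def enter(self, participant):
        """Assign a random free seat; return None when the theatre is full."""
        if not self.free_seats:
            return None
        seat = self.free_seats.pop(self.rng.randrange(len(self.free_seats)))
        self.seat_of[participant] = seat
        return seat

    def raise_hand(self, participant):
        """Record a raised hand for a seated participant not already waiting."""
        if (participant in self.seat_of
                and participant not in self.raised
                and participant not in self.queue):
            self.raised.append(participant)

    def operator_vet(self, participant, audio_ok):
        """Operator moves a raised hand into the queue only if audio works."""
        if participant in self.raised:
            self.raised.remove(participant)
            if audio_ok:
                self.queue.append(participant)

    @property
    def red_light_on(self):
        """Assumed rule: the indicator is lit while a vetted question waits."""
        return bool(self.queue)

    def next_question(self):
        """Speaker takes the next person in the queue, or None if it is empty."""
        return self.queue.popleft() if self.queue else None
```

In this sketch the number of seats is the only limit on remote participants, and the speaker never has to scan the virtual audience: calling `next_question` when `red_light_on` is true is enough.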
This highly managed process may seem to introduce
much artificiality into the virtual conference participation.
However, it can be anticipated that a very smooth question
and answer process would greatly enhance the naturalness
and realism of the remote participation. Put differently,
these recommendations may seem like they might detract
from the direct co-presence of remote participants and from
their natural participation in the conference (via an
unstructured virtual lecture theatre). However, we would
hypothesize, based on our observations and on survey
responses, that it is technical failures and turn-taking fail-
ures, as well as lack of smoothness of interaction between
real and avatar participants, that detract most from co-
presence and natural remote participation. This hypothesis
should apply both to enhanced task performance (partici-
pating remotely and asking questions and receiving
answers) and to the enjoyment of the interaction (the social
aspect of the two audiences and the speaker all engaging
with each other).
6 Conclusion
In the main previous study which systematically evaluated
virtual conferencing, in Second Life (Lindeman et al.
2009), the question of realism did not arise as it did in our
study, since, first, attendees of the Second Life conference
were themselves researchers in this area, and second, the
issue of scale did not arise (only a few members attended
the Second Life meeting at one time), and third and most
importantly, the Second Life conference took place in-
world only, unlike the mixed reality setting discussed here.
The main conclusion of this paper is that more socio-
technical management, and innovative—if artificial—
solutions in this management, are needed. We have iden-
tified a number of problems of mixed/virtual conferencing
and proposed a number of such solutions for future
implementation and further research. Much of the debate in
VEs has been about ‘realism’. As we have seen, however,
this was not a concern of most of the remote participants or
those at the real conference. The survey, informal
interviews and participant observation all point to the
conclusion that rather than focus on ‘realism’, the design of
remote conference participation via a VE should focus on
how to create a rich, lively and engaging experience both at
the virtual and the real sites, even if this requires
‘artificialities’ that depart from ‘realism’.
Introducing artificialities into the VE is
something that users might benefit from, and it is likely—
though this is a topic for further research—that these arti-
ficialities would not detract from the experience of remote
or real participation.
References
Anderson J, Ashraf N, Douther C, Jack M (2001) Presence and
usability in shared space virtual conferencing. Cyberpsychol
Behav 4(2):287–305
Bailenson JN, Beall AC (2006) Transformed social interaction:
exploring the digital plasticity of avatars. In: Schroeder R,
Axelsson AS (eds) Avatars at work and play: collaboration and
interaction in shared virtual environments. Springer, London,
pp 1–16
Bailenson J, Yee N, Merget D, Schroeder R (2006) The effect of
behavioral realism and form realism of real-time avatar faces on
verbal disclosure, nonverbal disclosure, emotion recognition,
and copresence in dyadic interaction. Presence Teleoper
Virtual Environ 15(4):359–372
Beck S, Kunert A, Kulik A, Froehlich B (2013) Immersive group-to-
group telepresence. IEEE Trans Vis Comput Graph 19(4):
616–25
Bente G, Ruggenberg S, Kramer N, Eschenburg F (2008) Avatar-
mediated networking: increasing social presence and interper-
sonal trust in net-based collaboration. Hum Commun Res 34:
287–318
Churchill E, Snowdon D, Munro A (eds) (2001) Collaborative virtual
environments: digital spaces and places for interaction. Springer,
London
Damer B et al (2000) Conferences and trade shows in inhabited
virtual worlds: a case study of Avatars 98 and 99. In: Hedin J-C
(ed) Virtual worlds. Lecture notes in computer science, Springer,
Berlin, pp 1–11
Finn K (1997) Introduction: an overview of video-mediated commu-
nication literature. In: Finn K, Sellen A, Wilbur S (eds) Video-
mediated communication. Lawrence Erlbaum, Mahwah, NJ,
pp 3–21
Finn K, Sellen A, Wilbur S (eds) (1997) Video-mediated communi-
cation. Lawrence Erlbaum, Mahwah, NJ
Garau M (2006) Selective fidelity: investigating priorities for the
creation of expressive avatars. In: Schroeder R, Axelsson AS
(eds) Avatars at work and play: collaboration and interaction in
shared virtual environments. Springer, London, pp 17–38
Gutwin C, Greenberg S (2001) A descriptive framework of workspace
awareness for real-time groupware. Computer supported coop-
erative work, Kluwer Academic Press
Harrison S (ed) (2009) Media space: 20+ years of mediated life.
Springer, London
Hinds P, Kiesler S (eds) (2002) Distributed work. MIT Press,
Cambridge MA
Hirsh S, Sellen A, Brokopp N (2005) Why HP people do and don’t
use videoconferencing systems. Technical report HPL-2004-
140R1, Hewlett-Packard Laboratories, Bristol, UK. http://www.
hpl.hp.com/research/mmsl/publications/bristol.html
Kirk D, Sellen A, Cao X (2010) Home video communication:
mediating closeness. In: Proceedings of CSCW 2010
Labhart N, Hasler B, Zbinden A, Schmeil A (2012) The ShanghAI
lectures: a global education project on artificial intelligence.
J Univ Comput Sci 18(18):2542–2555
Lindeman RW, Reiners D, Steed A (2009) Practicing what we preach:
IEEE VR 2009 virtual program committee meeting. IEEE
Comput Graph Appl 29(2):80–83
Olson G, Olson J (2000) Distance matters. Hum Comput Interact
15:139–79
Penumarthy S, Boerner K (2006) Analysis and visualization of social
diffusion patterns in three-dimensional virtual worlds. In:
Schroeder R, Axelsson AS (eds) Avatars at work and play:
collaboration and interaction in shared virtual environments.
Springer, London, pp 39–61
Rintel S (2013 forthcoming) Video calling in long-distance relation-
ships: the opportunistic use of audio/video distortions as a
relational resource, forthcoming in Electronic Journal of Com-
munication/La Revue Electronic de Communication, available at
http://seanrintel.files.wordpress.com/2010/06/rintel-2013-ejc-
videocallingdistortionsasrelationalresource.pdf. Last accessed
26.3.2013
Rittenbruch M, McEwan G (2007) Awareness survey: a historical
reflection of awareness in collaboration. HxI technical report, 21
March, available from http://www.hxi.org.au/index.php?option=
com_content&task=blogsection&id=12&Itemid=55. Last acces-
sed 16.2.2009
Schroeder R (2010) Being there together: social interaction in virtual
environments. Oxford University Press, Oxford
Schroeder R (2011) Comparing video and avatar representations. In:
Peachey A, Childs M (eds) Reinventing ourselves: contemporary
concepts of identity in virtual worlds. Springer, London
Schroeder R, Steed A, Axelsson AS, Heldal I, Abelin A, Wideström J,
Nilsson A, Slater M (2001) Collaborating in networked immer-
sive spaces: as good as being there together? Comput Graph
25(5):781–88
Schroeder R, Heldal I, Tromp J (2006) The usability of collaborative
virtual environments and methods for the analysis of interaction.
Presence Teleoper Virtual Environ 15(6):655–667
Sharma G, Shroff G, Dewan P (2011) Workplace collaboration in a
3D virtual office. International symposium on VR innovation
Shirmohammadi S, Hu S-Y, Ooi WT, Schiele G, Wacker A (2012)
Mixing virtual and physical participation: the future of confer-
ence attendance? IEEE international workshop on haptic audio
visual environments and games (HAVE), Oct 2012
Slater M, Sadagic A, Usoh M, Schroeder R (2000) Small group
behaviour in a virtual and real environment: a comparative
study. Presence Teleoper Virtual Environ 9(1):37–51
Sonnenwald D (2006) Collaborative virtual environments for scien-
tific collaboration: technical and organizational design frame-
works. In: Schroeder R, Axelsson AS (eds) Avatars at work and
play: collaboration and interaction in shared virtual environ-
ments. Springer, London, pp 63–96
Tay WY (2012) Conceptualizing learning in social virtual worlds: an
ethnography of three groups in second life. DPhil. thesis,
Department of Education, Oxford University
van der Kleij R, Paashuis RM, Schraagen JMC (2005) On the passage
of time: temporal differences in video-mediated and face-to-face
interaction. Int J Hum Comput Stud 62:521–542
Vertegaal R (1998) Look who’s talking to whom: mediating joint
attention in multiparty communication and collaboration. Ph.D.
Thesis, Cognitive Ergonomics Department, University of
Twente, Netherlands
204 Virtual Reality (2013) 17:193–204