
Demonstration of an interactive multimedia environment



Interactive Multimedia Environment Charles Rich, Richard C. Waters, Carol Strohecker, Yves Schabes, William T. Freeman, Mark C. Torrance, Andrew R. Golding, and Michal Roth

Mitsubishi Electric Research Laboratories

The images in this article tell the story of one user's experience with a virtual reality environment that promotes collaboration and learning.

Mitsubishi Electric Research Laboratories was founded in 1991 to conduct basic research in the field of computers and their uses. We have since focused our efforts on using information technology to support and enhance human collaboration and learning.

We have implemented a prototype computer system that combines aspects of an on-line community with those of a virtual reality environment. We believe that interactive multimedia environments can support and enhance human collaboration and learning. Virtual reality environments in particular can provide an immersive quality that lets users explore many new learning experiences. The prototype's purpose is to explore the capabilities of current technology in these areas and to develop future research directions. The system integrates a number of key technologies for supporting human collaboration and learning.

Starting on the overleaf is a comic-strip-style presentation of a session that demonstrates the system. The system produces animation in real time accompanied by stereo sound. The frames in the comic strip are taken (with some cropping) directly from a user display. The dialogue balloons and sound-effect captions were added by an illustrator to show what the user said and heard during the session. The digital clock below each frame shows the elapsed time in minutes and seconds from the start of the session.

Key technologies

In today's society, working and learning are mostly collaborative activities interwoven with the social life of communities (on-line and otherwise). Our prototype is therefore based on a network architecture that lets multiple users at different locations interact in real time.

December 1994 0018-9162/94/$3.00 © 1994 IEEE 15


In addition to human users, the prototype supports artificial agents. These agents can be collaborators in a computer-mediated task or characters in a virtual world visited by a user. Artificial agents increase the potential richness of an environment's interactive behavior; for example, they may be available to talk to or play with when other human users aren't on line.

Unlike users in text-based on-line communities, users of the prototype communicate with each other and artificial agents through speech. (The artificial agents use speech recognition and generation technology.) Spoken interaction is much less cumbersome than using the keyboard.

In addition to spoken interaction, we are also interested in exploring the roles of sound and body movement in enhancing a user's feeling of immersion in a virtual world. The prototype therefore includes audio rendering, hand-gesture recognition, and body-position-sensing technology.

Finally, unlike most current multimedia learning systems that use stored images, all images seen by our users are produced in real time through physically based animation. This gives the system more flexibility to respond to a user's actions.

This article demonstrates the system from a user's point of view. A companion article in the Winter 1994 issue of IEEE MultiMedia describes how we implemented and combined the key technologies (see sidebar).

Companion article in IEEE MultiMedia

We briefly summarize below some technical specifications of the system demonstrated here. For further amplification, see our companion article, "An Animated On-Line Community with Artificial Agents," in the Winter 1994 issue of IEEE MultiMedia. (To order, call (800) 272-6657, fax (714) 821-4010, or e-mail [email protected]. Single-copy prices are $10 for members and $20 for nonmembers.)

Typically, each agent station in the prototype system architecture is a distributed system of Silicon Graphics and Hewlett-Packard workstations communicating via Unix sockets.
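As a hedged illustration of the kind of traffic such a socket link might carry, the sketch below sends one state update between two local processes over a Unix-domain socket pair. The message format and field names are invented for illustration; the article does not describe the prototype's actual protocol.

```python
import json
import socket

# Two connected endpoints standing in for two agent-station processes.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# A hypothetical newline-delimited JSON state update for one agent.
update = {"agent": "Mike", "position": [3.2, 0.0, 1.5], "gait": "pronk"}
a.sendall(json.dumps(update).encode() + b"\n")

# The receiving station parses the update and acts on it.
received = json.loads(b.makefile().readline())
print(received["gait"])  # -> pronk
a.close(); b.close()
```

A real deployment would of course use network sockets between machines; the point is only that each update is a small, self-describing message.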

Sound information is currently transmitted between agents using dedicated analog wires. In the new platform under development, sound will be transmitted digitally on the network along with other information, which will facilitate scalability by letting us use broadcast, rather than point-to-point, connections. Furthermore, the new platform will perform all sound processing in software, as opposed to the computer-controlled analog sound-processing equipment used in the prototype.

The control algorithm that animates Mike's body was adapted from a real physical robot developed by Marc Raibert and his colleagues at the MIT Leg Laboratory. The physical simulation runs at 1 kHz on a dedicated HP-715/50. The Summit system developed by Victor Zue and his colleagues at MIT performs Mike's speech recognition in near-real time, running on a dedicated HP-730 workstation.

Mike's natural language understanding uses a version of the XTAG parser, originally developed by Aravind Joshi and his colleagues at the University of Pennsylvania. We built Mike's "thinking" component with software provided by Joseph Bates and his colleagues in the Oz project at Carnegie Mellon University. Mike's voice is generated from Digital Equipment Corporation's DECtalk.

Using a recent algorithmic advance, we can recognize Merle's hand gestures (which control his navigation in the virtual environment) 10 times per second using a dedicated HP-730, with no user-dependent training or special marking of the hand. The sensors sewn into Yoko's jacket to measure her body position are part of a magnetic position-sensing system from Polhemus Inc.

A virtual world

The demonstration session takes place in a virtual world consisting of a two-room building, shown in the first frame of the strip, and an adjoining yard with athletic equipment. The session features four human users named Merle, Andy, Marc, and Yoko, and a computer-simulated biped robot named Mike, who is a permanent resident of the world. These five agents are represented in the virtual world by animated figures, which are their "virtual bodies." The agents speak English to communicate.

Figure 1 shows what is happening in the real world during the demonstration session. Merle, Andy, Marc, and Yoko are seated in front of computer workstations connected by a network. Mike runs on a separate workstation (actually, a collection of them). As it happens, the users are in adjacent rooms; however, they could in principle be located anywhere in the world.

All frames of the comic strip are taken from Merle's display (indicated by the arrow in Figure 1). Merle's voice in the comic strip is drawn as an oval balloon whose tail comes from outside the frame. Like all other human users, Merle sees the virtual world on his display as if he were looking out of the eyes of his virtual body.

The session starts with Merle, who is new to this world, meeting Andy, who knows it well. The upper right corner of Figure 1 shows what Andy sees on his display at the same moment that Merle sees frame 0:21. Since Andy is standing in the doorway of the building facing outward, he sees Merle standing outside the building facing in.

In addition to a display, each user station includes stereo headphones, a microphone, and an interface to control the user's virtual body. Merle moves his body around in the virtual world using hand gestures recognized by a video camera and special software.

An artificial agent

In frame 0:34 of the demonstration strip, Merle meets Mike the robot. Mike notices Merle's arrival in the room and on his own initiative comes over to introduce himself.

Mike's two legs are like pogo sticks. Even when he is stationary, he must keep bouncing back and forth to maintain his balance.

Mike has several different gaits, or footfall patterns, that he can use when stationary or moving. In frame 0:49, he is running (alternating left and right footfalls). In 1:25, he is "pronking" (hopping up and down on both feet). Mike can also hop on one foot and skip (alternate two left footfalls with two right ones).

Figure 1. System architecture.
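The dynamics behind this constant bouncing can be sketched in miniature. The following is not the authors' implementation; it is a minimal one-dimensional spring-mass hopper stepped at the 1 kHz rate the sidebar reports for Mike's physical simulation, with all constants chosen purely for illustration.

```python
# Minimal 1-D spring-mass ("pogo stick") hopper stepped at 1 kHz.
# All constants are illustrative assumptions, not the authors' parameters.
DT = 0.001          # physics time step: 1 kHz
MASS = 20.0         # body mass (kg)
LEG_REST = 0.5      # rest length of the springy leg (m)
STIFFNESS = 4000.0  # leg spring constant (N/m)
GRAVITY = 9.81
THRUST = 60.0       # extra leg force during stance to offset losses (N)

def step(height, velocity):
    """Advance the hopper one tick with semi-implicit Euler integration."""
    if height < LEG_REST:  # stance: the leg spring is compressed
        force = STIFFNESS * (LEG_REST - height) + THRUST - MASS * GRAVITY
    else:                  # flight: only gravity acts
        force = -MASS * GRAVITY
    velocity += (force / MASS) * DT
    height += velocity * DT
    return height, velocity

# Simulate one second (1,000 ticks) and count flight-phase apexes:
# like Mike, the hopper must keep bouncing rather than stand still.
h, v = 0.6, 0.0
apexes = 0
prev_v = v
for _ in range(1000):
    h, v = step(h, v)
    if prev_v > 0.0 >= v and h > LEG_REST:  # velocity sign change in flight
        apexes += 1
    prev_v = v
print(apexes, round(h, 2))
```

The stance thrust term is the simplest stand-in for the kind of energy regulation a Raibert-style hopping controller performs; the real controller also regulates forward speed and body attitude.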

Mike's movements are rendered realistically because they are based on a detailed physical simulation (see the IEEE MultiMedia article), so realistically that he sometimes loses his balance and falls over. This is most likely to happen when Mike is pronking, as in frame 1:32. Rather than viewing falling as a failure of the system, users have found that this makes Mike seem more real and likable.

Users can speak to Mike, as shown in frame 2:57 and the following frames. He answers with a computer-generated voice that sounds like that of a 10-year-old child. The prototype falls short of supporting fully natural conversation, however, because Mike's comprehension is limited to about 100 words and there is a delay of 20 to 30 seconds as he replies to speech. This delay is due to an assortment of technical problems (not including speech recognition, which is near-real-time). None of these problems is fundamental in nature. Mike also sings from time to time (for example, in frame 3:50), which has turned out to be his most engaging behavior.

In frame 3:36, Andy leads Merle through the wall of the building into the yard outside. This is possible because the prototype system does nothing to prevent it. Being able to do things you can't do in reality is one of the most valuable properties of virtual environments. Mike knows where the doorways are and uses them (we just made him that way).

Frames 3:50 through 10:25 take place in the yard behind the building. This yard features several pieces of athletic equipment, such as an exercise mat, a ramp, and a running track. When an audience is present, in this case Merle and Andy, Mike shows off his gymnastic routines. A routine is a sequence of steps, such as going to a certain location in a particular gait, turning around a certain number of times, or saying something. Mike understands a range of simple English sentences related to his routines. In particular, he can be told to perform, describe, and modify his routines, as shown in frames 7:02 through 8:36.

Zircus

Carol Strohecker, in a 12-minute video, imagines a place called Zircus, part zoo and part circus, visited by two children. (See "The 'Zircus' Concept Sketch for a Learning Environment and On-Line Community," Presence: Teleoperators and Virtual Environments, Vol. 4, No. 2, to be published in 1995.) In Zircus, animals roam freely through naturalistic terrains, and visitors are not mere spectators but participants whose actions affect the environment.

The Zircus design emphasizes constructive activities and social interaction. For example, choreographing an acrobatic routine can help children learn about relationships between strength, speed, and other characteristics that determine a sequence of stunts. Playing with users from different cultures can help them understand foreign languages.

Permanent Zircus residents (artificial agents) include an English-speaking horse, a Spanish-speaking robot, a French-speaking robot, and a multilingual robot who is a savvy guide to the environment. This robot has a parrot friend, a "repetition machine" that lets Zircus visitors hear interesting words and phrases a second time.

Children might design their own appearance in the environment, compose multimedia messages to friends who may not be on line at the time, or work with a collection of high-level software tools (a creature-construction set) that form a microworld for working with the fundamentals of motion, such as balance and center of mass, as they relate to how animals move.

Electronic meeting place

While the preceding is going on, Marc and Yoko are meeting in a room of the building. In 12:03, Merle and Andy join them to ask a question.

Marc's and Yoko's virtual bodies are different from Andy's and Merle's and are controlled by different interface hardware (we wanted to explore various hardware interfaces). Marc's and Yoko's virtual bodies have heads, torsos, and forearms that can move relative to each other, which lets them communicate via some simple gestures as well as by speaking.

Yoko is wearing a jacket with sewn-in sensors that measure the position of her head, torso, and forearms. The system maps this information into the corresponding posture of her virtual body.

Instead of a direct mapping from the posture of his real body to his virtual body, Marc controls his virtual body through a switch box and joystick. The switch box allows him to select one of seven predefined postures. The joystick controls the orientation of his head and therefore what he sees.
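A minimal sketch of the kind of mapping such an interface implies is below. The posture names, angle ranges, and clamping behavior are invented for illustration; the article does not specify the prototype's actual postures or joystick semantics.

```python
from dataclasses import dataclass

# Seven hypothetical predefined postures, one per switch position.
POSTURES = ["neutral", "wave", "point", "nod-yes", "shake-no", "lean-in", "shrug"]

@dataclass
class VirtualBody:
    posture: str = "neutral"
    head_yaw: float = 0.0    # degrees, driven by the joystick x-axis
    head_pitch: float = 0.0  # degrees, driven by the joystick y-axis

def apply_inputs(body, switch, joy_x, joy_y):
    """Map raw switch-box and joystick readings onto the virtual body."""
    body.posture = POSTURES[switch % len(POSTURES)]
    # Joystick axes in [-1, 1] scale to head angles, clamped to safe ranges.
    body.head_yaw = max(-90.0, min(90.0, joy_x * 90.0))
    body.head_pitch = max(-45.0, min(45.0, joy_y * 45.0))
    return body

body = apply_inputs(VirtualBody(), switch=1, joy_x=0.5, joy_y=-1.2)
print(body.posture, body.head_yaw, body.head_pitch)  # -> wave 45.0 -45.0
```

Because the head orientation determines the viewpoint, the same clamped angles would also feed the renderer's camera.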

Collaboration and learning

As a vehicle for collaboration, an electronic meeting place has several advantages over teleconferencing. Rather than being lined up on TV monitors, users can arrange themselves spatially in any way they like, such as around a table to mimic physical meetings. They can also adjust this arrangement dynamically to suit the moment. Instead of being limited to what a video camera sees, users can project whatever persona they prefer. Finally, participants in an electronic meeting can easily and naturally share artifacts that don't exist in the real world (like CAD representations of objects not yet built).

As a vehicle for learning, the prototype is designed to support an immersive, constructive approach. For example, creating and modifying Mike's routines is a constructive activity that helps Merle learn about Mike's capabilities. While many multimedia applications cast users as students and computers as teachers, our notion of a computer-based learning environment depends on scenarios in which users, as participants, "teach" the computer and in the process, themselves and one another.

Our prototype system is limited by what is technically feasible today. However, technology, particularly in the areas of real-time animation, audio processing, and speech recognition, is advancing very rapidly at the moment. Many things that are now expensive, or even impossible, will be commonplace in a year or two.

In parallel with the technological exploration embodied by this prototype, we also created a "video sketch," unconstrained by current technology, of what an imagined interactive environment for collaboration and learning might look like to a user (see "Zircus" sidebar). We are now in the midst of implementing a new, more powerful and robust platform to support further research. We plan to use it to experiment with a range of collaboration and learning scenarios such as those in Zircus.

Acknowledgments

Many components of the prototype system were developed by researchers at other institutions, including research groups led by Marc Raibert at the Massachusetts Institute of Technology and Boston Dynamics Inc., Victor Zue at MIT, Joseph Bates at Carnegie Mellon University, and Aravind Joshi at the University of Pennsylvania.

Steve Blake, John Shiple, Eric Conrad, and Hann-Bin Chuang provided additional technical assistance. Steve McNeon and Ilene Sterns helped turn the system output into the comic strip shown here.

Charles Rich works at Mitsubishi Electric Research Laboratories, where he explores intelligent software agents based on the principles of collaborative discourse. He cofounded the Programmer's Apprentice project at the MIT Artificial Intelligence Laboratory and codirected it until 1991.

Rich received his PhD from MIT in 1980. He is a fellow of the American Association for Artificial Intelligence.

Richard C. Waters works at Mitsubishi Electric Research Laboratories. He cofounded the Programmer's Apprentice project at the MIT Artificial Intelligence Laboratory and codirected it until 1991. His research interests include the development of interactive learning environments.

Waters received his PhD from MIT in 1978. He is editor of the algorithms section of ACM Lisp Pointers.

Carol Strohecker works at Mitsubishi Electric Research Laboratories. She is concerned with how people learn and how computer technologies can support the process.

Strohecker received her PhD from MIT in 1991. She has been a fellow of the Graduate School of Design at Harvard University, the Massachusetts Council for the Arts and Humanities, and the National Endowment for the Arts.

Yves Schabes works at Mitsubishi Electric Research Laboratories. His research interests are in the mathematical, computational, and linguistic aspects of natural languages. Recently, Schabes has led the design and implementation of a wide-coverage tree-adjoining grammar for English and has investigated new statistical models for natural language processing.

Schabes received his PhD from the Univer- sity of Pennsylvania in 1990.

William T. Freeman works at Mitsubishi Electric Research Laboratories. His current work involves hand-gesture recognition by computer and Bayesian models of perception. His research interests are in computational vision, image processing, and electronic imaging. Freeman's research at the Polaroid Corporation from 1981 to 1987 led to nine patents and the Palette film recorder.

Freeman received his PhD from MIT in 1992.





Mark C. Torrance is a graduate student at the MIT Artificial Intelligence Laboratory. He is working on ways for research collaborators to share knowledge structures.

Torrance received his SM degree from MIT in 1994, where he built an indoor mobile robot control system that learned human terms for places, navigated to places on command, and answered questions about the spatial relationships among these places. His master's research was sponsored by a National Science Foundation Graduate Fellowship.

Andrew R. Golding works at Mitsubishi Electric Research Laboratories. His research interests are in using machine-learning techniques to develop high-accuracy computer systems, particularly in the area of natural language.

Golding received his PhD from Stanford University in 1991. His work has received the Best Paper award at the 1991 AAAI Conference and the 1993 Gary K. Poock Editorial Award from the American Voice Input/Output Society.

Michal Roth works at Mitsubishi Electric Research Laboratories. She has extensive experience in software development in a wide variety of domains, with a focus on scientific applications.

Roth received her BA in mathematics from the Technion in Israel in 1983.

Readers can contact the authors at Mitsubishi Electric Research Laboratories, 201 Broadway, Cambridge, MA 02139 or e-mail [email protected].



Demonstration of an Interactive Multimedia Environment, pp. 15-22

Charles Rich, Richard C. Waters, Carol Strohecker, Yves Schabes, William T. Freeman, Mark C. Torrance, Andrew R. Golding, and Michal Roth

The authors have implemented a prototype computer system that combines aspects of an on-line community with those of a virtual reality environment. They believe that interactive multimedia environments, and virtual reality environments in particular, can provide an immersive quality that lets users explore many new learning experiences. The prototype's purpose is to investigate the capabilities of current technology in this area and to develop future research directions. The system integrates a number of key technologies for supporting human collaboration and learning. To simulate today's collaborative activities, it uses a network architecture that lets multiple users at different locations interact in real time.

In addition to human users, the prototype supports artificial agents. These agents can be collaborators in a computer-mediated task or characters in a virtual world visited by a user. Artificial agents increase the potential richness of an environment's interactive behavior.

Unlike users in text-based on-line communities, users of the prototype communicate with each other and with artificial agents through speech. In addition to spoken interaction, the prototype includes audio rendering, hand-gesture recognition, and body-position-sensing technology.

The images in this article show how users might experience such a virtual reality environment. The frames in the comic strip are taken (with some cropping) directly from a user display. A companion article in the Winter 1994 issue of IEEE MultiMedia describes how the authors implemented and combined the key technologies.
