
HAVE 2005 – IEEE International Workshop on Haptic Audio Visual Environments and their Applications Ottawa, Ontario, Canada, 1-2 October 2005

0-7803-9377-5/05/$20.00 ©2005 IEEE

Interactive Scripts for Haptic Virtual Environments

Thomas E. Whalen (1), Dorina C. Petriu (2), Lucy Yang (3), Emil M. Petriu (4)

(1) Communication Research Centre, Ottawa, ON, Canada
(2) Carleton University, Ottawa, ON, Canada
(3) Lakehead University, ON, Canada
(4) University of Ottawa, ON, Canada

Abstract – When virtual environments reach the level of complexity required to accomplish significant tasks, such as educating students or playing games, they will be based on well-written scripts. Scripts, even for continuous, real-time environments, can be written as a sequence of turns between the environment and the user. An interactive script is structured as a main plot line and alternative branches with cross references, terminations, and loops. Efficient script design requires empirical testing during development. As environments reach a sufficient level of complexity, they may be controlled by a variety of relatively independent scripts.

Keywords – Virtual Reality, Scripting

I. SCRIPTS FOR INTERACTIVE ENVIRONMENTS

Virtual environments that include a haptic, audio and visual interface are only interesting to the user if he or she experiences interesting events. In a passive environment, such as a play or a movie, interesting events can be scheduled according to a script. In a virtual environment, however, it is generally assumed that the user will have the ability to interact with the environment – navigate about it, manipulate objects in it, and communicate with other characters. Unless the environment is carefully managed, the user may never encounter interesting events.

Currently, most virtual environments do not include explicitly scripted events. Rather events occur naturally as a consequence of the way that the environment is programmed. Objects are provided that may or may not be moved; users are free to navigate wherever they desire; other characters may respond arbitrarily to communications.

For applications which immerse the user in a virtual environment for a specific purpose, however, the designer will want events in the environment to be driven by a script [1]. For example, a virtual environment cannot be used for education unless there is a guarantee that the student will encounter the appropriate pedagogical events [2]. Students will not learn how wolves take care of their cubs unless they enter the wolves’ virtual den. They will not learn to stop for school buses if they never drive their simulated car out of the virtual parking lot. The same logic applies to virtual environments used to sell products, or play games.

On the other hand, people do not like the feeling that they are being forced along a fixed path through a virtual environment. They revolt against educational environments that do not let them explore the topic from a variety of angles. They object to games that have only a single solution. They are free to explore the real world at will and demand the same flexibility in their virtual environments. When sections of a virtual environment present material in conformance with a linear script, users rush through them, paying little attention until they reach a choice point.

II. TURN-BASED AND CONTINUOUS SCRIPTS

Scripts for interactive environments may be turn-based or continuous. In turn-based scripts, the environment presents a situation and then pauses while the user is given an opportunity to react. Even apparently real-time environments can be turn-based if there are no significant changes occurring in the environment during the pause. The appearance of a real-time environment can be maintained by activity that is insignificant to the script – birds flying past, leaves falling from trees, traffic on city streets. In this case, it makes no difference if the user takes a long time before acting. Alternatively, significant changes can be occurring continuously. Even if these changes occur outside the view of the user, the environment is no longer using a turn-based script because it is not waiting until the user takes his or her turn.

In practice, continuous environments can be described with a turn-based script if time is chunked into small units and one of the users’ options is to wait during a unit of time. Long pauses on the part of the user are merely considered to be repeated periods of waiting while the environment responds by changing further. This is useful because it is easier to write turn-based interactive scripts than interactive scripts for continuous action.

Thus, interactive scripts consist of sequences of environmental states, each followed by one or more possible user responses. Each user response is associated with a resultant environmental state, as shown in Table 1.
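To make the structure concrete, the state-and-response table can be written down directly as data. The following is a minimal sketch in Python that encodes the first three states of Table 1; the `State` and `Response` names, and the use of `None` to mark the open-ended blank, are our own illustrative choices and not part of any particular scripting tool.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Response:
    """One option offered to the user; text is None for the open-ended blank."""
    text: Optional[str]      # None marks the "fill in the blank" option
    next_state: int          # index of the resultant environmental state

@dataclass
class State:
    """One environmental state followed by the user's possible responses."""
    description: str
    responses: list = field(default_factory=list)

# The first three states of Table 1, encoded as data.
script = {
    0: State("You are in a forest. There is a deer grazing here. It does not see you.",
             [Response("Wait quietly.", 1),
              Response("Approach the deer.", 2),
              Response("Walk away.", 3),
              Response(None, 2)]),
    1: State("You see a wolf emerge from the brush nearby. The wolf sniffs the air. "
             "The deer raises its head and freezes.",
             [Response("Wait quietly.", 4),
              Response("Approach the deer.", 2),
              Response("Walk away.", 3),
              Response("Approach the wolf.", 5),
              Response(None, 2)]),
    2: State("The deer bounds away into a thicket. A wolf that had been stalking the "
             "deer turns to look at you.",
             [Response("Wait quietly.", 5),
              Response("Walk away.", 3),
              Response("Approach the wolf.", 5),
              Response(None, 5)]),
}
```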

III. CONSTRUCTING SCRIPTS

The first step in constructing an interactive script is to develop the main plot line, also called the main line. We can think of the main line as the sequence of events that occur if the user does everything “right.” If the environment is a game, this is the sequence of moves that would win the game; if the environment has an educational purpose, this is the sequence of moves that would result in learning the material without error.

The first entry in the main line is the description of the initial environment. The remainder of the main line is a list of descriptions of environmental states, each followed by a single user response. At this point the script looks a lot like a screenplay, though it may have extensive stage directions and little or no dialogue.

The second step in constructing an interactive script is to develop alternate lines. For each environmental state in the main line, one considers each likely user response and either directs the user back to one of the states in the main line or writes alternate environmental states that may, in turn, be developed into rich alternate plots.

Some of these alternate lines may terminate, for example, if the user’s character is killed in a game; some may cross connect back to the main line or to other alternate lines; and some may loop back on themselves.
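These structures can be checked mechanically while the script is being written: states with no outgoing responses are terminations, and responses that point at state numbers not yet written mark branches still to be developed. A minimal sketch, assuming the `State`/`Response` structure from the earlier example; the function name `audit` is illustrative only.

```python
def audit(script):
    """Flag terminations and responses that point at states not yet written."""
    missing = set()
    for state_id, state in sorted(script.items()):
        if not state.responses:
            print(f"STATE {state_id}: termination (no responses offered)")
        for response in state.responses:
            if response.next_state not in script:
                missing.add(response.next_state)
    for state_id in sorted(missing):
        print(f"STATE {state_id}: referenced but not yet written")

audit(script)   # with the Table 1 fragment above, states 3, 4, and 5 are still to be written
```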

IV. EMPIRICAL SCRIPT DEVELOPMENT

Authors who are writing an interactive script for the first time soon come to the realization that they are writing a lot more than any individual user will ever see – any path through the script will only traverse a portion of all of the environmental states. Game designers grow frustrated that much of their work may be wasted; educators fear that students will only learn a fraction of what they could be learning.

The first reflex of most authors is to start limiting the users’ choices – force them back to the main line as soon as possible, terminate alternate branches before they grow too long, trap them in loops to make short scripts appear longer. All of these strategies reduce the author’s workload, but at the cost of boring and frustrating the user.

A better alternative is to launch into the empirical phase of script development as quickly as possible – as soon as enough of the interactive script has been written that it is usable, and long before it is complete.

Figure 1: Some of the structures found in interactive scripts

In the empirical phase, the script is installed in a working system of some kind and people are given an opportunity to interact with it. At this point, the script may consist of textual descriptions of the environment and the user’s responses as a text menu of choices. In that case, a simple program can be used to present the descriptions in sequence, depending on the users’ choices. If the user is presented with a menu of choices at every point, it is imperative that one of the choices be an open-ended response that gives the user some way to suggest responses that have not occurred to the author. This may be as simple as a blank space where the user can type whatever he or she wishes.
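A presentation program of this kind needs very little machinery. The sketch below, which again assumes the `State`/`Response` structure shown earlier, prints each description, offers the numbered menu, and appends anything typed into the open-ended blank to a log file for the author; the file name and function name are arbitrary.

```python
def play(script, start=0, log_path="open_responses.txt"):
    """Walk a turn-based script: show each state, take the user's choice,
    and log open-ended answers so new branches can be written later."""
    state_id = start
    while state_id in script:
        state = script[state_id]
        print("\n" + state.description)
        if not state.responses:                  # a termination: nothing left to choose
            break
        for i, r in enumerate(state.responses, 1):
            print(f"  {i}) {r.text or '____________________'}")
        choice = input("Your choice: ").strip()
        if not choice.isdigit() or not 1 <= int(choice) <= len(state.responses):
            continue                             # invalid input: re-present the same state
        r = state.responses[int(choice) - 1]
        if r.text is None:                       # the open-ended blank
            wish = input("What would you like to do? ")
            with open(log_path, "a") as log:
                log.write(f"STATE {state_id}: {wish}\n")
        state_id = r.next_state

# play(script)   # run against the Table 1 fragment defined earlier
```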

Table 1: Beginning of an interactive script.

STATE 0: You are in a forest. There is a deer grazing here. It does not see you. You may:
  1) Wait quietly. [STATE 1]
  2) Approach the deer. [STATE 2]
  3) Walk away. [STATE 3]
  4) ____________________ [STATE 2]

STATE 1: You see a wolf emerge from the brush nearby. The wolf sniffs the air. The deer raises its head and freezes. You may:
  1) Wait quietly. [STATE 4]
  2) Approach the deer. [STATE 2]
  3) Walk away. [STATE 3]
  4) Approach the wolf. [STATE 5]
  5) ________________ [STATE 2]

STATE 2: The deer bounds away into a thicket. A wolf that had been stalking the deer turns to look at you. You may:
  1) Wait quietly. [STATE 5]
  2) Walk away. [STATE 3]
  3) Approach the wolf. [STATE 5]
  4) __________________ [STATE 5]

STATE 3: . . .

Authors with a more compulsive, technological bent may have developed their initial script in UML or some other obscure modeling language. They may prefer to use the actual haptic, audio, visual environment for the empirical phase of their script development. The advantage of this is that the users’ interactions will be faithful to the final result – users will actually see, hear, and feel what the author intends. The disadvantage is that development of the script will be much slower and more labour intensive than simply writing text.

In practice, it would be most efficient to combine the two approaches – use a text-based script to work out the high-level paths of the script in broad strokes and then implement the environment for a second empirical phase to work out the low-level details. To minimize costs, the second empirical phase may use a less detailed, more easily modified implementation than the final full implementation.

During the empirical phase, it is critical that users’ responses be collected and used to guide further script development. At this point, every user response must be addressed – the user is never wrong. If the user does something unexpected, then the script must be modified to accommodate it. If the user selects a response that does not produce an appropriate result, then either a new branch must be developed or the previous state description must be changed so that the user would not have chosen that response.

Though it sounds like this would create considerable additional work for the author, the actual reason for rushing to the empirical phase of script development is to minimize the author’s effort. The path that a user takes through an interactive script depends on the user’s choices. If no user ever makes certain choices, then there is no reason to develop an extensive alternate line beyond that point. It is only necessary to develop extensive alternate lines in answer to choices that are commonly made. Some years ago, the first author (Whalen) wrote a script to educate people about AIDS without sufficient early recourse to empirical development. When the script was finally presented to the target audience, the author was dismayed to find that ninety percent of the users only saw twenty percent of the states. Great sections of the script were never viewed, even by a single user. If eighty percent of the script had never been written, almost no one would have noticed [3, 4]. The author wrote five times as much script as was necessary.
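The lesson generalizes: if the presentation program also records the sequence of states each test user visits, a simple coverage count tells the author where further writing is, and is not, worthwhile. A sketch only, assuming each trace is just a list of visited state numbers collected by the presenter:

```python
from collections import Counter

def coverage_report(traces, script):
    """Count how many test sessions reached each state and list states never reached."""
    visits = Counter(state_id for trace in traces for state_id in set(trace))
    for state_id, count in visits.most_common():
        print(f"STATE {state_id}: reached in {count} of {len(traces)} sessions")
    untouched = sorted(set(script) - set(visits))
    if untouched:
        print("Never reached, candidates to leave undeveloped:", untouched)
```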

V. SCRIPT IMPLEMENTATION

When the script has been developed as far as possible in the empirical phase, it can be implemented in the full, final haptic, audio, visual environment.

At this point, the environment should undergo its own period of empirical testing with a sample of the target users because people, when actually involved with the environment, may behave somewhat differently than they did in the previous less accurate simulations.

VI. COLLECTIONS OF SCRIPTS

An interactive script for a complex environment need not be a monolithic tome. It is more likely to be a collection of subscripts that are organized into an overall high-level script.

In particular, bots – characters in the environment that are controlled by computer programs – are likely each to have their own scripts. When a user encounters a bot, he or she may begin to interact with the script for the bot without being informed that he or she is no longer interacting with the environmental script. The script for the bot may have its own interface to the environment so that it can change the state of the environment, depending on its interactions with the user and possibly in ways that the user will not detect until some later time.

In the same way, other, ostensibly inanimate, objects may also be implemented as scripts that interact with the user and modify the environment.

Even the script for the physical environment may be broken into independent sections. What happens inside the wolves’ den may be a different script than what happens in the forest.

If subscripts are self-contained objects, they may be used in more than one environment.
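One way to keep such subscripts self-contained and reusable is to give them all the same narrow interface to the environment. The sketch below is only an illustration of that idea; the `Subscript` class and its method names are invented here, not drawn from any existing system.

```python
from abc import ABC, abstractmethod

class Subscript(ABC):
    """A self-contained subscript: handed the user's action each turn, it may
    both respond to the user and modify the shared environment state."""

    @abstractmethod
    def handles(self, action: str) -> bool:
        """Does this subscript want to take over for this action?"""

    @abstractmethod
    def step(self, action: str, environment: dict) -> str:
        """Advance one turn: return a description and update the environment."""

class WolfBot(Subscript):
    def handles(self, action: str) -> bool:
        return "wolf" in action

    def step(self, action: str, environment: dict) -> str:
        environment["wolf_alerted"] = True     # a change the user may only notice later
        return "The wolf's ears prick up and it watches you."

env = {}
bot = WolfBot()
if bot.handles("approach the wolf"):
    print(bot.step("approach the wolf", env))
```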

VII. SPECIAL SCRIPT TYPES

Scripts for different parts of the environment and different objects in the environment may take different forms.

A. Scripts for Bots and Avatars

In particular, scripts for bots and avatars can be implemented as behavioural hierarchies [5] in which low level joint movements are organized into higher level skills, which, in turn, are organized into even higher level behaviours.

At the lowest level, the joint control level, a bot or avatar consists of a collection of segments that rotate about joints. A typical H-Anim [6] humanoid body encoded in VRML97, for example, consists of 47 joints and 15 body segments [7]. Each body segment can be independently rotated about the joints that connect it to other body segments.

Human bodies seldom move a single joint at a time. Even bending a single finger requires flexing at least two and usually three joints simultaneously. Thus, it is realistic to organize collections of joint movements into higher level skills, such as “take a step”, “grasp an object”, or “point straight ahead”. A skill consists of a number of joint movements, scheduled to occur in a carefully-timed sequence. While there is only one control for each joint, a bot/avatar is likely to have a large number of skills, each requiring a moderately complex sequence of joint movements.
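In code, a skill can be represented as nothing more than a timed list of joint movements. The sketch below uses H-Anim-style joint names, but the particular angles, timings, and the `JointMove` structure itself are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class JointMove:
    """Rotate one named joint (e.g. an H-Anim joint such as 'r_knee') to a
    target angle, starting at a given time within the skill."""
    joint: str
    angle_deg: float
    start_s: float
    duration_s: float

# A skill: a carefully timed sequence of joint movements (values are illustrative).
take_step = [
    JointMove("r_hip",   25.0, 0.00, 0.30),
    JointMove("r_knee",  40.0, 0.00, 0.25),
    JointMove("r_ankle", 10.0, 0.20, 0.15),
    JointMove("r_hip",    0.0, 0.30, 0.30),
    JointMove("r_knee",   0.0, 0.30, 0.25),
]
```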

At the skill control level, the bot or avatar must interact with the environment. If we consider a sequence of joint movements as a kind of script, then we can envision the bot/avatar scripts interacting with other scripts in the environment. The advantage of representing skills as interactive scripts is clear – interactive scripts are designed to take direction from elsewhere at every choice point. Thus the interaction between the bot/avatar’s skills and the environment is easy to define. For example, a script for an object may consist of a simple wait loop with an exit to a movement branch when it is grasped. Conversely, the skill script may follow different branches depending on the feedback that it gets from other scripts in the environment.

At the highest control level, the behaviour control level, skill scripts are aggregated into behavioural scripts. At this level, as at the lower levels, scripts for bots and avatars do not differ. An avatar is a representation of a human being and must be controlled directly by a person. It would be impractical for a person to control an avatar joint-by-joint and inconvenient to have to specify every skill in the proper order and at the proper time. The only practical way for a person to control an avatar is by specifying high level behaviours and expecting the avatar’s scripts to know how to perform them. A person would prefer to direct their avatar to “walk across the room” than to have to repeatedly tell it, “step with your right foot,” “step with your left foot”, “step with your right foot,” “step with your left foot,” etc. Like avatars, bots, which are directly controlled by the computer rather than by a human operator, will have high level behaviours defined to simplify writing even higher level scripts for them.

The behavioural control level is an arbitrarily deep hierarchy of behaviours because a complex behaviour may be constructed from other, simpler behaviours or a mixture of simpler behaviours and skills. For example, a behaviour like “climbing a mountain” may be composed of other behaviours like walking, crawling, climbing, tying and untying knots, and even cooking lunch and sleeping overnight.
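Because a behaviour may mix skills with other behaviours, the behaviour level is naturally a recursive structure. A minimal sketch, with an inline stand-in for the `take_step` skill from the previous example; all names and the particular composition are illustrative:

```python
from dataclasses import dataclass, field

take_step = [("r_hip", 25.0), ("r_knee", 40.0)]   # stand-in for the skill sketched above

@dataclass
class Behaviour:
    """An ordered mix of skills (plain lists of joint movements) and simpler behaviours."""
    name: str
    steps: list = field(default_factory=list)

    def flatten(self):
        """Expand the hierarchy into the flat sequence of skills to perform."""
        for step in self.steps:
            if isinstance(step, Behaviour):
                yield from step.flatten()
            else:
                yield step

walk = Behaviour("walk across the room", [take_step, take_step])
climb = Behaviour("climb the mountain", [walk, Behaviour("tie a knot"), walk])
print(list(climb.flatten()))   # four take_step skills, in order
```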

Unlike avatars, bots have another level of script above the three control levels. Bots require an interactive script to control their overall behaviour in lieu of direct human control. This interactive script is sometimes called the story level to distinguish it from the three levels of control scripts [5].

It is both possible and practical to develop the story level script for a bot before any scripts have been written for the control levels. Once the story level script nears completion, it can be used to determine what behaviours and skills the bot will require. It can even be used to determine what joints are required. If the story script never requires that the bot walk or sit – imagine a British Royal Guard standing in his booth – then there is no need to implement ankle or knee joints.

B. Scripts for Objects

As indicated above, there is value to representing even simple objects as interactive scripts, particularly during the early empirical development of the script.

Most objects of interest in a virtual environment have surprisingly complex behaviours. A simple can of soda pop can be grasped, moved, shaken, opened, drunk, thrown, kicked, and crushed. It can be used as a door stop, for carrying oil, as a candle holder, or for any number of other functions.

Other scripts interact with an object, and the object, in turn, has an impact on those scripts.

C. Scripts for Topography

As with objects, there is value to representing the topography of a virtual environment as a script.

A script, as a list of states (nodes) and connections between the states (arcs), is a network. Thus, describing a topography as a script has the same flexibility as describing the topography as a network. The immediate topography can be organized into directions and distances in a number of ways. In a given location, a bot that “turns north,” “turns right,” “turns away from the building,” or “turns to follow the dog,” may be executing the same action in every case. Conversely, a bot may not be able to execute the same action in different locations. For example, it may be able to climb down in one location but not in others.
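The same node-and-arc representation used for story scripts therefore works for topography, as long as the available actions are stored per location. A small sketch with invented locations: several aliases lead down the same arc, and an action that is legal in one place is simply absent in another.

```python
# Topography as a script-like network: locations are states, and each location
# maps the action names that are valid *there* onto a destination location.
topography = {
    "clearing":   {"go north": "den", "turn toward the den": "den",
                   "follow the dog": "den", "go south": "cliff top"},
    "cliff top":  {"climb down": "river bank", "go north": "clearing"},
    "river bank": {},                     # no way to climb back up from here
}

def move(location, action):
    """Resolve an action in the current location; None means it is not possible here."""
    return topography.get(location, {}).get(action)

print(move("clearing", "follow the dog"))   # 'den'
print(move("river bank", "climb down"))     # None: not available at this location
```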

The topographic script, however, can embody far more information than just describing the connections between locations. It can include changes in the topography that occur over time, such as changes in weather or the position of the sun above or below the horizon; or the effects of bot movements such as footprints, tire tracks, or broken vegetation.

D. Scripts for Conversations

The most interesting scripts may be those used to represent natural language conversations. It has been known for some time that a simple, but surprisingly acceptable approximation to a human conversation can be implemented as a simple network of answers that are connected to each other by potential user responses [8]. When the script responds to questions with the “correct” answer only 75 percent of the time, over 90 percent of the people using it report that they like the simulated conversation [4].
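As a rough illustration of such an answer network (and not of the particular matching method used in [8]), the sketch below connects canned answers by the questions that lead between them and picks the link whose wording best overlaps the user's typed input; the node names and the keyword-overlap rule are our own stand-ins.

```python
conversation = {
    "greeting": {
        "answer": "Hello. I can tell you about the wolves in this forest.",
        "links": {"what do wolves eat": "diet", "where is the den": "den"},
    },
    "diet": {
        "answer": "Wolves mostly hunt deer and smaller mammals.",
        "links": {"where is the den": "den"},
    },
    "den": {
        "answer": "The den is on the hillside north of the clearing.",
        "links": {"what do wolves eat": "diet"},
    },
}

def reply(node, question):
    """Return (answer, next_node) for the best-matching link, or stay put."""
    words = set(question.lower().split())
    best, score = None, 0
    for link_text, target in conversation[node]["links"].items():
        overlap = len(words & set(link_text.split()))
        if overlap > score:
            best, score = target, overlap
    next_node = best or node
    return conversation[next_node]["answer"], next_node

print(reply("greeting", "where is the den"))   # follows the 'where is the den' link
```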

There are three reasons that people are so forgiving of simulated natural language conversations despite high error rates.

First, often people do not detect incorrect answers. If an answer contains information closely related to the topic of their question, they are likely to accept the answer without noticing that it does not contain the exact answer to the exact question that they asked.

Second, if people get an incorrect answer, they often will reword their question and ask again. If the second attempt produces the correct answer, they are satisfied and forgive the computer for failing the first time.

Third, none of the “incorrect” answers actually contains incorrect information. Rather, the answer is only inappropriate because it failed to answer the user’s question. Often people type questions just to keep the conversation moving and do not care whether they get an answer to the question that they asked. If the answer contains any interesting information and the user is not particularly eager to have his or her question answered, he or she will be happy with the interaction.

Developing scripts that represent natural language follows the same process as developing any other interactive script: write the main line; write obvious branches; rush to the empirical development stage as soon as feasible; and continue the empirical development until the desired level of accuracy is attained.

VIII. IMPLICIT SCRIPTS

Script development has been described as a linear process that begins when the author writes the main plot line at the highest level and proceeds through to the actual implementation of a virtual environment. In actual practice, the scripts for objects and the topography are often developed in the opposite order.

Inanimate objects, for example, may often have a limited set of behaviours conferred upon them by the simulated physics of the environment. The programmers will write code to make the objects act appropriately without recourse to any explicit script. However, when an author is designing the content for the environment, he or she will still require that the object be represented in his or her script because it has an important effect on the rest of the script. The author may have to write a separate script for the object for no other reason than to allow the empirical development of the overall story script.

Or the author may simply include the object in the other scripts implicitly. Simple objects, especially, may be described in the other scripts; for example, the script may simply note that there is a boulder in the middle of a path. If the object changes over time or as a result of the user’s actions, then its script can be implicitly included in the other scripts only as long as the other scripts are linear. If the other script contains a loop that sends the user back to a state before the object changed, then the change will appear to have been magically reversed. Thus, parts of a script that contain loops cannot be used to hold an implicit script for an object that changes. If any part of the loop changes an object, then the object will have to have its own explicit script.
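A concrete way to avoid the problem is to keep the changing object's state in its own explicit script (or, in a prototype, simply its own record) rather than inside the story states, so that looping back through the story cannot silently undo the change. A minimal sketch with an invented boulder object:

```python
# The boulder's state lives outside the story states, so a story loop that
# revisits an earlier state still sees the boulder where it was last left.
boulder = {"position": "middle of the path"}

def describe_path():
    if boulder["position"] == "middle of the path":
        return "You are on the forest path. A boulder blocks the way."
    return "You are on the forest path. The boulder has been rolled aside."

def push_boulder():
    boulder["position"] = "beside the path"   # persists across any loops in the story script
```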

IX. THE NEED FOR INTERACTIVE SCRIPTS

While it is possible to develop virtual environments without resorting to an explicit script – and many such environments have been implemented – environments that are designed and tested in the form of a script before implementation can be more complex, more interesting, more internally consistent, and are likely to contain fewer errors and omissions.

As people use virtual environments for more demanding applications, such as elaborate educational simulations, interactive script development will be an indispensable first step in the design process. And, as virtual environments become more elaborate and more expensive to produce, reducing the number of errors by empirically testing scripts will be seen to be essential for controlling costs and meeting deadlines.

REFERENCES

[1] Garrand, T., Writing for Multimedia and the Web. Focal Press, 2000, 344 pp.

[2] Robertson, J. and Good, J., Story Creation in Virtual Game Worlds. Communications of the ACM, 2005, 48(1), 61-65.

[3] Patrick, A. S., Jacques-Locmelis, W., and Whalen, T. E., The role of previous questions and answers in natural-language dialogues with computers. International Journal of Human-Computer Interaction. 1993, 5(2), 129-145.

[4] Patrick, A. S., and Whalen, T. E., Field testing a natural-language information system: Usage characteristics and users' comments. Interacting with Computers. 1992, 4, 218-230.

[5] Yang, X., Petriu, D. C., Whalen, T. E., and Petriu, E. M., "Hierarchical Animation Control of Avatars in 3D Virtual Environments," IEEE Transactions on Instrumentation and Measurement, 54(3), 1333-1341, 2005.

[6] The Humanoid Animation Working Group, http://h-anim.org/

[7] H-Anim Examples, http://www.ballreich.net/vrml/h-anim/h-anim-examples.html

[8] Whalen, T. E. and Patrick, A. S., COMODA: A conversational model for data base access. Behaviour and Information Technology. 1990, 9(2), 93-110.
