
Building character animation for intelligent storytelling with the H-Anim standard

Minhua Eunice Ma and Paul Mc Kevitt

School of Computing and Intelligent Systems

Faculty of Informatics, University of Ulster

EuroGraphics Ireland

29 April 2003

Previous research

Multimodal interactive storytelling
  AesopWorld
  KidsRoom
  Larsen & Petersen’s Interactive Storytelling
  Computer games

Virtual humans & embodied agents
  Jack (University of Pennsylvania)
  Improv (Perlin & Goldberg, 1996)
  BEAT (Cassell et al., 2000)
  SimHuman
  Gandalf



Previous research

Automatic text-to-graphics systems
  WordsEye (Coyne & Sproat, 2001)
  ‘Micons’ and CD-based language animation (Narayanan et al., 1995)
  Spoken Image (Ó Nualláin & Smith, 1994) & successor SONAS (Kelleher et al., 2000)

Semantic representations
  Schank’s (1972) Conceptual Dependency (CD) theory & scripts
  Jackendoff’s (1990) Lexical Conceptual Structure (LCS)



Objectives of CONFUCIUS

To interpret natural language story and movie (drama) script input and to extract conceptual semantics from the natural language

To generate 3D animation and virtual worlds automatically from natural language

To integrate 3D animation with speech and non-speech audio, to form an intelligent multimedia storytelling system for presenting multimodal stories



CONFUCIUS’ context diagram

[Diagram: CONFUCIUS at the centre]

Inputs from the storywriter/playwright
  Story in natural language
  Movie/drama script
  Tailored menu for script input

Outputs to the user/story listener
  3D animation
  Speech (dialogue)
  Non-speech audio



Architecture of CONFUCIUS

[Diagram components]

Input
  Natural language stories
  Script writer

Processing modules
  Script parser
  Natural Language Processing
  Text To Speech
  Sound effects
  Animation generation
  Synchronizing & fusion

Knowledge base
  Language knowledge (lexicon, grammar, etc.)
  Semantic representations
  Mapping
  Visual knowledge (3D graphic library)
  Prefabricated objects

Resources
  3D authoring tools, existing 3D models & character models

Output
  3D world with audio in VRML



Knowledge base of CONFUCIUS

Language knowledge
  Semantic knowledge - lexicons (e.g. WordNet)
  Syntactic knowledge - grammars
  Statistical models of language
  Associations between words

Visual knowledge
  Object model (nouns)
  Functional information
  Internal coordinate axes (for spatial reasoning)
  Associations between objects
  Event model (event verbs, describes the motion of objects/humans)

World knowledge

Spatial & quantitative reasoning knowledge



Software & Standards

Java: parsing intermediate representation, changing VRML code to add/modify animation, integrating modules

3D graphic modelling
  Authoring tools
    Humanoid characters: Character Studio, Internet Character Animator (ICA)
    Narrator: Microsoft Agent
    Props & stage: 3D Studio Max
  Modelling language & standards
    VRML 97 for modelling the geometry of objects, props and environment
    Humanoid modelling
      MPEG-4 Face and Body Animation (FBA)
      Humanoid Animation (H-Anim) specifications
      Main problem to solve: defining standards for high-level behaviours of virtual humans

Natural language processing tools
  PC-PARSE (morphological and syntactic analysis)
  WordNet (lexicon, semantic inference)



Level 1 Of Articulation of H-Anim

Joints and segments of LOA1

Although CONFUCIUS adopts Level of Articulation 1 (LOA1) for its human character animation, its animation script engine adds ROUTEs dynamically, based on the H-Anim file’s joint list and the animation keyframe list. As long as the animation keyframes conform to the joint definitions in the H-Anim file, CONFUCIUS’ animation engine can handle any level of articulation.
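The idea of adding ROUTEs from a joint list and a keyframe list can be sketched as follows. This is a minimal illustration in Python, not the CONFUCIUS engine (which the slides say is written in Java). The node names are assumptions: a TimeSensor called Timer, one OrientationInterpolator per joint named <joint>_interp, and Joint nodes with the DEF prefix hanim_. Only the event names (fraction_changed, set_fraction, value_changed, set_rotation) are standard VRML 97.

```python
def build_routes(model_joints, keyframed_joints, timer="Timer", prefix="hanim_"):
    """Return VRML ROUTE statements for every keyframed joint.

    model_joints:     joint names defined in the H-Anim file (any LOA)
    keyframed_joints: joint names that the keyframe file animates
    timer, prefix:    hypothetical DEF names, chosen for this sketch
    """
    defined = set(model_joints)
    routes = []
    for joint in keyframed_joints:
        # The keyframes must conform to the joints defined in the model.
        if joint not in defined:
            raise ValueError(f"keyframes refer to unknown joint: {joint}")
        interp = f"{joint}_interp"  # hypothetical interpolator DEF name
        routes.append(f"ROUTE {timer}.fraction_changed TO {interp}.set_fraction")
        routes.append(f"ROUTE {interp}.value_changed TO {prefix}{joint}.set_rotation")
    return routes


# A few LOA1 joint names, used only as sample data (not the full LOA1 list).
SAMPLE_JOINTS = ["HumanoidRoot", "l_shoulder", "l_elbow", "r_shoulder", "r_elbow"]

if __name__ == "__main__":
    for line in build_routes(SAMPLE_JOINTS, ["l_shoulder", "l_elbow"]):
        print(line)
```

Because the routes are derived from whatever joints the model declares, the same function works unchanged for a model with more joints than LOA1, which is the point the slide makes.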



Agents and Avatars—How much autonomy?

[Diagram: virtual humans placed on a scale of autonomy & intelligence running from high to low: autonomous characters, interface agents, avatars, and characters in non-interactive storytelling]

Autonomous characters/agents have higher requirements for sensing, memory, reasoning, planning, behaviour control, and even emotional status (a sense-<emotion->control-action structure)

Avatars are “user-controlled” and hence require fewer autonomous actions. However, basic naïve physics such as collision detection and reaction is still needed when the user steers an avatar into a wall or makes it grasp an object

A virtual character in non-interactive storytelling is somewhere between an agent and an avatar. Most of its behaviours, emotions, and responses to the changing environment are described in the story input



Semantic representations

Table columns: Categories, Knowledge representations, Decomposite, Typical applications

(1) General knowledge representation & reasoning
  rule-based representation: expert systems
  FOPC (First Order Predicate Calculus): sentence representation, expert systems
  semantic networks: lexical semantics
  Schank’s scripts: story understanding
  frame-based representations
  XML-based representations: multimodal semantics

(2) Physical knowledge representation & reasoning (inc. spatial/temporal reasoning)
  Conceptual Dependency (CD)
  event-logic truth conditions
  x-schema and f-structure
  Lexical-Conceptual Structure (LCS)
  Lexical Visual Semantic Representation (LVSR): dynamic vision (movement) recognition & generation



Lexical Visual Semantic Representation

Lexical Visual Semantic Representation (LVSR) is a necessary semantic representation between 3D model information and syntactic information, because 3D model differences, although crucial in distinguishing word meanings, are invisible to syntax

LVSR is based on Jackendoff’s LCS and adapts it to the task of language visualization. It enhances LCS with Schank’s scripts

Ontological categories of LVSR: OBJ, HUMAN, EVENT, STATE, PLACE, PATH, and PROPERTY
  OBJ for props or places (e.g. buildings)
  HUMAN for either human beings or any other articulated animated characters (e.g. animals), as long as their skeleton hierarchy is defined in the graphic library
  EVENT for actions, movements and manners
  STATE for static existence
  PROPERTY for attributes of OBJ/HUMAN



PATH & PLACE predicates

PATH predicates   Direction feature   Termination feature
to                1                   1
from              0                   1
toward            1                   0
away_from         0                   0
via               n/a                 0
across            n/a                 n/a
along             n/a                 n/a

PLACE predicates  Contact/attach feature
at                unmarked
behind            <-contact>
end_of            n/a
in                unmarked
in_front_of       <-contact>
near              <-contact>
on                <+contact>
out               unmarked
over              <-contact>
top_of            n/a
under             unmarked

We analysed 62 common English prepositions and defined 7 PATH predicates and 11 PLACE predicates for interpreting spatial movement events of OBJ/HUMANs
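The predicate tables above lend themselves to a simple lookup structure. The sketch below is one possible encoding in Python, not the representation used inside CONFUCIUS: n/a is stored as None, and the helper names path_features and contact_feature are invented for this example.

```python
# PATH predicates: name -> (direction feature, termination feature).
# None stands for "n/a" in the slide's table.
PATH_PREDICATES = {
    "to": (1, 1),
    "from": (0, 1),
    "toward": (1, 0),
    "away_from": (0, 0),
    "via": (None, 0),
    "across": (None, None),
    "along": (None, None),
}

# PLACE predicates: name -> contact/attach feature.
# "unmarked" means the feature is not specified; None stands for "n/a".
PLACE_PREDICATES = {
    "at": "unmarked",
    "behind": "-contact",
    "end_of": None,
    "in": "unmarked",
    "in_front_of": "-contact",
    "near": "-contact",
    "on": "+contact",
    "out": "unmarked",
    "over": "-contact",
    "top_of": None,
    "under": "unmarked",
}


def path_features(name):
    """Return the direction and termination features of a PATH predicate."""
    direction, termination = PATH_PREDICATES[name]
    return {"direction": direction, "termination": termination}


def contact_feature(name):
    """Return the contact/attach feature of a PLACE predicate."""
    return PLACE_PREDICATES[name]


if __name__ == "__main__":
    print(path_features("toward"))
    print(contact_feature("on"))
```

Keeping the two tables as plain dictionaries makes it easy to check that every predicate used in an LVSR expression is one of the 7 PATH or 11 PLACE predicates.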



Examples of LVSR & animation generation

Manipulating environment & spatial relations

Input sentence: John walked towards the house.
LVSR: [EVENT walk ([HUMAN john],[PATH toward [OBJ house]])]
Output animation

Input sentence: Nancy ran across the field.
LVSR: [EVENT run ([HUMAN nancy],[PATH via [PLACE on [OBJ field]]])]
Output animation

Manipulating objects

Input sentence: John lifted his hat.
LVSR: [EVENT go ([OBJ hat],[PATH from [PLACE on [OBJ john.head]]])] [EVENT lift ([HUMAN john],[OBJ hat])]
Output animation
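The bracketed LVSR notation in these examples is a tree of typed nodes. The following Python sketch shows one way to build such a tree and print it back in the slide's notation. The class name Node and its fields are assumptions made for this illustration and do not come from CONFUCIUS itself.

```python
from dataclasses import dataclass, field
from typing import List

# The seven ontological categories of LVSR named in the slides.
CATEGORIES = {"OBJ", "HUMAN", "EVENT", "STATE", "PLACE", "PATH", "PROPERTY"}


@dataclass
class Node:
    """One LVSR constituent, e.g. [PATH toward [OBJ house]]."""
    category: str
    head: str
    args: List["Node"] = field(default_factory=list)

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown LVSR category: {self.category}")

    def render(self):
        """Return the node in the bracketed notation used on the slides."""
        if not self.args:
            return f"[{self.category} {self.head}]"
        if self.category == "EVENT":
            # EVENT arguments are written in parentheses, comma-separated.
            inner = ",".join(arg.render() for arg in self.args)
            return f"[EVENT {self.head} ({inner})]"
        # PATH and PLACE embed their reference constituent directly.
        inner = " ".join(arg.render() for arg in self.args)
        return f"[{self.category} {self.head} {inner}]"


# "John walked towards the house."
walk = Node("EVENT", "walk", [
    Node("HUMAN", "john"),
    Node("PATH", "toward", [Node("OBJ", "house")]),
])

# "Nancy ran across the field."
run = Node("EVENT", "run", [
    Node("HUMAN", "nancy"),
    Node("PATH", "via", [Node("PLACE", "on", [Node("OBJ", "field")])]),
])

if __name__ == "__main__":
    print(walk.render())
    print(run.render())
```

A tree form like this lets later stages walk the structure, for example to find the PATH constituent of an EVENT when placing a character in the scene.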



Graphics library

[Diagram components]
  objects/props: simple geometry files
  characters: geometry & joint hierarchy files (H-Anim)
  motions: animation library (key frames)
  instantiation



Animation generator

[Flowchart components]

Input: syntax tree

Verb semantic analysis: use lexical entries in Lexical Visual Semantics to analyse verb semantics, replace synonyms, spatial reasoning
  Result: LVSR

Match basic motions in library? (if the event predicate matches basic human motions in the animation library)
  Y: motion instantiation
  N: motion decomposition

Animation controller

Environment placement: apply spatial info & place OBJ/HUMAN into a specified environment

Apply scripts

Output: VRML file of the virtual story world
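The library-matching decision in the flowchart can be sketched as a small recursive lookup. This Python example is only an illustration of the Y/N branch. The library contents, the decomposition rule for "lift", the file names, and the function name plan_motions are all hypothetical and are not taken from CONFUCIUS.

```python
# Hypothetical library of basic motions: event predicate -> keyframe file.
MOTION_LIBRARY = {
    "walk": "walk_keyframes.wrl",
    "reach": "reach_keyframes.wrl",
    "grasp": "grasp_keyframes.wrl",
    "raise_arm": "raise_arm_keyframes.wrl",
}

# Hypothetical decomposition rules: complex predicate -> simpler predicates.
DECOMPOSITIONS = {
    "lift": ["reach", "grasp", "raise_arm"],
}


def plan_motions(predicate, library=MOTION_LIBRARY, rules=DECOMPOSITIONS, _active=()):
    """Return the keyframe files needed to animate an event predicate."""
    if predicate in library:
        # "Y" branch: the predicate is a basic motion, so instantiate it.
        return [library[predicate]]
    if predicate in _active:
        raise ValueError(f"cyclic decomposition for {predicate!r}")
    if predicate not in rules:
        raise LookupError(f"no motion or decomposition for {predicate!r}")
    files = []
    for part in rules[predicate]:
        # "N" branch: decompose, then try to match each part again.
        files.extend(plan_motions(part, library, rules, _active + (predicate,)))
    return files


if __name__ == "__main__":
    print(plan_motions("walk"))
    print(plan_motions("lift"))
```

The cycle check is a design choice of this sketch: without it, two rules that refer to each other would recurse forever.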



Collision detection

Collision detection is a crucial issue for path planning, manoeuvring objects, reactive behaviour, and multiple characters’ activities

VRML provides a built-in collision detection mechanism for the avatar (user), but the mechanism does not apply to intersection between other characters/objects

Collision avoidance algorithms for humanoid bodies:
  Coarse approximations (e.g. bounding boxes or spheres)
  Polygon-level checks between humans and objects
  Dynamic LOD checking according to distance to the observer, users’ observation focus, whether the human is in a crowd, etc.

CONFUCIUS’ animation generator uses
  bounding cylinders around the human body segments for protagonists
  a bounding cylinder around the whole human body for minor characters, characters in a crowd, and characters beyond the scope of attention
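A whole-body bounding cylinder test can be sketched in a few lines. The Python example below assumes upright cylinders whose axes are parallel to the vertical (y) axis, which suits a standing character but not arbitrarily oriented body segments. The class and function names are invented for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Cylinder:
    """An upright bounding cylinder (axis parallel to the y axis)."""
    x: float       # axis position on the ground plane
    z: float
    y_min: float   # bottom of the cylinder
    y_max: float   # top of the cylinder
    radius: float


def cylinders_collide(a, b):
    """Return True if two upright cylinders overlap."""
    # The circular cross-sections overlap when the axes are closer
    # than the sum of the radii.
    horizontal = math.hypot(a.x - b.x, a.z - b.z)
    overlap_xz = horizontal < a.radius + b.radius
    # The vertical extents must overlap as well.
    overlap_y = a.y_min < b.y_max and b.y_min < a.y_max
    return overlap_xz and overlap_y


if __name__ == "__main__":
    john = Cylinder(x=0.0, z=0.0, y_min=0.0, y_max=1.8, radius=0.3)
    nancy = Cylinder(x=0.5, z=0.0, y_min=0.0, y_max=1.8, radius=0.3)
    print(cylinders_collide(john, nancy))
```

The test is cheap enough to run for every pair of minor characters, which fits the slide's use of a single cylinder for characters outside the scope of attention.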



Multiple characters’ synchronization & coordination

Multiple characters’ activities
  A character can start a task when another signals that the situation (pre-conditions) is ready
  Characters can communicate with one another
  Two or more characters can cooperate in a shared task

Multiple characters’ synchronization
  Event-driven timing mechanism (VRML provides a utility for event routing, the ROUTE statement)
  Exact time-driven synchronization

Nancy was walking along the street. John called her. Nancy stopped and saw John. John walked towards her. They exchanged greetings.

The end of the animation john_speech (calling Nancy) triggers:
  (1) to stop the animation of nancy_walk
  (2) to start the animation of nancy_gazeWander (searching for who’s calling)
  (3) to start the animation of john_walk (walking towards Nancy)
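The event-driven timing in this example can be mimicked with a small scheduler in which the end of one animation starts or stops others. The Python sketch below only illustrates the idea. In CONFUCIUS the equivalent wiring is done with VRML event routing, and the class name EventScheduler and its methods are invented here.

```python
from collections import defaultdict


class EventScheduler:
    """Starts and stops animations when another animation ends."""

    def __init__(self):
        self.running = set()
        self._triggers = defaultdict(list)  # animation -> [(action, target)]

    def on_end(self, animation, action, target):
        """Register that the end of `animation` starts or stops `target`."""
        if action not in ("start", "stop"):
            raise ValueError(f"unknown action: {action}")
        self._triggers[animation].append((action, target))

    def start(self, animation):
        self.running.add(animation)

    def stop(self, animation):
        self.running.discard(animation)

    def end(self, animation):
        """Mark `animation` as finished and fire its triggers in order."""
        self.running.discard(animation)
        for action, target in self._triggers[animation]:
            if action == "start":
                self.start(target)
            else:
                self.stop(target)


# The three triggers from the story example.
scheduler = EventScheduler()
scheduler.on_end("john_speech", "stop", "nancy_walk")
scheduler.on_end("john_speech", "start", "nancy_gazeWander")
scheduler.on_end("john_speech", "start", "john_walk")

scheduler.start("nancy_walk")    # Nancy was walking along the street.
scheduler.start("john_speech")   # John called her.
scheduler.end("john_speech")     # The call ends and fires the triggers.

if __name__ == "__main__":
    print(sorted(scheduler.running))
```

After john_speech ends, only nancy_gazeWander and john_walk remain running, which matches the three triggers listed above.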



Relation to other work

A general purpose humanoid character animation system

Compared with other related virtual human modelling systems, CONFUCIUS’ character animation focuses on the language-to-humanoid animation process rather than considering human modelling & motion solely

Makes full use of existing 3D OBJ/HUMAN models, tools and programs, such as the H-Anim models Nancy (by C. Ballreich, © 1997 3Name3D / Yglesias, Wallock, Divekar, Inc.) and Baxter (by C. Babski, © LIG/EPFL), animation keyframe files, and a BVH to H-Anim keyframe conversion script (by M. Lewis, The Ohio State University)

Adopts current studies in linguistics, such as LCS, and adapts them to the demands of language visualization



Prospective applications

Children’s education
Multimedia presentation
Movie/drama production
Computer games
Virtual Reality

Conclusion & future work

CONFUCIUS’ humanoid character animation explores challenging problems in language visualization and automatic animation production:

  formalizes the meaning of action verbs and spatial prepositions
  maps language primitives to visual primitives
  provides a reusable common-sense knowledge base for other systems

Future work
  Deformation for facial expressions
  Under-specified language input
  Action composition for simultaneous activities
