
Key-styling: learning motion style for real-time synthesis of 3D animation



COMPUTER ANIMATION AND VIRTUAL WORLDS

Comp. Anim. Virtual Worlds 2006; 17: 229–237

Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cav.126


By Yi Wang*, Zhi-Qiang Liu and Li-Zhu Zhou

In this paper, we present a novel real-time motion synthesis approach that can generate 3D character animation with a required style. The effectiveness of our approach comes from learning captured 3D human motion as a self-organizing mixture network (SOMN) of parametric Gaussians. The learned model describes the motion under the control of a vector variable called the style variable, and acts as a probabilistic mapping from low-dimensional style values to high-dimensional 3D poses. We design a pose synthesis algorithm that allows the user to generate poses by specifying new style values. We also propose a novel motion synthesis method, key-styling, which accepts a sparse sequence of key style values and interpolates a dense sequence of style values to synthesize an animation. Key-styling is able to produce animations that are more realistic and natural-looking than those synthesized with the traditional keyframing technique. Copyright © 2006 John Wiley & Sons, Ltd.

Received: 10 April 2006; Revised: 2 May 2006; Accepted: 10 May 2006

KEY WORDS: motion synthesis; motion capture data; neural networks; self-organizing mixture network

Introduction

Generally, 3D motion synthesis approaches can be categorized into three types: (1) key-framing, which requires intensive manual labor by artists to create a sequence of keyframes for each animation; (2) inverse dynamics/kinematics and physical simulation, which require in-depth human knowledge about specific types of motions; (3) example-based synthesis, which is able to synthesize realistic animations by learning from motion capture data. Compared with the former two types, example-based synthesis requires neither in-depth human knowledge nor intensive manual labor. By designing automatic learning and convenient synthesis methods, we can maximize the effectiveness of example-based synthesis.

*Correspondence to: Yi Wang, Department of Computer Science (Graduate School at Shenzhen), Tsinghua University at Shenzhen, Guang-Dong Province, 618055, China. E-mail: [email protected]
†Yi Wang is a Ph.D. student; Zhi-Qiang Liu and Li-Zhu Zhou are professors.

Contract/grant sponsor: Hong Kong RGC; contract/grant number: 1062/02E, CityU 1247/03E.
Contract/grant sponsor: Natural Science Foundation of China; contract/grant number: 60573061.



In this paper, we present a novel and efficient example-based motion synthesis approach called key-styling. By extracting and representing the characteristics of the training motion as a vector variable called the style variable, we learn a novel neural network, the self-organizing mixture network (SOMN),1 from motion capture data as a probabilistic mapping from values of the style variable to 3D poses. Because a 3D pose is defined by the rotations of the major body joints, it has to be represented by a high-dimensional vector, whereas the characteristics of a certain kind of motion can usually be encoded by a few parameters, which constitute a much lower-dimensional style variable. The learned probabilistic mapping therefore allows the user to manipulate low-dimensional style values instead of complex high-dimensional 3D poses.

Because the learning algorithm is supervised, it ensures that each dimension of the style variable has an explicit physical meaning. For example, the style variable of boxing motion may have two dimensions, where one encodes the change of body height for evading attacks, and the other encodes the stretch of the arm for punching. Therefore, the user can give a few parameters to precisely define the style of a pose. This makes it easy and intuitive for the user to create a sparse sequence of key-style values. The physical meaning of each dimension of the style variable also makes it reasonable to interpolate the key-style sequence to generate a dense sequence of style values, which can be mapped to a realistic motion by the learned probabilistic mapping.

We also develop a synthesis program that allows the user to drag the mouse over a screen area representing the style space, a Euclidean space spanned by all possible style values. Because every mouse position corresponds to a style value, the trajectory of the mouse movement can be mapped to a character animation in real time.

The rest of this paper is organized as follows. In Section 2, we explain our motivations by comparing various aspects of our work with previous research. In Section 3, we explain the SOMN of parametric Gaussians model and derive its learning algorithm. Section 4 describes the synthesis of both static poses and dynamic motions. In Section 5, we show the usability and effectiveness of our prototype system by learning and synthesizing boxing motions.

Related Works

Bayesian Learning for Example-Based Motion Synthesis

Because 3D character motions are so complex, it is difficult to capture all the details with a deterministic model. In recent years, some leading research in the example-based motion synthesis area began to use the Bayesian learning framework to model the randomness of 3D motions. Some typical works are References [2–4], as listed in Table 1. In Reference [2], Li et al. used an unsupervised learning approach to learn possible recombinations of short motion segments as a segment hidden Markov model.5,6 In Reference [3], Grochow et al. used a non-linear principal component analysis method, the Gaussian process latent variable model,7 to project 3D poses into a low-dimensional space, where each point in the space can be projected back to a 3D pose. In contrast with Reference [2], the learning approach proposed in Reference [3] is also unsupervised, but the subject being modeled is static poses rather than dynamic motions. In Reference [4], Brand and Hertzmann proposed to learn human motion under the control of a style variable as an improved parametric hidden Markov model8 with an unsupervised learning algorithm.9

                            Learning (dynamic) motions      Learning (static) poses
    Supervised approach     Brand (1999)9                   This paper
    Unsupervised approach   Li (2002);2 Brand (2000)4       Grochow (2004)3

Table I. The placement of our contribution

In this paper, we present a supervised learning algorithm to model 3D poses under the control of the style variable. Compared with the works in the first column of Table 1, which model and synthesize motions, our method has the flexibility to synthesize both static poses and dynamic motions. Compared with the unsupervised pose synthesis methods listed in the second row of Table 1, our supervised method ensures that each dimension of the style variable has an explicit physical meaning, which allows users to give precise style values to express their requirements on the synthesis results.

Modeling Motion Style Separately from Motion Data

The idea of extracting motion styles and modeling them separately from the motion data was originally proposed in the research area of gesture recognition and was well developed in Reference [8]. In Reference [4], the idea of separating style from data is used to develop a productive motion synthesis approach that allows users to manipulate low-dimensional style values instead of the high-dimensional motion data. Because each frame of the motion data corresponds to a 3D pose, which is defined by the 3D rotations of dozens of major body joints, it has to be represented by a very high-dimensional vector, usually over 60 dimensions.2 The high dimensionality makes it difficult to model and manipulate the motion data. On the contrary, the style variable is usually a low-dimensional vector (1–3-dimensional in our experiments) that encodes a few important aspects of the motion. To support motion synthesis by manipulating the low-dimensional style space and to keep maximum flexibility of synthesis, we learn a probabilistic mapping from style values to human poses, instead of motion sequences as in Reference [4], as a conditional probability density function (p.d.f.) P(x | u), where u is the style variable and x represents a 3D pose.


1The Kullback–Leibler divergence is a generalized form of the likelihood. The EM algorithm learns a model by maximizing the likelihood.


A well-known model that represents a conditional distribution is the parametric Gaussian, whose mean vector is a function f(u) of the control variable u. However, in order to capture the complex distribution of 3D poses caused by the complex dynamics of human motion, we model P(x | u) as a mixture of parametric Gaussians. Although most mixture models are learned by the Expectation-Maximization (EM) algorithm, we derive a learning algorithm based on the SOMN, which is in fact a stochastic approximation algorithm. Compared with the EM algorithm, which is a deterministic ascent algorithm,10 the SOMN achieves a faster convergence rate and is less likely to be trapped in local optima.1

Learning SOMN of Parametric Gaussians

The SOMN of Parametric Gaussians Model

Mixture models are able to capture complex distributions over a set of observables X = {x_1, ..., x_N}. Denoting Λ as the set of parameters of the model, the likelihood of a mixture model is

    p(x | Λ) = ∑_{j=1}^{K} α_j p_j(x | λ_j)    (1)

where each p_j(x) is a component of the mixture model, α_j is the corresponding weight of the component, and λ_j denotes the parameters of the j-th component.

Given the observables X = {x_1, ..., x_N}, learning a mixture model is in fact an adaptive clustering process, where some of the observables are used to estimate one component, whereas other observables are used to estimate other components. A traditional approach to learning a mixture model is the EM algorithm, which, as a generalization of the K-means clustering algorithm, alternately executes an E-step and an M-step: in the E-step, each observable x_i is assigned to a component p_j to the extent l_ij; in the M-step, each p_j is estimated from those observables x_i with l_ij > 0. It has been proven in Reference [11] that this iterative process is actually a deterministic ascent optimization algorithm.
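The E-step/M-step alternation described above can be sketched for a toy 1-D, two-component Gaussian mixture. This is generic EM with illustrative data, not the SOMN algorithm of this paper; all values are assumptions made for the example:

```python
# One EM iteration for a 1-D Gaussian mixture: the E-step computes the
# responsibilities l_ij, the M-step re-estimates each component from its
# responsibility-weighted observables.
import numpy as np

def em_step(x, alpha, mu, sigma2):
    # E-step: responsibility l_ij of component j for observable x_i.
    dens = alpha * np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma2)) \
           / np.sqrt(2 * np.pi * sigma2)              # shape (N, K)
    l = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weights, means, and variances from the weighted observables.
    nk = l.sum(axis=0)
    alpha = nk / len(x)
    mu = (l * x[:, None]).sum(axis=0) / nk
    sigma2 = (l * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return alpha, mu, sigma2

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
alpha = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma2 = np.array([1.0, 1.0])
for _ in range(50):
    alpha, mu, sigma2 = em_step(x, alpha, mu, sigma2)
print(mu)  # the estimated means approach the true modes at -3 and 3
```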

The SOMN, proposed by Yin and Allinson in 2001 [1], is a neural network that has properties similar to the well-known clustering algorithm, the self-organizing map (SOM), but in the SOMN each node represents a component of a mixture model. The major difference between the learning algorithm of the SOMN and the EM algorithm is that the former uses the Robbins–Monro stochastic approximation method to estimate the mixture model, achieving a generally faster convergence rate and a lower probability of being trapped in local optima.

In this paper, we derive a specific SOMN which, unlike the traditional SOMN that models a joint distribution p(x), models a conditional probability distribution p(x | u) as the mixture

    p(x | u, Λ) = ∑_{j=1}^{K} α_j p_j(x | u, λ_j)    (2)

where each component p_j(·) is a linear parametric Gaussian distribution,

    p_j(x | u, λ_j) = N(x; W_j u + μ_j, Σ_j)    (3)

where W_j is called the style transformation matrix, which together with μ_j and Σ_j forms the parameter set λ_j of the j-th component.
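The conditional mixture of Equations (2) and (3), with each component mean depending linearly on the style value u, can be sketched as follows; the dimensions and parameter values are toy assumptions, not learned quantities:

```python
# Evaluate p(x | u, Lambda) = sum_j alpha_j N(x; W_j u + mu_j, Sigma_j).
import numpy as np

def gaussian_pdf(x, mean, cov):
    d = x - mean
    k = len(x)
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) \
           / np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))

def conditional_mixture_pdf(x, u, alphas, Ws, mus, Sigmas):
    # Each component is a Gaussian whose mean W_j u + mu_j is a linear
    # function of the style value u (Equation 3).
    return sum(a * gaussian_pdf(x, W @ u + m, S)
               for a, W, m, S in zip(alphas, Ws, mus, Sigmas))

# Two toy components mapping a 2-D style value to a 3-D "pose".
alphas = [0.6, 0.4]
Ws = [np.eye(3, 2), -np.eye(3, 2)]
mus = [np.zeros(3), np.ones(3)]
Sigmas = [np.eye(3), np.eye(3)]
u = np.array([0.5, -0.2])
p0 = conditional_mixture_pdf(np.zeros(3), u, alphas, Ws, mus, Sigmas)
print(p0)  # a positive density value
```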

The Learning Algorithm

In the SOMN, learning of parametric Gaussians minimizes the following Kullback–Leibler divergence1 between the true distribution p(x | u, Λ) and the estimated one p̂(x | u, Λ̂):

    D(p̂, p) = − ∫ log [ p̂(x | u, Λ̂) / p(x | u, Λ) ] p(x | u, Λ) dx    (4)

which is always non-negative and is zero if and only if the estimated distribution is the same as the true one. When the estimated distribution is modeled as a mixture, taking partial derivatives of Equation (4) with respect to λ_i and α_i leads to

    ∂D(p̂, p)/∂λ_i = − ∫ [ 1/p̂(x | u, Λ̂) ] [ ∂p̂(x | u, Λ̂)/∂λ_i ] p(x) dx

    ∂D(p̂, p)/∂α_i = − ∫ [ 1/p̂(x | u, Λ̂) ] [ ∂p̂(x | u, Λ̂)/∂α_i ] p(x) dx + ξ ∂/∂α_i [ ∑_{j=1}^{K} α_j − 1 ]
                  = − (1/α_i) ∫ [ α_i p_i(x | u, λ_i)/p̂(x | u, Λ̂) − ξ α_i ] p(x) dx    (5)

where ξ is a Lagrange multiplier that ensures ∑_i α_i = 1.


As in Reference [1], we choose Robbins–Monro stochastic approximation to solve Equation (5), because the true distribution is unknown and the equation depends only on the estimated version. We obtain the following set of iterative updating rules:

    λ_i(t+1) = λ_i(t) + δ(t) [ (1/p̂(x | u, Λ̂)) ∂p̂(x | u, Λ̂)/∂λ_i(t) ]
             = λ_i(t) + δ(t) [ (α_i/p̂(x | u, Λ̂)) ∂p_i(x | u, λ_i)/∂λ_i(t) ]    (6)

    α_i(t+1) = α_i(t) + δ(t) [ α_i(t) p_i(x | u, λ_i)/p̂(x | u, Λ̂) − α_i(t) ]
             = α_i(t) + δ(t) [ p̂(i | x, u) − α_i(t) ]    (7)

where δ(t) is the learning rate at time step t, and p̂(x | u, Λ̂) is the estimated likelihood, p̂(x | u, Λ̂) = ∑_i α_i p_i(x | u, λ_i). The detailed derivations of Equations (5), (6), and (7) are similar to those in Reference [1].

To derive the partial derivative of the component distribution in Equation (6), we denote Z_i = [W_i, μ_i] and V = [u, 1]^T, so that p_i(x | u, λ_i) = N(x; W_i u + μ_i, Σ_i) = N(x; Z_i V, Σ_i). Then, taking the derivative in Equation (6),

    Z_i(t+1) = Z_i(t) + δ(t) [ (α_i/p̂(x | u, Λ̂)) ∂N(x; Z_iV, Σ_i)/∂Z_i(t) ]
             = Z_i(t) + δ(t) [ (α_i/p̂(x | u, Λ̂)) N(x; Z_iV, Σ_i) ∂log N(x; Z_iV, Σ_i)/∂Z_i(t) ]
             = Z_i(t) + δ(t) p̂(i | x, u) ∂log N(x; Z_iV, Σ_i)/∂Z_i(t)
             = Z_i(t) − (1/2) δ(t) p̂(i | x, u) ∂/∂Z_i [ (x − Z_iV)^T Σ_i^{-1} (x − Z_iV) ]
             = Z_i(t) − (1/2) δ(t) p̂(i | x, u) ∂/∂Z_i [ x^T Σ_i^{-1} x − 2 (Z_iV)^T Σ_i^{-1} x + (Z_iV)^T Σ_i^{-1} Z_iV ]
             = Z_i(t) + δ(t) p̂(i | x, u) Σ_i^{-1} ( x V^T − Z_iV V^T )

and we arrive at the updating rule for Z_i:

    ΔZ_i = δ(t) p̂(i | x, u) Σ_i^{-1} ( x V^T − Z_i V V^T )    (8)


By considering p̂(i | x, u) as the Gaussian neighborhood function,12 we can view Equation (8) exactly as the SOM updating algorithm. Although an updating rule for ΔΣ_i may be derived similarly, it is unnecessary in the learning algorithm, because the covariance of each component distribution implicitly corresponds to the neighborhood function p̂(i | x, u), or the spread range for updating a winner at each iteration. As the neighborhood function has the same form for every node, the learned mixture distribution is homoscedastic.
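The updating rules of Equations (6)-(8) can be sketched as a single-sample training step. The toy dimensions, parameter values, and shared identity covariance are assumptions for illustration; the neighborhood scheduling of the full SOMN of Reference [1] is omitted:

```python
# One stochastic update: given a training pair (x, u), compute the
# posterior p(i | x, u), then nudge each component's weight alpha_i
# (Equation 7) and its parameter matrix Z_i = [W_i, mu_i] (Equation 8).
import numpy as np

def somn_update(x, u, alphas, Zs, Sigma_inv, delta):
    V = np.append(u, 1.0)                      # V = [u, 1]^T
    # Component likelihoods p_i(x | u, lambda_i), mean Z_i V, shared cov.
    liks = np.array([np.exp(-0.5 * (x - Z @ V) @ Sigma_inv @ (x - Z @ V))
                     for Z in Zs])
    post = alphas * liks
    post /= post.sum()                         # posterior p(i | x, u)
    for i, Z in enumerate(Zs):
        # Equation (8): Delta Z_i = delta p(i|x,u) Sigma^-1 (x - Z_i V) V^T
        Zs[i] = Z + delta * post[i] * np.outer(Sigma_inv @ (x - Z @ V), V)
        # Equation (7): alpha_i moves toward the posterior.
        alphas[i] += delta * (post[i] - alphas[i])
    return alphas, Zs

# Toy run: repeatedly presenting one sample pulls the winning
# component's prediction Z_i V toward the sample x.
x = np.array([1.0, 2.0, 3.0])
u = np.array([0.5])
alphas = np.array([0.5, 0.5])
Zs = [np.zeros((3, 2)), np.ones((3, 2))]
Sigma_inv = np.eye(3)
for _ in range(200):
    alphas, Zs = somn_update(x, u, alphas, Zs, Sigma_inv, delta=0.05)
```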

SOMN of Parametric Gaussians for Motion Synthesis

Determine the Physical Meaning of the Style Variable

A learned SOMN of parametric Gaussians p(x | u, Λ) may be considered a probabilistic mapping from a given style value u to a 3D pose x. If the user knows the physical meaning of each dimension of the style variable u, he can give a precise style value to specify the characteristics of the synthesized pose. The supervised learning framework presented in Section 3 allows the user to determine the physical meaning of the style variable prior to learning.


As an example, suppose that we captured boxing motion as training data, in which the boxer sometimes crouches to evade attacks and at other times punches to attack. We can use a 2-dimensional style variable to describe the details of the boxing motion, where one dimension encodes the body height, which varies from crouching to standing up, and the other encodes the arm length when punching. Once the physical meaning of each dimension of the style variable is determined, the style values u_1, ..., u_N of the training frames x_1, ..., x_N can be calculated from the training motion itself.
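A labeling pass of this kind might look as follows. This is a hypothetical sketch, not the paper's actual script: the joint names, the y-up convention, and the choice of head height and shoulder-to-wrist distance as the two style dimensions are all assumptions for illustration:

```python
# Compute a 2-D style value (body height, punch distance) from the
# world-space joint positions of one training frame.
import numpy as np

def label_frame(joints):
    """joints: dict mapping a joint name to its 3-D world position."""
    body_height = joints["head"][1]            # assumed y-up convention
    punch_distance = np.linalg.norm(joints["r_wrist"] - joints["r_shoulder"])
    return np.array([body_height, punch_distance])

frame = {"head": np.array([0.0, 1.7, 0.0]),
         "r_shoulder": np.array([0.2, 1.4, 0.0]),
         "r_wrist": np.array([0.8, 1.4, 0.0])}
u = label_frame(frame)
print(u)  # [1.7, 0.6]
```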

It is notable that if we carefully choose the dimensions of the style variable to encode visually independent characteristics of the training motion, the style space, which is spanned by all possible style values, will be a Euclidean space within which a curve corresponds to a smooth change of the style value. This is interesting for synthesizing character animations, instead of static poses, because smooth changes of motion style, such as body height and punch distance, usually lead to smooth body movement. Experiments are shown in Section 5.

Figure 1. A prototype system for training the SOMN model from motion capture data. The graphical user interface makes it convenient for the user to browse the training 3D motion data, to discover stylistic characteristics, to specify parameters of the SOMN model, and to designate output files.

Generate 3D Pose from Given Style Value

Given a learned SOMN of parametric Gaussians p(x | u, Λ) with K components, mapping a given style value u to a 3D pose x can be achieved by substituting u into the model and drawing a sample x from the distribution p(x | u, Λ). Although the Monte Carlo sampling method is generally applicable to most complex distributions, to avoid the intensive computation and achieve real-time performance, we designed the two-step procedure shown in Algorithm 1 to calculate the pose x with the highest probability. The first step of the algorithm calculates the poses x_1, ..., x_K that are most probable for each component p_j of the learned model; the algorithm then selects and returns the most probable one x among all the x_j.

Figure 2. Two screenshots of the prototype system for real-time motion synthesis. The graphical user interface adapts to the dimensionality of the style variable. For training motions with style dimensionality ≥ 3, the synthesis system creates a style panel with a set of slider widgets representing every dimension of the style variable (shown on the left); the user can drag the sliders to adjust the style values. For style dimensionality < 3, users may choose to represent the style space with a screen area (shown on the right), with the grey cross-sign denoting the currently selected style values. If the user drags the cross-sign with the mouse cursor, the system generates a motion sequence whose pose styles change along the dragging curve.

Algorithm 1. Synthesize pose from given style value
    input: the given new style value u
    output: the synthesized pose x
    // calculate the most probable pose from each component
    foreach j in [1, K] do
        x_j <- W_j u + mu_j
    end
    // select the most probable one among the results
    j* <- argmax_j alpha_j p_j(x_j | u, lambda_j)
    x <- x_{j*}
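Algorithm 1 can be sketched as follows, with toy parameter values standing in for a learned model:

```python
# Two-step pose synthesis: each component proposes its most probable pose
# x_j = W_j u + mu_j, and the candidate with the highest weighted density
# alpha_j N(x_j; W_j u + mu_j, Sigma_j) is returned.
import numpy as np

def synthesize_pose(u, alphas, Ws, mus, Sigmas):
    # Step 1: the most probable pose of each Gaussian component is its mean.
    candidates = [W @ u + m for W, m in zip(Ws, mus)]
    # Step 2: evaluated at the component's own mean, the Gaussian density
    # reduces to its normalization constant 1/sqrt((2 pi)^d det Sigma_j).
    d = len(candidates[0])
    scores = [a / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
              for a, S in zip(alphas, Sigmas)]
    return candidates[int(np.argmax(scores))]

alphas = [0.3, 0.7]
Ws = [np.eye(3, 2), 2 * np.eye(3, 2)]
mus = [np.zeros(3), np.ones(3)]
Sigmas = [np.eye(3), np.eye(3)]
x = synthesize_pose(np.array([0.5, -0.2]), alphas, Ws, mus, Sigmas)
print(x)  # the heavier component wins: 2*[0.5, -0.2, 0] + 1 = [2.0, 0.6, 1.0]
```

With the homoscedastic covariances produced by the learning algorithm, step 2 reduces to picking the component with the largest weight alpha_j.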

Synthesis with the Key-Styling Approach

We developed an interactive graphical user interface (GUI) system, shown in Figure 2, to ease pose and motion synthesis. With the parameter adjustment panel (to the left of the main window), the user is able to specify a style value by adjusting every dimension of the style variable. The changed style value is instantly input to Algorithm 1, and the synthesized pose x is displayed in real time.

With this GUI program, users can also create animations by (1) selecting a sparse sequence of key styles to define the basic movement of a motion segment, (2) producing a dense sequence of style values interpolating the key styles, and (3) mapping each style value to a frame to synthesize the motion sequence. As the traditional method of producing character animations is called keyframing, which interpolates a sparse sequence of keyframes, we fittingly name our method key-styling.
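The three steps above can be sketched as follows; linear interpolation and the stubbed style-to-pose mapping are simplifying assumptions (in the real system, the mapping is Algorithm 1 on a learned model):

```python
# Key-styling sketch: interpolate a sparse key-style sequence into a dense
# style sequence, then map every style value to one frame.
import numpy as np

def key_styling(key_styles, frames_per_segment, synthesize_pose):
    key_styles = np.asarray(key_styles, dtype=float)
    dense = []
    for a, b in zip(key_styles[:-1], key_styles[1:]):
        # Linearly interpolate between consecutive key styles.
        for t in np.linspace(0.0, 1.0, frames_per_segment, endpoint=False):
            dense.append((1 - t) * a + t * b)
    dense.append(key_styles[-1])
    # Map each style value to a frame via the pose-synthesis step.
    return [synthesize_pose(s) for s in dense]

# Stub mapping for demonstration: "pose" is just a linear image of style.
poses = key_styling([[0, 0], [1, 0], [1, 1]], 4, lambda s: 2.0 * s)
print(len(poses))  # 2 segments x 4 frames + the final key = 9 frames
```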

A known problem of keyframing is that the synthesized animation appears rigid and robotic. This is because each keyframe is represented by a high-dimensional vector consisting of 3D joint rotations, and evenly interpolating the rotations cannot ensure evenly interpolated dynamics. On the contrary, interpolating the key styles results in smooth changes of the major dynamics, while the style-to-pose mapping adds kinematic details to the motion, so the change of kinematic details does not need to be even.

Experiments

To demonstrate the usability of our synthesis approach, we captured a segment of boxing motion of about 3 minutes at a frame rate of 66 frames per second as the training data. Some typical poses in the motion are shown in Figure 3(a), (b), and (c). Because the boxer sometimes crouches to evade and at other times punches out, we use a 2-dimensional style variable to encode the body height and the distance of punching.

Figure 3. (a)-(c): Some typical poses in our training boxing motion, where (a) a small body height value and small punch distance, (b) a large body height value and small punch distance, (c) a large body height value and large punch distance. (d) A short segment of synthesized motion, punching while crouching, which is generated by simply dragging the slide bar to change the punch distance. The starting pose is similar to the one shown in (a), while the ending pose has never appeared in the training motion.

Once the dimensionality of the style variable is determined, labeling the training frames with style values is not a difficult problem. For the application of automatic motion synthesis, we must have the skeleton (the connections of joints) for rendering the synthesized motion, and we must have the rotations of joints as training data. With these two kinds of information, it is easy to compute the style value u_i for each training frame x_i. We wrote a motion browser program, shown in Figure 1, to help the user browse the captured motions, discover motion styles, and write embedded Perl scripts to derive style values from motion data.

After estimating an SOMN of parametric Gaussians from the labeled training frames, we can give new style values by dragging the slide bars of our prototype motion synthesis system (shown in Figure 2). The short animation shown in Figure 3(d) is generated by dragging the slide bar that represents the punch distance.

The accompanying demo video shows real-time synthesis of boxing animation by dragging the mouse over a screen area representing the style space of boxing motion.

Conclusion and Discussion

In this paper, we presented a novel approach to synthesizing 3D character animations automatically and conveniently. The first step of our approach is to learn a probabilistic mapping from a low-dimensional style variable to high-dimensional 3D poses. By modeling the probabilistic mapping with an SOMN of parametric Gaussians, we designed a learning algorithm that is less prone to being trapped in local optima and converges faster than previous EM-based algorithms for learning mixture models. The supervised learning framework gives the user the flexibility to specify the physical meaning of each dimension of the style variable. As a result, given a learned model and using our prototype motion synthesis system, the user is able to create 3D poses by simply dragging slide-bar widgets and/or to produce character animations with our key-styling technique.
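To illustrate what such a probabilistic style-to-pose mapping looks like, the following sketch computes a conditional mean pose from a mixture whose components each pair a Gaussian over the style space with a linear map into the pose space. This is a simplified stand-in: the exact parameterization of the SOMN of parametric Gaussians in the paper may differ, and all parameter shapes here are assumptions:

```python
import numpy as np

def synthesize_pose(u, components):
    """Sketch of pose synthesis from a style value u under a mixture of
    parametric Gaussians.  Each component is (prior, mean_u, var_u, A, b):
    an isotropic Gaussian over style space plus a linear map A u + b from
    style space to pose space.  Returns the responsibility-weighted blend
    of the component poses (the conditional mean pose)."""
    u = np.asarray(u, dtype=float)
    resp, poses = [], []
    for prior, mean_u, var_u, A, b in components:
        d = u - np.asarray(mean_u, dtype=float)
        # unnormalized responsibility of this component for style value u
        resp.append(prior * np.exp(-0.5 * (d @ d) / var_u))
        poses.append(np.asarray(A, dtype=float) @ u + np.asarray(b, dtype=float))
    resp = np.array(resp) / np.sum(resp)
    return sum(r * p for r, p in zip(resp, poses))

# toy model: 2-D style space, 3-D "pose", two components sharing one linear map
components = [
    (0.5, [0.0, 0.0], 1.0, [[1, 0], [0, 1], [1, 1]], [0, 0, 0]),
    (0.5, [1.0, 1.0], 1.0, [[1, 0], [0, 1], [1, 1]], [0, 0, 0]),
]
print(synthesize_pose([0.5, 0.5], components))  # a pose close to [0.5, 0.5, 1.0]
```

Because the mapping is a smooth function of u, dragging a slide bar (i.e., moving u continuously) produces a continuously varying pose, which is what makes the real-time interaction in the prototype system possible.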

ACKNOWLEDGEMENTS

We sincerely appreciate Dr. Hu-Jun Yin of the University of Manchester for his fruitful suggestions and detailed explanation of the SOMN model. We sincerely appreciate Professor Wen-Ping Wang of the University of Hong Kong for the constructive discussion about the motion synthesis approach. We gratefully acknowledge Microsoft Research Asia for providing the motion capture data. This work is supported in part by Hong Kong RGC Project No. 1062/02E, CityU 1247/03E, and Natural Science Foundation of China No. 60573061.

References

1. Yin H-J, Allinson NM. Self-organizing mixture networks for probability density estimation. IEEE Transactions on Neural Networks 2001; 12(2): 405–411.

2. Li Y, Wang T-S, Shum H-Y. Motion texture: a two-level statistical model for character motion synthesis. In Proceedings of ACM SIGGRAPH, 2002; pp. 465–472.

3. Grochow K, Martin SL, Hertzmann A, Popovic Z. Style-based inverse kinematics. In Proceedings of ACM SIGGRAPH, 2004; pp. 522–531.

4. Brand M, Hertzmann A. Style machines. In Proceedings of ACM SIGGRAPH, 2000; pp. 183–192.

5. Ostendorf M, Digalakis VV, Kimball OA. From HMMs to segment models: a unified view of stochastic modeling for speech recognition. IEEE Transactions on Speech and Audio Processing 1996; 4(5): 360–378.

6. Gales MJF, Young SJ. The theory of segmental hidden Markov models. Technical report, Cambridge University Engineering Department, 1993.

7. Lawrence ND. Gaussian process latent variable models for visualisation of high dimensional data. In Proceedings of NIPS 16, 2004.

8. Wilson AD, Bobick A. Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 1999; 21(9): 884–900.

9. Brand M. Pattern discovery via entropy minimization. In Artificial Intelligence and Statistics, vol. 7, Heckerman D, Whittaker C (eds). Morgan Kaufmann: Los Altos, 1999.

10. Yin H-J, Allinson NM. Comparison of a Bayesian SOM with the EM algorithm for Gaussian mixtures. In Proceedings of the Workshop on Self-Organizing Maps, 1997; pp. 304–305.

11. Ormoneit D, Tresp V. Averaging, maximum penalised likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates. IEEE Transactions on Neural Networks 1998; 9: 639–650.

12. Kohonen T. Self-Organizing Maps. Springer: Berlin, 2001.

Authors’ biographies:

Yi Wang received the B.Sc. degree with first-class honors in computer science and engineering from the Changsha Institute of Technology, China, and is currently a Ph.D. student at Tsinghua University, Beijing, China. He has been working at Microsoft Research Asia on photorealistic rendering as a visiting student, and working at the City University of Hong Kong on the theory of Bayesian learning. His major interests are designing programming languages and writing parsers/compilers/interpreters for fun.


Zhi-Qiang Liu (S'82–M'86–SM'91) received the M.A.Sc. degree in Aerospace Engineering from the Institute for Aerospace Studies, The University of Toronto, and the Ph.D. degree in Electrical Engineering from The University of Alberta, Canada. He is currently with the School of Creative Media, City University of Hong Kong. He has taught computer architecture, computer networks, artificial intelligence, programming languages, machine learning, pattern recognition, computer graphics, and art & technology. His interests are scuba diving, neural-fuzzy systems, painting, gardening, machine learning, mountain/beach trekking, smart media systems, computer vision, serving the community, and fishing.


Li-Zhu Zhou is a full Professor in the Department of Computer Science and Technology at Tsinghua University, Beijing, China. He received his Master of Science degree in Computer Science from the University of Toronto in 1983. His major research interests include database systems, digital resource management, web data processing, and information systems.
