COMPUTER ANIMATION AND VIRTUAL WORLDS
Comp. Anim. Virtual Worlds 2006; 17: 229–237
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cav.126

Key-styling: learning motion style for real-time synthesis of 3D animation

By Yi Wang*, Zhi-Qiang Liu and Li-Zhu Zhou
In this paper, we present a novel real-time motion synthesis approach that can generate 3D
character animation with a required style. The effectiveness of our approach comes from learning
captured 3D human motion as a self-organizing mixture network (SOMN) of parametric
Gaussians. The learned model describes the motion under the control of a vector variable called
the style variable, and acts as a probabilistic mapping from low-dimensional style values to
high-dimensional 3D poses. We design a pose synthesis algorithm that allows the user to
generate poses by specifying new style values. We also propose a novel motion synthesis
method, key-styling, which accepts a sparse sequence of key style values and interpolates
a dense sequence of style values to synthesize an animation. Key-styling is able to produce
animations that are more realistic and natural-looking than those synthesized with the
traditional keyframing technique. Copyright © 2006 John Wiley & Sons, Ltd.

Received: 10 April 2006; Revised: 2 May 2006; Accepted: 10 May 2006

KEY WORDS: motion synthesis; motion capture data; neural networks; self-organizing mixture network
Introduction
Generally, 3D motion synthesis approaches can be
categorized into three types: (1) key-framing, which
requires intensive manual labor of artists to create a
sequence of keyframes for each animation; (2) inverse
dynamics/kinematics and physical simulation, which
requires in-depth human knowledge about specific
types of motions; (3) example-based synthesis, which is
able to synthesize realistic animations by learning from
motion capture data. Compared with the former two
types, the example-based synthesis requires neither in-
depth human knowledge nor intensive manual labor. By
designing automatic learning and convenient synthesis
methods, we would be able to maximize the effective-
ness of example-based synthesis.
*Correspondence to: Yi Wang, Department of Computer Science (Graduate School at Shenzhen), Tsinghua University at Shenzhen, Guang-Dong Province, 618055, China. E-mail: [email protected]
Yi Wang is a Ph.D. student; Zhi-Qiang Liu and Li-Zhu Zhou are professors.
Contract/grant sponsor: Hong Kong RGC; contract/grant numbers: 1062/02E, CityU 1247/03E. Contract/grant sponsor: Natural Science Foundation of China; contract/grant number: 60573061.
Copyright © 2006 John Wiley & Sons, Ltd.
In this paper, we present a novel and efficient
example-based motion synthesis approach called key-styling.
By extracting and representing characteristics of
the training motion as a vector variable called the style
variable, we learn a novel neural network, the self-organizing
mixture network (SOMN),1 from motion
capture data as a probabilistic mapping from values
of the style variable to 3D poses. Because a 3D pose is
defined by the rotations of the major body joints, it has to be
represented by a high-dimensional vector, whereas the
characteristics of a certain kind of motion can usually be
encoded by a few parameters, which constitute a much
lower-dimensional style variable. The learned probabilistic
mapping therefore allows the user to manipulate low-dimensional
style values instead of the complex high-dimensional 3D poses.
Because the learning algorithm is supervised, it ensures
that each dimension of the style variable has an explicit
physical meaning. For example, the style variable of boxing
motion may have two dimensions, where one encodes
the change of body height for evading attacks,
and the other encodes the stretch of the arm for
punching. Therefore, the user can give a few parameters
to precisely define the style of a pose. This makes it easy
and intuitive for the user to create a sparse sequence of key-style
values. The physical meaning of each dimension of
the style variable also makes it reasonable to interpolate
the key-style sequence to generate a dense sequence of
style values, which can be mapped to a realistic motion
by the learned probabilistic mapping.
We also develop a synthesis program that allows
the user to drag the mouse over a screen area
representing the style space, a Euclidean space
spanned by all possible style values. Because every
mouse position corresponds to a style value, the
trajectory of the mouse movement can be mapped to a
character animation in real time.
The rest of this paper is organized as follows. In
Section 2, we explain our motivations by comparing
various aspects of our work with previous research. In
Section 3, we explain the SOMN of parametric Gaussians
model and derive its learning algorithm. Section 4
describes the synthesis of both static poses and dynamic
motions. In Section 5, we show the usability and
effectiveness of our prototype system by learning and
synthesizing boxing motions.
Related Works

Bayesian Learning for Example-Based Motion Synthesis
Because 3D character motions are so complex, it is
difficult to capture all their details with a deterministic
model. In recent years, some leading research in the
example-based motion synthesis area began to use the
Bayesian learning framework to model the randomness
of 3D motions. Some typical works are
References [2–4], as listed in Table 1. In Reference [2],
Li et al. used an unsupervised learning approach to
learn possible recombinations of short motion
segments as a segment hidden Markov model.5,6 In Reference [3],
Grochow et al. used a non-linear principal
component analysis method, the Gaussian process
latent variable model,7 to project 3D poses into a low-dimensional
space, where each point in the space could
be projected back to a 3D pose. The learning approach
proposed in Reference [3] is likewise unsupervised but,
in contrast with Reference [2], the subject to be
Table 1. The placement of our contribution.

                         Learning (dynamic) motions       Learning (static) poses
Supervised approach      Brand (1999)9                    This paper
Unsupervised approach    Li (2002);2 Brand (2000)4        Grochow (2004)3
modeled is static poses rather than dynamic motions. In
Reference [4], Brand and Hertzmann proposed to learn
human motion under the control of a style variable as an
improved parametric hidden Markov model8 with an
unsupervised learning algorithm.9
In this paper, we present a supervised learning
algorithm to model the 3D poses under the control of
the style variable. Compared with the works in the first
column of Table 1, which model and synthesize motions, our
method has the flexibility to synthesize both static poses
and dynamic motions. Compared with the unsupervised
pose synthesis methods listed in the second row of
Table 1, our supervised method ensures that each dimension
of the style variable has an explicit physical meaning,
which allows the users to give precise style values to
express their requirements on the synthesis results.
Modeling Motion Style Separately from Motion Data
The idea of extracting motion styles and modeling them
separately from the motion data was originally proposed
in the research area of gesture recognition and
was well developed in Reference [8]. In Reference [4], the
idea of separating style from data is used to develop a
productive motion synthesis approach that allows the
users to manipulate low-dimensional style values instead
of the high-dimensional motion data. Because each
frame of the motion data corresponds to a 3D pose,
which is defined by the 3D rotations of dozens of major
body joints, it has to be represented by a very high-dimensional
vector, usually of over 60 dimensions.2 The
high dimensionality makes it difficult to model and
manipulate the motion data. On the contrary, the style
variable is usually a low-dimensional vector (1- to 3-dimensional
in our experiments) that encodes a few
important aspects of the motion. To support motion
synthesis by manipulating in the low-dimensional style
space and to keep maximum flexibility of synthesis, we
learn a probabilistic mapping from style values to
human poses (instead of motion sequences as in
Reference [4]) as a conditional probability density function
(p.d.f.) p(x | u), where u is the style variable and x
represents a 3D pose.
1 The Kullback–Leibler divergence is a generalized form of the likelihood. The EM algorithm learns a model by maximizing the likelihood.
A well-known model that represents a conditional
distribution is the parametric Gaussian, whose mean
vector is a function f(u). However, in order to capture the
complex distribution of 3D poses caused by the complex
dynamics of human motion, we model p(x | u) as a
mixture of parametric Gaussians. Although most mixture
models are learned by the Expectation-Maximization
(EM) algorithm, we derived a learning algorithm
based on the SOMN, which is in fact a stochastic
approximation algorithm. Compared with the EM
algorithm, which is a deterministic ascent algorithm,10
the SOMN achieves a faster convergence rate and is less
likely to be trapped in local optima.1
Learning SOMN of Parametric Gaussians

The SOMN of Parametric Gaussians Model
Mixture models are able to capture complex distributions
over a set of observables X = {x₁, …, x_N}. Denoting Λ as
the set of parameters of the model, the likelihood of a
mixture model is

$$p(\mathbf{x} \mid \Lambda) = \sum_{j=1}^{K} \alpha_j\, p_j(\mathbf{x} \mid \lambda_j) \quad (1)$$

where each p_j(x) is a component of the mixture model, α_j is the corresponding weight of the component, and λ_j denotes the parameters of the j-th component.
Given the observables X = {x₁, …, x_N}, learning a
mixture model is in fact an adaptive clustering process,
where some of the observables are used to estimate one
component, whereas other observables are used to
estimate other components. A traditional approach to
learning a mixture model is the EM algorithm, which, as
a generalization of the K-means clustering algorithm,
alternately executes an E-step and an M-step: in
the E-step, each observable x_i is assigned to a component
p_j to the extent l_ij; in the M-step, each p_j is estimated
from those observables x_i with l_ij > 0. It has been proven
in Reference [11] that this iterative process is actually a
deterministic ascent optimization algorithm.
The SOMN, proposed by Yin and Allinson in 2001,1 is a
neural network that has properties similar to those of the
well-known clustering algorithm, the self-organizing map
(SOM), but in the SOMN each node represents a
component of a mixture model. The major difference
between the learning algorithm of the SOMN and the EM
algorithm is that the former uses the Robbins–Monro
stochastic approximation method to estimate the mixture
model, achieving a generally faster convergence rate
and a lower probability of being trapped in local optima.
In this paper, we derive a specific SOMN which,
unlike the traditional SOMN that models a joint distribution
p(x), models a conditional probability distribution
p(x | u) as the mixture model

$$p(\mathbf{x} \mid \mathbf{u}, \Lambda) = \sum_{j=1}^{K} \alpha_j\, p_j(\mathbf{x} \mid \mathbf{u}, \lambda_j) \quad (2)$$

where each component p_j(·) is a linear parametric
Gaussian distribution,

$$p_j(\mathbf{x} \mid \mathbf{u}, \lambda_j) = \mathcal{N}(\mathbf{x};\, W_j \mathbf{u} + \mu_j,\, \Sigma_j) \quad (3)$$

where W_j is called the style transformation matrix and,
together with μ_j and Σ_j, forms the parameter set λ_j of the
j-th component.
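To make Equations (2) and (3) concrete, the conditional mixture density can be evaluated directly. The following NumPy sketch uses assumed parameter containers (lists of weights, style matrices, means, and covariances); it is an illustration, not the authors' implementation.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov) of Equation (3)."""
    d = x.shape[0]
    diff = x - mean
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm)

def mixture_density(x, u, alphas, Ws, mus, Sigmas):
    """p(x | u, Lambda) = sum_j alpha_j N(x; W_j u + mu_j, Sigma_j), Equation (2)."""
    return sum(a * gaussian_pdf(x, W @ u + mu, S)
               for a, W, mu, S in zip(alphas, Ws, mus, Sigmas))
```

Note that the style value u enters only through the component means W_j u + μ_j, which is what makes the model a mapping from style to pose.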
The Learning Algorithm
In the SOMN, learning of parametric Gaussians minimizes
the following Kullback–Leibler divergence1 between the
true distribution p(x | u, Λ) and the estimated one
p̂(x | u, Λ̂):

$$D(p, \hat{p}) = -\int \log\frac{\hat{p}(\mathbf{x} \mid \mathbf{u}, \hat{\Lambda})}{p(\mathbf{x} \mid \mathbf{u}, \Lambda)}\; p(\mathbf{x} \mid \mathbf{u}, \Lambda)\, d\mathbf{x} \quad (4)$$

which is always non-negative and is zero if
and only if the estimated distribution is the same as the
true one. When the estimated distribution is modeled as
a mixture model, taking the partial derivatives of Equation (4)
with respect to li and ai leads to
@
@liD p; pð Þ¼ �
Z1
p x u; L���� � @p x u; L
���� �@li
264
375p xð Þdx
@
@aiD p; pð Þ¼ �
Z1
p x u; L���� � @p x u; L
���� �@ai
264
375p xð Þdx
þ j@
@ai
XKj¼ 1
ai � 1
24
35
¼ � 1
ai
Zaipi x u; li
��� �pi x u; L
���� � � jai
264
375p xð Þdx
(5)
where j is a Lagrange multiplier to ensureP
i ai ¼ 1.
As in Reference [1], we choose Robbins–Monro
stochastic approximation to solve Equation (5), because
the true distribution is unknown and the equation
depends only on the estimated version. We obtain the
following set of iterative updating rules:

$$\lambda_i(t+1) = \lambda_i(t) + \delta(t)\Bigl[\frac{1}{\hat p(\mathbf{x} \mid \mathbf{u}, \hat\Lambda)} \frac{\partial \hat p(\mathbf{x} \mid \mathbf{u}, \hat\Lambda)}{\partial \lambda_i(t)}\Bigr] = \lambda_i(t) + \delta(t)\Bigl[\frac{\alpha_i}{\hat p(\mathbf{x} \mid \mathbf{u}, \hat\Lambda)} \frac{\partial \hat p(\mathbf{x} \mid \mathbf{u}, \lambda_i)}{\partial \lambda_i(t)}\Bigr] \quad (6)$$

$$\alpha_i(t+1) = \alpha_i(t) + \delta(t)\Bigl[\frac{\alpha_i(t)\, \hat p(\mathbf{x} \mid \mathbf{u}, \lambda_i)}{\hat p(\mathbf{x} \mid \mathbf{u}, \hat\Lambda)} - \alpha_i(t)\Bigr] = \alpha_i(t) + \delta(t)\bigl[\hat p(i \mid \mathbf{x}, \mathbf{u}) - \alpha_i(t)\bigr] \quad (7)$$

where δ(t) is the learning rate at time step t and p̂(x | u, Λ̂) is the estimated likelihood, p̂(x | u, Λ̂) ≈ Σᵢ αᵢ p̂(x | u, λᵢ).
The detailed derivations of Equations (5), (6), and (7) are
similar to the derivations in Reference [1].
To derive the partial derivative of the component
distribution in Equation (6), we denote Z_i = [W_i, μ_i] and
v = [u, 1]ᵀ, so that p(x | u, λ_i) = 𝒩(x; W_i u + μ_i, Σ_i) = 𝒩(x; Z_i v, Σ_i). Substituting into Equation (6) then gives

$$
\begin{aligned}
Z_i(t+1) &= Z_i(t) + \delta(t)\Bigl[\frac{\alpha_i}{\hat p(\mathbf{x} \mid \hat\Lambda)} \frac{\partial \mathcal{N}(\mathbf{x}; Z_i\mathbf{v}, \Sigma_i)}{\partial Z_i(t)}\Bigr] \\
&= Z_i(t) + \delta(t)\Bigl[\frac{\alpha_i}{\hat p(\mathbf{x} \mid \hat\Lambda)}\, \mathcal{N}(\mathbf{x}; Z_i\mathbf{v}, \Sigma_i)\, \frac{\partial \log \mathcal{N}(\mathbf{x}; Z_i\mathbf{v}, \Sigma_i)}{\partial Z_i(t)}\Bigr] \\
&= Z_i(t) + \delta(t)\, \hat p(i \mid \mathbf{x})\, \frac{\partial \log \mathcal{N}(\mathbf{x}; Z_i\mathbf{v}, \Sigma_i)}{\partial Z_i(t)} \\
&= Z_i(t) - \frac{1}{2}\delta(t)\, \hat p(i \mid \mathbf{x})\, \frac{\partial}{\partial Z_i}(\mathbf{x} - Z_i\mathbf{v})^T \Sigma^{-1} (\mathbf{x} - Z_i\mathbf{v}) \\
&= Z_i(t) - \frac{1}{2}\delta(t)\, \hat p(i \mid \mathbf{x})\, \frac{\partial}{\partial Z_i}\bigl(\mathbf{x}^T\Sigma^{-1}\mathbf{x} - 2\,\mathbf{v}^T Z_i^T \Sigma^{-1}\mathbf{x} + \mathbf{v}^T Z_i^T \Sigma^{-1} Z_i \mathbf{v}\bigr) \\
&= Z_i(t) - \frac{1}{2}\delta(t)\, \hat p(i \mid \mathbf{x})\bigl(-2\,\Sigma^{-1}\mathbf{x}\mathbf{v}^T + 2\,\Sigma^{-1}Z_i\mathbf{v}\mathbf{v}^T\bigr) \\
&= Z_i(t) + \delta(t)\, \hat p(i \mid \mathbf{x})\, \Sigma^{-1}\bigl(\mathbf{x}\mathbf{v}^T - Z_i\mathbf{v}\mathbf{v}^T\bigr)
\end{aligned}
$$

and we arrive at the updating rule for Z_i:

$$\Delta Z_i = \delta(t)\, \hat p(i \mid \mathbf{x})\, \Sigma^{-1}\bigl(\mathbf{x}\mathbf{v}^T - Z_i\mathbf{v}\mathbf{v}^T\bigr) \quad (8)$$
By considering p̂(i | x, u) as the Gaussian neighborhood
function,12 we can view Equation (8) exactly as the
SOM updating algorithm. Although an updating rule
for ΔΣ_i may be derived similarly, it is unnecessary in
the learning algorithm, because the covariance of
each component distribution implicitly corresponds
to the neighborhood function p̂(i | x), that is, the spread
range of updating a winner at each iteration. As
the neighborhood function has the same form for
every node, the learned mixture distribution is
homoscedastic.
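The updating rules (7) and (8) can be sketched as a single stochastic training step. This is a minimal sketch under the homoscedastic assumption above (one shared covariance for all components); the parameter names are illustrative, not from the authors' code.

```python
import numpy as np

def somn_step(x, u, alphas, Zs, Sigma_inv, delta):
    """One Robbins-Monro update for one training pair (x, u).
    Zs[i] = [W_i, mu_i]; Sigma_inv is the shared inverse covariance."""
    v = np.append(u, 1.0)                          # v = [u, 1]^T
    # responsibilities p(i | x, u); shared Gaussian normalizers cancel
    diffs = [x - Z @ v for Z in Zs]
    lik = np.array([a * np.exp(-0.5 * d @ Sigma_inv @ d)
                    for a, d in zip(alphas, diffs)])
    post = lik / lik.sum()
    # Equation (8): Z_i <- Z_i + delta * p(i|x) * Sigma^-1 (x - Z_i v) v^T
    for i in range(len(Zs)):
        Zs[i] = Zs[i] + delta * post[i] * (Sigma_inv @ np.outer(diffs[i], v))
    # Equation (7): alpha_i <- alpha_i + delta * (p(i|x) - alpha_i)
    alphas = alphas + delta * (post - alphas)
    return alphas, Zs
```

Iterating this step over the training pairs, with δ(t) decreasing over time, gives the stochastic approximation behavior described above; note the weight update preserves Σᵢ αᵢ = 1.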
SOMN of Parametric Gaussians for Motion Synthesis

Determine the Physical Meaning of the Style Variable
A learned SOMN of parametric Gaussians model
p(x | u, Λ) may be considered a probabilistic mapping
from a given style value u to a 3D pose x. If the user
knows the physical meaning of each dimension of the
style variable u, he can give a precise style value to
specify the characteristics of the synthesized pose. The
supervised learning framework presented in Section 3
allows the user to determine the physical meaning of the
style variable prior to learning.
As an example, suppose that we captured boxing
motion as training data, where the boxer sometimes
crouches to evade attacks and at other
times punches to attack. We can use a 2-dimensional
style variable to describe the details of the
boxing motion, where one dimension encodes the body
height, which varies from crouching to standing up, and
the other encodes the arm length when punching. Once
the physical meaning of each dimension of the style
variable is determined, the style values {u₁, …, u_N}
of the training frames {x₁, …, x_N} can be
calculated from the training motion itself.
It is notable that if we carefully choose the
dimensions of the style variable to encode visually
independent characteristics of the training motion, the
style space, which is spanned by all possible style values, will
Figure 1. A prototype system for training the SOMN model from motion capture data. The graphical user interface makes it convenient for the user to browse the training 3D motion data, to discover stylistic characteristics, to specify parameters of the SOMN model, and to designate output files.
be a Euclidean space within which a curve corresponds
to a smooth change of the style value. This is interesting for
synthesizing character animations, instead of static poses,
because a smooth change of motion style, such as body
height or punch distance, usually leads to smooth body
movement. Experiments are shown in Section 5.
Generate 3D Pose from a Given Style Value
Given a learned SOMN of parametric Gaussians
p(x | u, Λ) with K components, mapping a given style
value u to a 3D pose x can be achieved by substituting u into
the model and drawing a sample x from the distribution
p(x | u, Λ). Although the Monte Carlo sampling method
Figure 2. Two screenshots of the prototype system for real-time motion synthesis. The graphical user interface adapts to the
dimensionality of the style variable. For training motions with style dimensionality ≥ 3, the synthesis system creates a
style panel with a set of slider widgets representing every dimension of the style variable (as shown on the left); the user can
drag the sliders to adjust the style values. For style dimensionality < 3, users may choose to represent the style space with a screen
area (as shown on the right), with the grey cross-sign denoting the currently selected style value. If the user drags the cross-sign with the
mouse cursor, the system generates a motion sequence whose pose style changes along the dragging curve.
input : the given new style value u
output: the synthesized pose x

// calculate the most probable pose from each component
foreach j ∈ [1, K] do
    x_j ← W_j u + μ_j
end
// select the most probable candidate
j* ← argmax_j α_j p_j(x_j | u, Λ)
x ← x_{j*}

Algorithm 1. Synthesize a pose from a given style value.
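Algorithm 1 translates directly into code. The following sketch assumes the learned parameters are kept in per-component lists; it exploits the fact that each candidate x_j sits at its component's mean, so the Gaussian exponent vanishes and the score α_j p_j(x_j | u, Λ) reduces to α_j times the component's normalizing constant (for a homoscedastic model, to α_j alone).

```python
import numpy as np

def synthesize_pose(u, alphas, Ws, mus, Sigma_invs):
    """Algorithm 1: map a style value u to the most probable pose."""
    # Step 1: the most probable pose of component j is its mean W_j u + mu_j
    candidates = [W @ u + mu for W, mu in zip(Ws, mus)]
    # Step 2: score alpha_j * p_j(x_j | u); at the mean the density equals
    # its normalizing constant, proportional to sqrt(det(Sigma_j^-1))
    scores = [a * np.sqrt(np.linalg.det(S_inv))
              for a, S_inv in zip(alphas, Sigma_invs)]
    return candidates[int(np.argmax(scores))]
```

Because both steps are closed-form, the mapping runs in O(K) matrix-vector products per pose, which is what makes the real-time dragging interface feasible.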
is generally applicable to most complex distributions,
to avoid intensive computation and achieve real-time
performance we designed the two-step procedure
shown in Algorithm 1 to calculate the pose
x with the highest probability. The first step
calculates the poses {x_j}, j = 1, …, K, that are most
probable for each component p_j of the learned model;
the algorithm then selects and returns the most
probable one, x, among all the {x_j}.
Synthesis with the Key-Styling Approach
We developed an interactive graphical user interface
(GUI) system, as shown in Figure 2, to ease pose and
motion synthesis. With the parameter adjustment panel
(to the left of the main window), the user is able to
specify a style value by adjusting every dimension of the
style variable. The changed style value is instantly input
to Algorithm 1, and the synthesized pose x is displayed
in real-time.
With this GUI program, users can also create
animations by (1) selecting a sparse sequence of key-styles
to define the basic movement of a motion segment, (2)
producing a dense sequence of style values by interpolating
the key-styles, and (3) mapping each style value to a frame
to synthesize the motion sequence. As the traditional
method of producing character animations, keyframing,
interpolates a sparse sequence of keyframes,
we fittingly name our method key-styling.
A known problem of keyframing is that the synthesized
animation appears rigid and robotic. This is
because each keyframe is represented by a high-dimensional
vector consisting of 3D joint rotations.
Evenly interpolating the rotations cannot ensure evenly
interpolated dynamics. On the contrary, interpolating
the key-styles results in smooth changes of the major
dynamics, while the style-to-pose mapping adds kinematic
details to the motion, so the change of kinematic details
does not need to be even.
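The three-step key-styling procedure described above can be sketched as follows. Linear interpolation between consecutive key-styles is one simple illustrative choice (the paper does not prescribe a particular interpolation scheme), and `synthesize_pose` stands in for Algorithm 1.

```python
import numpy as np

def key_styling(key_styles, frames_per_segment, synthesize_pose):
    """Interpolate a sparse key-style sequence into a dense style sequence
    and map every style value to a pose (the key-styling method)."""
    motion = []
    for s0, s1 in zip(key_styles[:-1], key_styles[1:]):
        # dense styles along the segment from key-style s0 to s1
        for t in np.linspace(0.0, 1.0, frames_per_segment, endpoint=False):
            motion.append(synthesize_pose((1.0 - t) * s0 + t * s1))
    motion.append(synthesize_pose(key_styles[-1]))   # final key-style
    return motion
```

Because interpolation happens in the low-dimensional style space rather than in joint-rotation space, smoothness of the style curve, not of the raw rotations, is what the method guarantees.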
Experiments

To demonstrate the usability of our synthesis approach, we
captured a segment of boxing motion of about 3 minutes
at a frame rate of 66 frames per second as the
training data. Some typical poses in the motion are shown
in Figure 3(a), (b), and (c). Because the boxer sometimes
crouches to evade and at other times punches out, we
Figure 3. (a)–(c): Some typical poses in our training boxing motion, where (a) has a small body height value and small punch distance, (b) a large body height value and small punch distance, and (c) a large body height value and large punch distance. (d) A short segment of synthesized motion (punching while crouching), generated by simply dragging the slide bar to change the punch distance. The starting pose is similar to the one shown in (a), while the ending pose never appeared in the training motion.
use a 2-dimensional style variable to encode the body
height and the distance of punching.
Once the dimensionality of the style variable is
determined, labeling the training frames with style
values is not a difficult problem. For the application of
automatic motion synthesis, we must have the skeleton
(the connections of joints) for rendering the synthesized
motion and must have the rotations of joints as training
data. With these two kinds of information, it is easy to
compute the style value u_i for each training frame x_i. We
wrote a motion browser program, as shown in Figure 1,
to help the user browse the captured motions,
discover motion styles, and write embedded Perl
scripts to derive style values from motion data.
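As an illustration of such a labeling rule, body height and punch distance can be derived from per-frame joint positions (obtained from the skeleton and joint rotations via forward kinematics). The joint indices and up-axis convention below are assumptions made for the sketch; the paper leaves the exact rule to user-written scripts.

```python
import numpy as np

def label_styles(frames, root_idx=0, shoulder_idx=1, wrist_idx=2):
    """Compute a 2-D style label (body height, punch distance) per frame.
    `frames` is a sequence of (num_joints, 3) arrays of world-space joint
    positions; the y axis is assumed to be vertical."""
    styles = []
    for joints in frames:
        body_height = joints[root_idx][1]           # height of the root joint
        punch_dist = np.linalg.norm(joints[wrist_idx] - joints[shoulder_idx])
        styles.append((body_height, punch_dist))
    return np.array(styles)
```

The resulting label sequence plays the role of the style values {u₁, …, u_N} paired with the training frames during supervised learning.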
After estimating an SOMN of parametric Gaussians
from the labeled training frames, we can give new style
values by dragging the slide bars of our prototype motion
synthesis system (as shown in Figure 2). The short
animation shown in Figure 3(d) is generated by
dragging the slide bar that represents the punch
distance.
The accompanying demo video shows real-time synthesis
of boxing animation by dragging the mouse over a
screen area representing the style space of boxing motion.
Conclusion and Discussion
In this paper, we presented a novel approach to
synthesizing 3D character animations automatically
and conveniently. The first step of our approach is to
learn a probabilistic mapping from a low-dimensional
style variable to high-dimensional 3D poses. By model-
ing the probabilistic mapping with an SOMN of
parametric Gaussians, we designed a learning algorithm
which is less prone to being trapped in local optima and
converges faster than previous EM-based algorithms for
learning mixture models. The supervised learning
framework gives the user the flexibility to specify the
physical meaning of each dimension of the style
variable. As a result, given a learned model and using
our prototype motion synthesis system, the user is able
to create 3D poses by simply dragging slide-bar widgets
and/or to produce character animations by our key-
styling technique.
ACKNOWLEDGEMENTS

We sincerely thank Dr. Hu-Jun Yin of the University of Manchester
for his fruitful suggestions and detailed explanation of
the SOMN model. We sincerely thank Professor Wen-Ping
Wang of the University of Hong Kong for the constructive discussions
about the motion synthesis approach. We gratefully acknowledge
Microsoft Research Asia for providing the motion capture
data. This work is supported in part by Hong Kong RGC Project
Nos. 1062/02E and CityU 1247/03E, and Natural Science Foundation
of China No. 60573061.
References
1. Yin H-J, Allinson NM. Self-organizing mixture networks for probability density estimation. IEEE Transactions on Neural Networks 2001; 12(2): 405–411.
2. Li Y, Wang T-S, Shum H-Y. Motion texture: a two-level statistical model for character motion synthesis. In Proceedings of ACM SIGGRAPH, 2002; pp. 465–472.
3. Grochow K, Martin SL, Hertzmann A, Popovic Z. Style-based inverse kinematics. In Proceedings of ACM SIGGRAPH, 2004; pp. 522–531.
4. Brand M, Hertzmann A. Style machines. In Proceedings of ACM SIGGRAPH, 2000; pp. 183–192.
5. Ostendorf M, Digalakis VV, Kimball OA. From HMMs to segment models: a unified view of stochastic modeling for speech recognition. IEEE Transactions on Speech and Audio Processing 1996; 4(5): 360–378.
6. Gales MJF, Young SJ. The theory of segmental hidden Markov models. Technical report, Cambridge University Engineering Department, 1993.
7. Lawrence ND. Gaussian process latent variable models for visualisation of high dimensional data. In Proceedings of NIPS 16, 2004.
8. Wilson AD, Bobick A. Parametric hidden Markov models for gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 1999; 21(9): 884–900.
9. Brand M. Pattern discovery via entropy minimization. In Artificial Intelligence and Statistics, vol. 7, Heckerman D, Whittaker C (eds). Morgan Kaufmann: Los Altos, 1999.
10. Yin H-J, Allinson NM. Comparison of a Bayesian SOM with the EM algorithm for Gaussian mixtures. In Proceedings of the Workshop on Self-Organizing Maps, 1997; pp. 304–305.
11. Ormoneit D, Tresp V. Averaging, maximum penalised likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates. IEEE Transactions on Neural Networks 1998; 9: 639–650.
12. Kohonen T. Self-Organizing Maps. Springer: Berlin, 2001.
Authors' biographies:

Yi Wang received the B.Sc. degree with first-class honors in computer science and engineering from the Changsha Institute of Technology, China, and is currently a Ph.D. student at Tsinghua University, Beijing, China. He has worked at Microsoft Research Asia on photo-realistic rendering as a visiting student, and at the City University of Hong Kong on the theory of Bayesian learning. His major interests are designing programming languages and writing parsers/compilers/interpreters for fun.
Zhi-Qiang Liu (S'82–M'86–SM'91) received the M.A.Sc. degree in Aerospace Engineering from the Institute for Aerospace Studies, University of Toronto, and the Ph.D. degree in Electrical Engineering from the University of Alberta, Canada. He is currently with the School of Creative Media, City University of Hong Kong. He has taught computer architecture, computer networks, artificial intelligence, programming languages, machine learning, pattern recognition, computer graphics, and art & technology. His interests are scuba diving, neural-fuzzy systems, painting, gardening, machine learning, mountain/beach trekking, smart media systems, computer vision, serving the community, and fishing.
Li-Zhu Zhou is a full Professor in the Department of Computer Science and Technology at Tsinghua University, Beijing, China. He received his Master of Science degree in Computer Science from the University of Toronto in 1983. His major research interests include database systems, digital resource management, web data processing, and information systems.