
Graphical Models 69 (2007) 106–118

www.elsevier.com/locate/gmod

Facial motion cloning with radial basis functions in MPEG-4 FBA

Marco Fratarcangeli a,b,*, Marco Schaerf a, Robert Forchheimer b

a University of Rome ‘‘La Sapienza’’, Department of Computer and Systems Science, Via Salaria 113, 00196, Rome, Italy
b Linköping Institute of Technology, Department of Electrical Engineering, Image Coding Group, 581 83 Linköping, Sweden

Received 25 November 2004; received in revised form 3 April 2006; accepted 25 September 2006
Available online 12 December 2006

* Corresponding author. Fax: +39 06 85300849. E-mail addresses: [email protected] (M. Fratarcangeli), [email protected] (M. Schaerf), [email protected] (R. Forchheimer). URLs: http://www.dis.uniroma1.it/~frat (M. Fratarcangeli), http://www.dis.uniroma1.it/~schaerf (M. Schaerf), http://www.icg.isy.liu.se/staff/robert (R. Forchheimer).

Abstract

Facial Motion Cloning (FMC) is the technique employed to transfer the motion of a virtual face (namely the source) to a mesh representing another face (the target), generally having a different geometry and connectivity. In this paper, we describe a novel method based on the combination of Radial Basis Functions (RBF) volume morphing with the encoding capabilities of the widely used MPEG-4 Facial and Body Animation (FBA) international standard. First, we find the morphing function G(P) that precisely fits the shape of the source into the shape of the target face. Then, all the MPEG-4 encoded movements of the source face are transformed using the same function G(P) and mapped to the corresponding vertices of the target mesh. By doing this, we obtain, in a straightforward and simple way, the whole set of the MPEG-4 encoded facial movements for the target face in a short time. This animatable version of the target face is able to perform generic face animation stored in an MPEG-4 FBA data stream.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Facial animation; Morph targets; MPEG-4 FBA; Virtual characters; Virtual humans

1. Introduction

Accurate animation of virtual actors is required in a wide range of fields such as entertainment, distance learning and teleconferencing, as well as advanced human–machine interfaces. Currently, the best-looking animations of synthetic faces are done manually by visual artists or are tracked using motion capture equipment. Facial animation of characters is among the most tedious parts of the creation of animations: it is time-consuming and requires skilled animators, while motion tracking hardware can be quite expensive. In order to save time and money, a trend is to develop methods that automatically simulate the facial movements in the most realistic way and then perform them in real time. This is a challenging task because, on one side, virtual faces present different characteristics (connectivity, shape), making it difficult to define a precise input space. On the other side, the human visual system is very sensitive to the nuances of the perceived facial movements. Thus, the produced animation must be as accurate as possible, otherwise the risk is to lose the meaningful information hard-coded into the facial movements. As a last point, the computation needed to achieve the animation must not consume precious hardware resources like CPU time or memory. This issue arises especially when the animation must be performed on modestly equipped platforms, like mobile phones or handheld devices, or in the most demanding applications like games.

In this paper, we describe a Facial Motion Cloning (FMC) method that assists in solving these problems. FMC is used to copy the facial motion from one face to another. The facial movements are represented by a set of morph targets encoded by the MPEG-4 Facial and Body Animation (FBA) standard. A morph target is a variation of the face that has the same mesh topology, but different vertex positions. Essentially, it is the source face performing a particular key position. Each morph target corresponds to a basic facial action encoded by an MPEG-4 FBA parameter. The linear interpolation of the morph targets is able to represent a wide range of facial movements. Thus, an artist could produce one detailed face including all morph targets, then use this FMC implementation to quickly produce the corresponding full set of morph targets for a new and completely different face. The morph targets of the source face can be produced manually by artists, by motion capture techniques or in an automatic way, for example by applying physical simulation algorithms of the face anatomy [11,12].

There are two main advantages in using morph targets encoded by MPEG-4 FBA. The first one is the low CPU-time requirement, since the animation is achieved by linear interpolation of the morph targets. The second one lies in the capability of MPEG-4 FBA to express and precisely control the whole range of facial expressions [24]. Thus, by computing the morph targets for the target face, we can use them to perform generic face animation encoded in an MPEG-4 FBA stream in a computationally inexpensive manner.
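As an informal illustration of this playback model (not code from the paper), the following Python sketch shows the kind of per-frame blending a morph-target player can perform; the array names and the normalization of decoded FAP values into weights are assumptions of ours.

import numpy as np

def blend_morph_targets(neutral, morph_targets, weights):
    # neutral:       (N, 3) vertex positions of the neutral face
    # morph_targets: dict {fap_id: (N, 3) array of the face performing that FAP}
    # weights:       dict {fap_id: scalar weight}, e.g. the decoded FAP value divided
    #                by the value used when the morph target was built (an assumed
    #                convention, not fixed by the paper)
    frame = neutral.copy()
    for fap_id, w in weights.items():
        # each morph target contributes its displacement from the neutral face,
        # scaled linearly by the current weight
        frame += w * (morph_targets[fap_id] - neutral)
    return frame

Since the per-frame cost amounts to a few vector additions per active FAP, this kind of playback remains cheap on the modestly equipped platforms mentioned above.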

2. MPEG-4 facial and body animation standard

We provide a short introduction to the MPEG-4 FBA specification, describing the main parameter sets cited in this paper. Readers familiar with MPEG-4 FBA may wish to skip this section, or use it only as a quick reference.

MPEG-4 is an international standard dealing with the efficient transmission of animation. A significant part of MPEG-4 is the Face and Body Animation (FBA), a specification for efficient coding of shape and animation of human faces and bodies [24,1]. Instead of regarding animation as a sequence of images of faces with fixed shape and size, MPEG-4 FBA gives the possibility to express, for each animation frame, the facial movements through a set of weighted parameters. Thus, the animation is synthesized by a proper deformation of the face mesh according to such parameters.

MPEG-4 FBA specifies a face model in its neutral state together with two main data sets: 84 Facial Definition Parameters (FDPs), also called feature points, and 68 Facial Animation Parameters (FAPs). According to the standard, a generic face model is in its neutral state when the gaze is in the direction of the z-axis, all face muscles are relaxed, eyelids are tangent to the iris, lips are in contact and the mouth is closed in such a way that the upper teeth touch the lower ones [24].

FDPs (Fig. 1) are control points used to define the shape of a proprietary face model and to provide a spatial reference for defining FAPs. The location of the FDPs has to be known for any MPEG-4 compliant face model. The FAP set is subdivided into high- and low-level parameters. Low-level FAPs are used to express basic actions that the face can perform, like closing an eyelid or stretching a lip corner. All low-level FAPs are expressed in terms of Facial Animation Parameter Units (FAPU). These units correspond to distances between key facial features and are defined in terms of distances between the FDPs. High-level FAPs are useful to express more complex movements, like expressions and visemes (a viseme is the visual counterpart of a phoneme), in a compact way. FAPs allow for representation of the whole set of natural facial movements.
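As a toy example of how a low-level FAP value turns into a displacement (our own illustration; the feature distance and the FAP value are made up), the distance-based FAPUs are commonly taken as the corresponding feature distance divided by 1024:

# hypothetical feature distance measured on a face model, in model units
eye_nose_separation = 64.0
ens0 = eye_nose_separation / 1024.0   # distance-based FAPU (ENS0)

# decoded value of some low-level FAP expressed in ENS0 units (illustrative)
fap_value = 120

# displacement applied along the FAP's direction for this frame
displacement = fap_value * ens0        # = 7.5 model units in this toy setup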

The FBA parameters, both FDPs and FAPs, can be extracted automatically from visual, audio and motion capture systems [4,3,13,16]; therefore, they can be properly compressed and used to achieve, for example, teleconferencing with a very low bandwidth (<3 kbps) [21,7].

Fig. 1. MPEG-4 Facial Definition Parameters (or Feature Points) are used to define the shape of a face model [24,1].

3. Related work

Several different methods have been proposed through the years to achieve computer facial animation. The key-frame interpolation technique proposed by Parke [25] is one of the most intuitive and is still widely used: basic expressions in 3D are defined at different moments in the animated sequence and intermediate frames are simply interpolated between two successive basic expressions. Physics-based approaches [17,18,26,14,15] animate faces by simulating the influence of muscle contraction onto the skin surface. The general approach is to build a model respecting the anatomical structure of the human head. By doing this, the produced motion is very realistic. The drawback is that these methods involve a massive use of CPU resources to advance the simulation. Thus, they are not suitable for interactive applications on modestly equipped platforms.

Synthesis of facial motion by reusing existing data (Facial Motion Cloning) has recently become popular in computer graphics. Briefly, the main concept is to transfer the motion from a source face to a static target face, making the latter face animatable. Learning-based methods rely on a dataset of 3D scans of different facial expressions and mouth shapes to build a morphable model [6,5], that is, a vector space of 3D expressions. Difference vectors, such as a smile-vector, can be added to new individual faces. Unlike physical models, these methods address the appearance of expressions, rather than simulating the muscle forces and tissue properties that cause the surface deformations.

Escher et al. [8,9] developed a cloning method based on Dirichlet Free Form Deformation (FFD) and applied it to the synthesis of a virtual face in order to obtain a virtual avatar of a real human head. In FFD algorithms, the deformation is controlled by a few external points. To achieve volume morphing between the source and the target face meshes, the control points are usually difficult to define and not very intuitive to manipulate. Deformations based on scattered data interpolation, like Shepard-based methods and radial basis function methods [10], are easier to manipulate for our purpose. In particular, radial basis functions are more effective when landmarks are scarce, and they are also computationally efficient.

To develop our method, we were inspired mainly by the two following methods. In Expression Cloning developed by Noh [20], the movements of a source face are expressed as motion vectors applied to the mesh vertices. The source mesh is morphed, through the use of RBF volume morphing and neural networks, to match the shape of the target mesh. The motion vectors are transferred from source model vertices to the corresponding target model vertices. The magnitude and direction of the transferred motion vectors are properly adjusted to account for the local shape of the model. The FMC approach developed by Pandzic [23] relies on the fact that the movements of the source face are coded as morph targets. In Pandzic's approach, each morph target is described as the relative movement of each vertex with respect to its position in the neutral face. Facial cloning is obtained by computing the difference of 3D vertex positions between the source morph targets and the neutral source face. The facial motion is then added to the vertex positions of the target face, resulting in the animated target face. The key positions represented by morph targets are expressed by the MPEG-4 Facial Animation Parameters (FAPs). Each morph target corresponds to a particular value of one FAP. By interpolating the different morph targets frame-by-frame, animation of the source face is achieved.

Fig. 2. Facial Motion Cloning mechanism. An RBF volume morphing G(P) is performed between the source and the target face.

4. Our approach

Our FMC method is schematically represented by Figs. 2 and 3 and can be summarized as follows.

Given a manually picked set of 48 feature points on the input face meshes, we compute a scattered data interpolation function G(P) employed to precisely fit the shape of the source into the shape of the target mesh through a volume morphing. Then, we map each vertex of the target face into the corresponding triangular facet of the deformed source mesh in its neutral state through a proper projection. The target vertex position is thus expressed as a function of the vertex positions of the source triangular facet through barycentric coordinates. At this point, we deform all the source MTs by applying to each one of them the same morphing function G(P) used to fit the neutral source into the neutral target mesh. Hence, we can compute the new position of each target vertex considering the location of the corresponding deformed source facet, obtaining the target MT. The whole set of morph targets is called Animatable Face Model (AFM) and it can be directly animated by commercial MPEG-4 FBA players [2].

Inputs to the method are the source and target face meshes. The source face is available in neutral state as defined in the MPEG-4 FBA specification. The morph targets (MTs), corresponding to the MPEG-4 FAPs, of the source face are available as well. The target face exists only in the neutral state. For each neutral face mesh, the corresponding MPEG-4 Facial Definition Parameter (FDP) set must be defined, that is, each FDP is manually mapped onto a vertex of the input face meshes. We need 48 out of 84 FDPs, since we do not consider the FDPs corresponding to group 10 (ears), group 6 (tongue), FDPs 9.8 to 9.11 (teeth) and 11.5 (top of the head). The process of manually mapping the 48 FDPs onto the input faces requires 10–30 min, depending on the user's skill. The goal is to obtain the target face with the motion copied from the source face. We do not make any assumption about the number of vertices or their connectivity in the input models. For simplicity, we will consider source and target as triangular meshes.

Fig. 3. Facial Motion Cloning mechanism. Using the same deformation function G(P), all the source morph targets are cloned to the target face.

The remainder of this paper is organized as follows. In Section 5, we detail how to find the fitting function G(P) and in Section 6 we describe the cloning process. Section 7 presents our experimental results and in Section 8 we conclude by providing directions for future work.

5. Shape fitting

The task of the shape fitting process is to adapt the generic source face mesh to fit the target face mesh. As input, we take the source and target face meshes. The source and the target faces are both available in neutral position as defined in the MPEG-4 FBA specification (Section 2 and [24]). For each neutral face mesh, the corresponding subset of 48 MPEG-4 Facial Definition Parameters (FDPs) must be defined as specified in Section 4. Taking these feature points as a reference, we are able to extract some face features, that is, the eye contour and the inner lip contour (Section 5.1). We use the FDPs together with the vertices belonging to the eye and lip contour features to find a first guess of the scattered data interpolation function G(P) that roughly fits the source into the target mesh (Section 5.2). The precise fitting of the moving zone of the deformed source model is obtained by iteratively refining G(P) (Section 5.3).

5.1. Eye and lip feature extraction

Starting from the manually picked MPEG-4 FDPs, we extract the eye and lip features from the input meshes through automatic algorithms.

Driven by some of the MPEG-4 FDPs belonging to the eye group, we find a proper path in the mesh graph going from a start point to an end point, where the start and the end points are defined according to the round robin UP → LEFT → DOWN → RIGHT → UP:

Right Eye FDPs: 3.2 → 3.8 → 3.4 → 3.12 → 3.2
Left Eye FDPs: 3.1 → 3.7 → 3.3 → 3.11 → 3.1

To find the path between the start and the end point, we use a greedy strategy:

v = start_point
while (v ≠ end_point)
  d_min = very big value
  for each neighbor n of v do
    n_n = normalize(n − v)
    d = distance((v + n_n), end_point)
    if (d_min > d)
      d_min = d
      v_min = n
    end if
  end for
  insert v_min in the eye contour set
  v = v_min
end while
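A minimal Python transcription of this greedy walk is shown below; the mesh access structures (a vertex position array and a neighbor table) are assumptions about the surrounding implementation, not part of the paper.

import numpy as np

def greedy_contour_walk(positions, neighbors, start, end):
    # positions: (N, 3) array of vertex positions
    # neighbors: dict {vertex index: iterable of neighboring vertex indices}
    contour = []
    v = start
    while v != end:
        d_min, v_min = float("inf"), None
        for n in neighbors[v]:
            step = positions[n] - positions[v]
            step = step / np.linalg.norm(step)   # unit step towards neighbor n
            d = np.linalg.norm(positions[v] + step - positions[end])
            if d < d_min:                        # keep the neighbor whose unit step
                d_min, v_min = d, n              # lands closest to the end point
        contour.append(v_min)
        v = v_min
    return contour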

We find the inner lip contours by applying the following algorithm, once for the upper lip and once for the lower one:

start_point = FP 2.5 (inner right corner lip)
end_point = FP 2.4 (inner left corner lip)
direction = end_point − start_point
direction_xy = (direction_x, direction_y, 0)
v = start_point
insert v in the lip contour set
while (v ≠ end_point)
  for each neighbor n of v
    α = angle(direction_xy, (n − v)_xy)
  end for
  v = n having smallest [greatest] α, with α ∈ (−π/2, π/2)
  insert v in the upper [lower] lip contour set
end while
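Under the same assumed mesh helpers, the inner-lip walk can be sketched as below; taking the smallest or the greatest signed angle selects the upper or the lower lip (our reading of the bracketed alternatives above), and the (−π/2, π/2) test keeps the walk heading towards the opposite lip corner.

import numpy as np

def lip_contour_walk(positions, neighbors, start, end, upper=True):
    direction = positions[end] - positions[start]
    direction_xy = np.array([direction[0], direction[1], 0.0])
    contour, v = [start], start
    while v != end:
        best_angle, best_n = None, None
        for n in neighbors[v]:
            step = positions[n] - positions[v]
            step_xy = np.array([step[0], step[1], 0.0])
            # signed angle between the lip direction and the candidate step (xy-plane)
            angle = np.arctan2(np.cross(direction_xy, step_xy)[2],
                               np.dot(direction_xy, step_xy))
            if abs(angle) >= np.pi / 2:          # discard steps pointing away from the end
                continue
            better = (best_angle is None or
                      (angle < best_angle if upper else angle > best_angle))
            if better:
                best_angle, best_n = angle, n
        contour.append(best_n)
        v = best_n
    return contour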

Note that an MPEG-4 FBA compliant synthetic face has the gaze and the nose tip towards the positive z-axis and that the lips are in contact but not connected [24]. We do not provide a method to assess the correctness of such extraction algorithms; however, they work in a satisfactory way with all our test face models.

5.2. Scattered data interpolation

Having computed the face feature points on both the source and the target face mesh in their neutral state, we construct a smooth interpolation function G(P) that precisely fits the source model into the target face model. In mathematical terms, for given data pairs $(P_i, Q_i)$, $i = 1, \ldots, n$, with $P_i \in \mathbb{R}^3$ and $Q_i \in \mathbb{R}^3$, find a function $G: \mathbb{R}^3 \rightarrow \mathbb{R}^3$, $G \in C^1$, so that

$$Q_i = G(P_i), \quad i = 1, \ldots, n \qquad (1)$$

In our case, $\{P_i\}$ is a subset of the vertices of the source mesh and $\{Q_i\}$ is the subset of corresponding vertices on the target mesh. Applying G(P) to the whole set of vertices of the source mesh, the source deforms in such a way that it fits the target shape.

A radial basis function is defined as a basis function whose value depends only on the distance from the data point. A radial basis function method is simply a linear combination of such basis functions:

$$G(P) = \sum_{i=1}^{n} h_i R_i(P) \qquad (2)$$

where $h_i$ are 3D coefficients, $R_i$ are the radial basis functions and $P \in \mathbb{R}^3$ is a vertex of the source mesh. We use as basis function the Hardy multiquadrics, defined as:

$$R_i(P) = \left( d_i^2(P) + r_i^2 \right)^{\frac{1}{2}} \qquad (3)$$

where $d_i$ is the Euclidean distance from $P_i$ and $r_i$ is the stiffness radius controlling the stiffness of the deformation around $P_i$. The Hardy method is the combination of the sum of the Hardy multiquadrics and a linear term $\mathcal{L}$ used to absorb the linear part of the morphing transformation:

$$G(P) = \sum_{i=1}^{n} h_i \cdot \left( d_i^2(P) + r_i^2 \right)^{\frac{1}{2}} + \mathcal{L}(P) \qquad (4)$$

where $\mathcal{L}(P) = h_{n+1} \cdot x + h_{n+2} \cdot y + h_{n+3} \cdot z + h_{n+4}$ is the affine transformation and $\{h_i\}$ ($h_i \in \mathbb{R}^3$, $i = 1, \ldots, n+4$) are called the Hardy coefficients. Introducing the $\mathcal{L}(P)$ term, the source mesh will be translated, rotated and scaled according to the position of the target interpolation point set $\{Q_i\}$. For smooth results, the parameters $\{r_i\}$ are set to:

$$r_i = \begin{cases} \min_j d_i(P_j) & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases} \qquad (5)$$

In order to find the volume morphing function G(P), we define the interpolation point sets $\{P_i\}$ and $\{Q_i\}$ and then solve a linear equation system to compute the Hardy coefficients. The linear equation system for the $\{h_i\}$ values is obtained by imposing the $n$ conditions $G(P_j) = Q_j$:

$$Q_j = \sum_{i=1}^{n} h_i \cdot \left( d_i^2(P_j) + r_i^2 \right)^{\frac{1}{2}} + \mathcal{L}(P_j) \qquad (j = 1, \ldots, n) \qquad (6)$$

Furthermore, for linear precision, we impose:

$$\sum_{i=1}^{n} h_i \cdot x_i = 0 \qquad (7)$$

$$\sum_{i=1}^{n} h_i \cdot y_i = 0 \qquad (8)$$

$$\sum_{i=1}^{n} h_i \cdot z_i = 0 \qquad (9)$$

$$\sum_{i=1}^{n} h_i = 0 \qquad (10)$$

This results in a linear system of $n + 4$ equations with $n + 4$ unknowns $\{h_i\}$ ($i = 1, \ldots, n + 4$). The solution exists and it is unique [19].

Thus, G(P) can be computed once the correspondence set {(P_i, Q_i)} has been defined. The denser the correspondence set, the closer the resulting fit. We consider the 48 FDPs specified in Section 4 as a first guess of the interpolation point set {(P_i, Q_i)}. We compute G(P) and apply it to the source mesh, obtaining a rough fitting. Then, we enrich the correspondence set by inserting the vertices belonging to the eye and lip contours of the source and the points lying on the nearest edges of the corresponding target contours. We recompute G(P) with the enriched set of correspondences and apply it again to the source mesh in its neutral state, obtaining again a rough fitting but this time with a correct alignment of the eye and lip features.
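For concreteness, the sketch below assembles and solves the (n+4)×(n+4) system of Eqs. (6)–(10) with numpy and evaluates the resulting G(P); the function names and array layout are our own, not from the paper.

import numpy as np

def fit_hardy_rbf(P, Q):
    # P, Q: (n, 3) arrays of corresponding source/target points
    n = len(P)
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)  # pairwise distances d_i(P_j)
    r = np.where(np.eye(n, dtype=bool), np.inf, D).min(axis=1)  # Eq. (5): distance to nearest other point
    A = np.zeros((n + 4, n + 4))
    A[:n, :n] = np.sqrt(D**2 + r[None, :]**2)                   # Hardy multiquadrics R_i(P_j)
    A[:n, n:n + 3] = P                                          # affine term L(P_j): x, y, z ...
    A[:n, n + 3] = 1.0                                          # ... and the constant h_{n+4}
    A[n:n + 3, :n] = P.T                                        # constraints (7)-(9)
    A[n + 3, :n] = 1.0                                          # constraint (10)
    b = np.zeros((n + 4, 3))
    b[:n] = Q
    H = np.linalg.solve(A, b)                                   # Hardy coefficients, 3 columns at once
    return H, r

def apply_hardy_rbf(H, r, P, X):
    # evaluate G at the points X (m, 3) given the fitted coefficients
    d = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=-1)
    R = np.sqrt(d**2 + r[None, :]**2)
    n = len(P)
    return R @ H[:n] + X @ H[n:n + 3] + H[n + 3]

Calling apply_hardy_rbf on all source vertices reproduces the rough fit described above; the zero bottom-right 4×4 block of A mirrors the structure of Eqs. (6)–(10).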

5.3. Correspondence set refinement

After morphing the source face mesh to roughly fit the target, we further improve the fitting by specifying additional correspondences. A vertex P of the source face is called a movable vertex if its position in one of the morph targets is not equal to its position in the neutral face. Thus, such a vertex will potentially move during the animation. We project the movable vertices from the deformed source mesh onto the target face surface by casting rays with a fixed length along the vertex normals (the normal of a vertex is taken as the average of the normals of the faces the vertex belongs to) and compute the intersection of each ray with the target mesh. We found a fixed length of ENS0 · 0.3125 suitable for the cast rays, where ENS0 is the MPEG-4 FAPU defining the distance between the eyes and the nose.

By doing this, we are able to know, for each movable vertex P_i of the source face, the corresponding intersection point Q_i on the target surface mesh. Having chosen a fixed length, only a part of the movable vertices will have a corresponding point on the target surface. This is because, after the initial rough fit, only the nearest vertices will be close enough to the target mesh surface to permit a ray–facet intersection. We also experimented with outward rays of unlimited length. We achieved better results with the fixed ray length, probably because in this way the interpolated surface smoothly sticks to the target surface without introducing high-frequency elements.

We consider the first 125 moving vertices with the greatest error $e_i = \lVert Q_i - P_i \rVert$ and we insert the pairs $(P_i, Q_i)$ in the correspondence set. If a point $P_i$ is already in the set, then only the position of the corresponding $Q_i$ is updated. We solve again the linear system of Eqs. (6)–(10) to find G(P) for the enriched correspondence set and we re-run the scattered data interpolation algorithm to update the whole source face model from its neutral state. This process is iterated until no more movable vertices are inserted in the correspondence set. This may happen for two different reasons:

• all the movable vertices have been inserted in the correspondence set;
• the actual set has enough correspondence pairs to fit the source to the target mesh sufficiently well.

By applying the refined G(P) to all the vertices of the neutral source face mesh, the source precisely fits the target mesh in the area where there are vertices that potentially move when the animation is performed. We consider only the moving zone because we want to copy the source face motion and we are not interested in the static zones. Fig. 4 shows an example of this iterative fitting process for some test models. In the central column, the source face is rendered solid while the target mesh is rendered in wireframe. Step 0 is the initial rough fit. In Step 1, the eye and the lip features are aligned. After some iterative steps, the source fits the target face in the moving zone.

Fig. 4. Source shape fitting iterative process.
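A sketch of this refinement loop, reusing the hypothetical fit_hardy_rbf/apply_hardy_rbf helpers from the previous listing and assuming a cast_ray(origin, direction, length) helper that returns the ray–target intersection point (or None), could look like this:

import numpy as np

def refine_correspondences(source_vertices, normals, movable_idx,
                           init_pairs, cast_ray, ray_len, batch=125):
    # init_pairs: dict {source vertex index: target point} from the rough fit
    pairs = dict(init_pairs)
    while True:
        idx = np.array(sorted(pairs))
        P = source_vertices[idx]
        Q = np.array([pairs[i] for i in idx])
        H, r = fit_hardy_rbf(P, Q)                        # re-solve Eqs. (6)-(10)
        deformed = apply_hardy_rbf(H, r, P, source_vertices)

        # project every movable vertex onto the target along its (averaged) normal
        hits = {i: cast_ray(deformed[i], normals[i], ray_len) for i in movable_idx}
        candidates = [(np.linalg.norm(q - deformed[i]), i, q)
                      for i, q in hits.items() if q is not None]
        candidates.sort(key=lambda c: c[0], reverse=True)  # largest error first

        new_points = sum(1 for _, i, _ in candidates[:batch] if i not in pairs)
        for _, i, q in candidates[:batch]:
            pairs[i] = q                                   # insert new pair or update old one
        if new_points == 0:                                # nothing new: stop iterating
            break
    return H, r, pairs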

6. Cloning process

As input to the motion cloning process, we need the scattered data interpolation function G(P) just refined and the morph targets, corresponding to the MPEG-4 FAPs, of the source face. To be precise, we need the 64 MTs corresponding to low-level FAPs and the 20 MTs corresponding to high-level FAPs (14 visemes and 6 emotions), for a total of 84 MTs. We map the target vertices onto the deformed source mesh through the same projection used in Section 5.3, this time casting the fixed-length rays from the target vertices towards the deformed source mesh. At this stage of the process, the deformed source and the target meshes are very similar to each other and each target vertex can be considered as located where the cast ray intersects the proper triangular face of the source mesh. Thus, we can compute the barycentric coordinates of each target vertex considering the vertices of the corresponding source triangular facet. The position of the target vertices can be expressed as a linear combination of the positions of the three corresponding source vertices.
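The per-vertex binding can be written as below; the weights follow the standard barycentric convention (alpha weighting P_a), which may differ from the symbol pairing used in the pseudocode that follows, and the facet hit by each ray is assumed to be known from the projection step.

import numpy as np

def barycentric_coordinates(q, pa, pb, pc):
    # barycentric coordinates of point q with respect to triangle (pa, pb, pc)
    v0, v1, v2 = pb - pa, pc - pa, q - pa
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    beta = (d11 * d20 - d01 * d21) / denom
    gamma = (d00 * d21 - d01 * d20) / denom
    alpha = 1.0 - beta - gamma
    return alpha, beta, gamma    # weights of pa, pb, pc respectively

Once the three weights are stored for a target vertex, its position under any deformation of the facet is simply alpha * pa + beta * pb + gamma * pc.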

Then, for each particular source MT $S_i$ we compute the corresponding target MT $T_i$ by applying the following algorithm:

T_i = neutral target face (stored in T_0);
apply G(P) to S_i only in each vertex P ∈ D_i;
for each vertex Q ∈ T_i
  if (P_a ∈ D_i) ∨ (P_b ∈ D_i) ∨ (P_c ∈ D_i)
    Q = P_a · b + P_b · c + P_c · a;
end for;

where $D_i$ is the set of moving vertices of $S_i$, $P_a$, $P_b$ and $P_c$ are the vertices of the source triangular facet corresponding to the vertex Q, and a, b and c are the proper barycentric coordinates. Note that we apply the scattered data interpolation function G(P) only to the movable vertices $D_i$ of each particular source MT $S_i$ in order to speed up the cloning process. That is, we apply the same global deformation function G(P) used to fit the source into the target to the local part of the source MT $S_i$ that will move when $S_i$ is employed during the animation. Then, we compute the position of the corresponding target vertices by linear combination of the barycentric coordinates with the new positions of the source triangular facet, finally obtaining the resulting target MT $T_i$.
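Putting the pieces together, one source morph target can be cloned roughly as follows; the binding (facet indices plus barycentric weights per target vertex) and the RBF helpers come from the earlier sketches, and all names are our own.

import numpy as np

def clone_morph_target(source_mt, neutral_source, neutral_target, H, r, P, binding):
    # binding: list, for each target vertex, of (facet vertex indices, (alpha, beta, gamma))
    moving = np.where(np.any(source_mt != neutral_source, axis=1))[0]   # the set D_i
    deformed = neutral_source.copy()
    deformed[moving] = apply_hardy_rbf(H, r, P, source_mt[moving])      # apply G(P) only where needed

    target_mt = neutral_target.copy()
    for q_idx, (facet, (alpha, beta, gamma)) in enumerate(binding):
        if np.intersect1d(facet, moving).size == 0:
            continue                                     # this facet does not move in S_i
        pa, pb, pc = deformed[facet]
        target_mt[q_idx] = alpha * pa + beta * pb + gamma * pc
    return target_mt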


As a final step, we clone the global motion of the head as well as the movements of the tongue and teeth (if present) through affine transformations, as in Pandzic [23].

7. Results

We performed our experiments on an Intel Pentium M 1.73 GHz processor with 512 MB RAM. Some of the face models that we used for the experiments are shown in Fig. 5. The test models have different polygonal resolutions, shapes and connectivity, both symmetric and asymmetric. A full set of high- and low-level FAP motions (i.e., morph targets) was available for each model. All motion was cloned from the models joakim and beta to each other model, producing the grids of cloned animated models for each motion in Figs. 6–9. Looking at each row shows how an expression is cloned from one face to all the other faces; looking at each column shows how the expression is cloned onto the same face from different sources.

Fig. 5. Test models used in our experiments. (a) joakim, 909 vertices and 1752 faces; (b) beta, 2197 vertices and 4118 faces; (c) data, 367 vertices and 677 faces; (d) kevin, 498 vertices and 956 faces.

The comparison in terms of computation time between Pandzic's method [23] and ours is straightforward, since we use the same input and produce the same kind of output. In Table 1, we present the computation time for some cloning processes performed during our tests along with other interesting data. For comparison, the last column presents the computation time for Pandzic's cloning approach on the same PC.

To assess the quality of our approach, we cloned the same source animatable faces to themselves and we computed the error between the source and the output animatable model as the average of the error of each of the 84 morph targets:

$$e = \frac{1}{84} \sum_{j=1}^{84} \left( \frac{\sum_{i=1}^{N_v} \lVert v_i^S - v_i^T \rVert}{\sum_{i=1}^{N_v} \lVert v_i^S \rVert} \right) \times 100 \qquad (11)$$

where $N_v$ is the number of vertices of the face mesh, and $v_i^S$ and $v_i^T$ are, respectively, the $i$-th vertex of the source and the corresponding vertex on the cloned output model.
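Eq. (11) translates directly into a few lines of numpy (a restatement of the formula with hypothetical array names, not code from the paper):

import numpy as np

def cloning_error(source_mts, cloned_mts):
    # source_mts, cloned_mts: sequences of 84 (Nv, 3) vertex arrays
    errors = []
    for vs, vt in zip(source_mts, cloned_mts):
        num = np.linalg.norm(vs - vt, axis=1).sum()
        den = np.linalg.norm(vs, axis=1).sum()
        errors.append(num / den)
    return 100.0 * np.mean(errors)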


Fig. 6. Cloning grids for expression joy.

Fig. 7. Cloning grids for expression anger.

Fig. 8. Cloning grids for expression surprise.


Fig. 9. Cloning grids for expression sadness.

Table 1
MVs, moving vertices; CPs, correspondence points; Is, iterations to refine; RT, G(P) refinement time; CT, cloning time; TT, total time

                 MVs   CPs   Is   RT (s)   CT (s)   TT (s)   TT_Pzc [23]
beta → data      396   377    8      4.0      1.0      5.0      5.3 s
data → beta      280   280    5      1.3      3.6      5.0     12.4 s
joakim → data    479   461    8      4.4      1.0      5.4      4.5 s
data → joakim    280   275    4      1.0      1.2      2.3     14.1 s
beta → kevin     396   382    9      4.6      0.8      5.4      9.1 s
kevin → beta     294   290    5     10.6      0.6     11.3     13.6 s

Table 2 presents the obtained values. Fig. 10 shows the error distribution over the test models.

Table 2
Average error on cloning source faces with themselves

source → source
beta     0.028%
data     0.280%
joakim   0.084%
kevin    0.141%

While differences between our visual results and those of [23,20] are subjective, we can still point out differences in implementation:

• Noh uses RBF, together with neural networks, to align the source to the target face, and then the motion vectors for each target vertex are computed locally using proper affine transformations. In our approach, we use the RBF method G(P) to align the source and the target with an iterative enrichment of the correspondence set. Then, G(P) is reused to deform the source MTs and copy the motion onto the target face. The fact that the cloning is carried out only where needed makes this process computationally light and probably faster than Noh's approach. However, Noh's approach is almost fully automatic, while we still need to manually map 48 FDPs;

• our approach provides an MPEG-4 FBA compliant talking face able to perform generic animation, while Noh's approach clones an animation given a priori.

8. Conclusions

We proposed a method dealing with the problem of reusing existing facial motion to produce, in a short amount of time, a ready-to-be-animated MPEG-4 FBA compliant talking head. Apart from an initial manual picking of 48 correspondence points, all the techniques presented here are fully automatic. In terms of the visual results shown, even though only a small subset of them could be presented here, we believe that most facial movements for expressions and low-level FAPs are copied correctly to the target face models.


Fig. 10. Visual distribution of the error. Error magnitude is proportional to the color brightness.


One limitation of our method might be that the target mesh cannot be of higher resolution than the source, otherwise the barycentric coordinates will just pull any vertices in the interior of a control triangle onto the plane of that triangle. However, using a high-resolution source should not cause major problems: in Figs. 6–9, the higher-resolution models joakim and beta are cloned to the lower-resolution models data and kevin. The main computational effort during the cloning process lies in the refinement of the G(P) interpolation function. This is because each pair of correspondence points corresponds to a linear equation in the system to be solved in order to obtain G(P). The asymptotic behavior of the linear equation-solving algorithm (LU decomposition) is O(n³), where n is the number of moving vertices of the source face. Since the correspondences can be very close to each other, we think that not all of them are necessary and, as future work, we would identify and discard the less significant ones. Furthermore, we would apply face feature tracking to the polygonal meshes in order to retrieve the FDPs, making the whole process fully automatic.

After the cloning process is finished, the target face is ready to perform generic facial animation encoded into an MPEG-4 FBA stream. The computational cost of the animation depends on the player employed. In our case, we used the lightweight MPEG-4 FBA player provided by [2], in which the animation is achieved, for each frame, through linear interpolation between the proper morph targets and rendered in OpenGL. Thus, the computational cost is rather low. Combined with facial feature tracking, MPEG-4 FBA talking heads can potentially be used for very low bitrate visual communication in a model-based coding scenario [7,22] (teleconferencing, games, web interfaces).

Acknowledgments

The authors thank Prof. Igor Pandzic and Dr. Jörgen Ahlberg for fruitful discussions. This work was supported by the European Union Research Training Network MUHCI (HPRN-CT-2000-00111).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.gmod.2006.09.006.

References

[1] Moving Pictures Expert Group, MPEG-4 International standard, ISO/IEC 14496, <http://www.cselt.it/mpeg/>.

[2] Visage Technologies AB, URL: <http://www.visagetechnologies.com/>, March 2006.

[3] J. Ahlberg, Model-based coding—Extraction, coding and evaluation of face model parameters. Ph.D. thesis no. 761, Department of Electrical Engineering, Linköping University, Linköping, Sweden, September 2002.

[4] J. Ahlberg, F. Dornaika, Efficient active appearance model for real-time head and facial feature tracking, in: IEEE International Workshop on Analysis and Modelling of Faces and Gestures (AMFG), Nice, October 2003.

[5] V. Blanz, C. Basso, T. Poggio, T. Vetter, Reanimating faces in images and video, in: P. Brunet, D. Fellner (Eds.), Proceedings of EUROGRAPHICS, Granada, Spain, 2003.

[6] V. Blanz, T. Vetter, A morphable model for the synthesis of 3D faces, in: SIGGRAPH '99 Conference Proceedings, 1999.

[7] T.K. Capin, E. Petajan, J. Ostermann, Very low bitrate coding of virtual human animation in MPEG-4, in: IEEE International Conference on Multimedia and Expo (II), 2000, pp. 1107–1110.

[8] M. Escher, N. Magnenat-Thalmann, Automatic 3D cloning and real-time animation of a human face, in: Computer Animation, Geneva, Switzerland, June 1997.

[9] M. Escher, I.S. Pandzic, N. Magnenat-Thalmann, Facial deformations for MPEG-4, in: Computer Animation 98, IEEE Computer Society Press, Philadelphia, USA, 1998, pp. 138–145.

[10] S. Fang, R. Raghavan, J. Richtsmeier, Volume morphing methods for landmark based 3D image deformation, in: SPIE International Symposium on Medical Imaging, vol. 2710, Newport Beach, CA, 1996, pp. 404–415.

[11] M. Fratarcangeli, M. Schaerf, Realistic modeling of animatable faces in MPEG-4, in: Computer Animation and Social Agents, MIRALAB, Computer Graphics Society (CGS), Geneva, Switzerland, 2004, pp. 285–297.

[12] M. Fratarcangeli, Physically based synthesis of animatable face models, in: Proceedings of the Second International Workshop on Virtual Reality and Physical Simulation (VRIPHYS05), Pisa, Italy, ISTI-CNR, The Eurographics Association, 2005, pp. 32–39.

[13] T. Goto, M. Escher, C. Zanardi, N. Magnenat-Thalmann, MPEG-4 based animation with face feature tracking, in: Computer Animation and Simulation '99, 1999, pp. 89–98.

[14] K. Kähler, J. Haber, H.-P. Seidel, Geometry-based muscle modeling for facial animation, in: Proceedings Graphics Interface 2001 (GI-2001), Morgan Kaufmann, Ottawa, Ontario, 2001, pp. 37–46.

[15] K. Kähler, J. Haber, H. Yamauchi, H.-P. Seidel, Head shop: Generating animated head models with anatomical structure, in: Proceedings of the 2002 ACM SIGGRAPH Symposium on Computer Animation, San Antonio, USA, ACM SIGGRAPH, 2002, pp. 55–64.

[16] W.-S. Lee, M. Escher, G. Sannier, N. Magnenat-Thalmann, MPEG-4 compatible faces from orthogonal photos, in: Proceedings of the Computer Animation, 1999, pp. 186–194.

[17] Y. Lee, D. Terzopoulos, K. Waters, Constructing physics-based facial models of individuals, in: Proceedings of the Graphics Interface '93 Conference, Toronto, ON, Canada, 1993, pp. 1–8.

[18] Y. Lee, D. Terzopoulos, K. Waters, Realistic modeling for facial animation, in: Computer Graphics Proceedings, SIGGRAPH 95, Annual Conference Series, Los Angeles, CA, ACM SIGGRAPH, 1995, pp. 55–62.

[19] C.A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constructive Approximation 2 (1986) 11–22.

[20] J. Noh, U. Neumann, Expression cloning, in: SIGGRAPH, ACM SIGGRAPH, 2001, pp. 277–288.

[21] J. Ostermann, Animation of synthetic faces in MPEG-4, in: Computer Animation, Philadelphia, Pennsylvania, 1998, pp. 49–51.

[22] I.S. Pandzic, Facial animation framework for the web and mobile platforms, in: Web3D Symposium, Tempe, AZ, USA, 2002.

[23] I.S. Pandzic, Facial motion cloning, Graphical Models 65 (6) (2003) 385–404.

[24] I.S. Pandzic, R. Forchheimer, MPEG-4 Facial Animation—The Standard, Implementation and Applications, first ed., John Wiley & Sons, Ltd., Linköping, Sweden, 2002.

[25] F. Parke, Computer generated animation of faces, in: ACM National Conference, ACM, 1972, pp. 451–457.

[26] K. Waters, A muscle model for animating three-dimensional facial expressions, in: Proceedings of Siggraph, vol. 21, 1987, pp. 17–24.