Xing Qing- Many-to-many feature point correspondence establishment for region-based facial expression cloning



    Many-to-many feature point correspondence

    establishment for region-based facial

    expression cloning

    Advisor : Professor Shin, Sung Yong

    by

    Xing, Qing

    Department of Electrical Engineering and Computer Science

    Division of Computer Science

    Korea Advanced Institute of Science and Technology

A thesis submitted to the faculty of the Korea Advanced Institute of Science and Technology in partial fulfillment of the requirements for the degree of Master of Engineering in the Department of Electrical Engineering and Computer Science, Division of Computer Science

    Daejeon, Korea

    2006. 6. 16.

    Approved by

    Professor Shin, Sung Yong

    Advisor


MCS 20044365    Xing, Qing. Many-to-many feature point correspondence establishment for region-based facial expression cloning. Department of Electrical Engineering and Computer Science, Division of Computer Science. 2006. 30 p. Advisor: Prof. Shin, Sung Yong. Text in English.

    Abstract

In this thesis, we propose a method to establish a many-to-many feature point correspondence for region-based facial expression cloning. By exploiting the movement coherency of feature points, we first construct a many-to-many feature point matching across the source and target face models. Then we extract super nodes from the relationship of source and target feature points. Source super nodes that show a strong movement coherency are grouped into concrete regions. The source face region segmentation result is transferred to the target face via the one-to-one super node correspondence. After we obtain corresponding regions on the source and target faces, we classify the face mesh vertices into different regions for later facial animation cloning. Since our method reveals the natural many-to-many feature point correspondence between the source and target faces, each region is adaptively sampled by a varying number of feature points. Hence the region segmentation result can preserve more mesh deformation information.


    Contents

Abstract

Contents

List of Tables

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Overview

2 Related Works

3 Region Segmentation
  3.1 Many-to-many Feature Point Matching
  3.2 Super Node Extraction
  3.3 Source Super Node Grouping
  3.4 Region Transfer
  3.5 Vertex Classification

4 Experimental Results
  4.1 Key Model Specification
  4.2 Key Model Analysis
  4.3 Cloning Errors

5 Conclusion

Summary (in Korean)

References


    List of Tables

4.1 Key model specification
4.2 Self-cloning errors for Man
4.3 Performance comparison with Park et al.'s approach


    List of Figures

1.1 Source and target regions may have different densities of feature points. The red dots on the source mouth region are matched with the four dots on the target mouth region.
1.2 Facial expression cloning system overview
1.3 Region segmentation operations
3.1 Extracted feature points on the source and target faces
3.2 Feature point relationship graph. Each connected component in this graph has a pair of corresponding source and target super nodes.
3.3 Feature point matching results comparison. Black dots are unmatched feature points. (a) one-to-one matching result in [16], (b) our many-to-many matching result.
3.4 Region segmentation result compared with [16]. (a) Park's result (b) Our result
3.5 Vertex-region coherency is defined as the maximum of vertex-feature point coherencies.
3.6 (a) Partition of feature points (b) Vertex classification result
4.1 Face models. (a) Man (b) Roney (c) Gorilla (d) Cartoon
4.2 14 key models
4.3 Comparison of feature point matching results
4.4 Region segmentation results on other face models
4.5 Expression cloning results

    1. Introduction

    1.1 Motivation

    Computer animated characters are now indispensable components of computer games, movies,

web pages, and various human-computer interface designs. To make these animated virtual characters lively and convincing, people need to produce realistic facial expressions, which play the most important role in delivering emotions. Traditionally, facial animation has been produced largely by keyframe techniques. Skilled artists manually sculpt keyframe faces every two or three frames for an animation consisting of tens of thousands of frames.

    Furthermore, they have to repeat a similar operation on different faces. Although it guaran-

    tees the best quality animation, this process is painstaking and tedious for artists and costly

    for animation producers. While large studios or production houses can afford to hire hun-

    dreds of animators to make feature films, it is not feasible for low budget or interactive

    applications.

    As facial animation libraries are becoming rich, the reuse of animation data has been a

recurring issue. Noh and Neumann [14] presented a data-driven approach to facial animation known as expression cloning, which transferred a source model's facial expressions in an

    input animation to a target face model. They first computed displacement vectors for source

    vertices. Then they modified the vectors based on 3D morphing between the source and

    target face meshes. The modified vectors were applied to target vertices to deform the

    target neutral face mesh. This method works well only when the source and target faces

    share similar topology and the displacement of vertices from the neutral mesh is small.

    Pyun et al. [20] proposed a blend shape approach based on scattered data interpolation.

    Given source key models and their corresponding target key models, the face model at each

    frame of an input animation is expressed as a weighted sum of source key models, and the

    weight values of source key models are applied to blend the corresponding target key mod-

    els to obtain the face model at the same frame of the output animation. This approach is

    computationally stable and efficient, as pointed out in [13]. What makes it more attractive is


that the blend shape approach doesn't require the source and target face models to have a similar topology, or the source animation to have small deformations from the neutral face. This versatility cannot be achieved by physics-based or morph-based approaches. By providing source

    and target key models and assigning the correspondence, the animator can incorporate her

    creativity into the output animation. However, in [20] they blended each key model as a

    whole entity. So the number of key models grows combinatorially as the number of facial

    aspects such as emotions, phonemes and facial gestures increases.

Park et al.'s feature-based approach [15] is an extension of Pyun et al.'s work and pushes the utilization of example data to the extreme. Later, their region-based approach [16] automated the process of region segmentation, which greatly reduced an artist's workload. By analyzing source and target key models, their system automatically segments the source and

    target faces into corresponding regions. They apply the blend shape scheme [20] to each of

    the regions separately and composite the results to get a seamless output face model. The

    resulting animation preserves the facial expressions of the source face as well as the charac-

    teristic features of the target examples. With a small number of key models, the region-based

    approach can generate diverse facial expressions like asymmetric gestures while enjoying

    the inherent advantages of blend shape approach.

In the region-based approach [16], it's critical to segment source and target face models

    into coherently moving regions. The corresponding source and target regions must have a

    strong correlation such that similar source and target expressions can be generated by using

    the same set of blending weights. In the previous work [16], they assumed the cross-model

    feature point correspondence is a one-to-one mapping. They transferred the source feature

    point grouping result to the target face via this one-to-one source and target feature point

    mapping. Thus, the corresponding source and target regions were sampled by the same

    number of feature points. However, source and target face models may have significantly

different geometric characteristics or mesh resolutions. So for regions containing the same facial feature, the density of feature points varies dramatically.

As illustrated in Figure 1.1, let's assume the source face has a big mouth or the mesh in the mouth region has many fine details. Hence, we extract a lot of feature points in the source mouth region. On the other hand, the target face doesn't have many mesh details in

    the mouth region. We extract only four feature points. Under the assumption of one-to-one


Figure 1.1: Source and target regions may have different densities of feature points. The red dots on the source mouth region are matched with the four dots on the target mouth region.

    correspondence, only four source feature points are matched with the four target feature

    points. Thus the source mouth region is sampled by only four feature points. A great deal

of information is lost. It's the same case for the left eye region on the source and target

    faces.

Our observation is that one source (target) feature point doesn't necessarily move coherently with only one target (source) feature point. It actually moves similarly with a

    small group of feature points. We propose to establish a many-to-many cross-model feature

    point correspondence. Under this correspondence, corresponding source and target regions

are adaptively sampled by different numbers of feature points. Our segmentation method

    maximizes the correlation between the source and target regions.

    1.2 Overview

    Following the framework of region-based approach [16], our facial expression cloning sys-

    tem consists of two parts, analysis and synthesis, as illustrated in Figure 1.2. Our contribu-

    tion lies in the region segmentation part. Figure 1.3 shows the sequence of operations in the

    region segmentation module.

    The analysis part is preprocessing which is performed only once. Regarding a face

    mesh as a mass-spring network, the system automatically picks feature points on source

    and target faces. Feature points are defined as vertices which have local maximum spring



    Figure 1.2: Facial expression cloning system overview


    Figure 1.3: Region segmentation operations


    potential energies. We establish a many-to-many correspondence between source and tar-

    get feature points through running the hospitals/residents algorithm [7] in two directions.

    Since a small group of source feature points move coherently with a small group of target

    feature points, we name coherently moving source and target feature point groups as super

    nodes. The one-to-one super node correspondence is set up by finding connected compo-

    nents in a graph which embodies the source and target feature point relationship. Similar

to Park's [16] feature point grouping, we group source super nodes into concrete regions.

    The source region segmentation result is easily transferred to the target face via the super

    node correspondence. Now we are ready to classify every vertex to regions according to the

    vertex-region correlation. The last preprocessing step is to place each region in the param-

    eter space, which is a standard technique for blend shape-based facial expression cloning.

Readers can refer to Pyun's paper [20] for technical details.

    The task in the synthesis part is to transfer expressions from the source face to the target

    face at runtime. There are three steps in this part, parameter extraction, key shape blending,

and region composition. Park's work [16] has treated this part rather well. Hence, we use

    their techniques directly.

    The remainder of this thesis is organized as follows: In Chapter 2, we review related

    works. Chapter 3 describes our region segmentation method in detail. We show experi-

    mental results in Chapter 4. Finally, we conclude our work and suggest future research in

    Chapter 5.


    2. Related Works

    Realistic facial animation remains a fundamental challenge in computer graphics. Begin-

ning with Parke's pioneering work [17], extensive research has been dedicated to this field.

    Williams [23] first proposed a performance-driven facial animation. Noh and Neumann

    [14] addressed the problem of facial expression cloning to reuse facial animation data. In

fact, performance-driven animation can be regarded as a type of expression cloning from an input image sequence to a 3D face model. We focus on recent results closely related to facial expression cloning besides those already mentioned in Chapter 1. A comprehensive overview can be found in the well-known facial animation book by Parke and Waters [18].

Blend shape scheme: Following Williams' work, there have been many approaches

    in performance-driven animation [10, 19, 2, 6, 4, 11, 1, 5, 3]. For our purposes, the most

    notable are blend shape approaches [10, 19, 2, 6, 1, 5, 3], in which a set of example models

    are blended to obtain an output model. In general, the blending weights are computed by

    least squares fitting [10, 19, 2, 6, 1, 5, 3]. From the observation that the deformation space of

    a face model is well approximated by a low-dimensional linear space, a series of research

    results on facial expression cloning have been presented based on a blend shape scheme

    with scattered data interpolation [20, 13, 15, 16]. The favorable advantages are stated in

    Chapter 1.

    Region segmentation: While being robust and efficient, the main difficulty of blend

    shape approaches is an exponential growth rate of the number of key models with respect

    to the number of facial attributes. Kleiser [9] applied a blend shape scheme to manually-

segmented regions and then combined the results to synthesize a facial animation. Joshi et

    al. [8] automatically segmented a single face model based on a deformation map. Inspired

    by these approaches, Park et al. [15] proposed a method for segmenting a face model into

    a predefined number of regions, provided with a set of feature points manually specified on

    each face feature. The idea was to classify vertices into regions, each containing a face fea-

    ture, according to the movement coherency of each vertex with respect to the feature points

    in each region. Park et al. [16] further explored this idea to automate the whole process of


feature-based expression cloning, which greatly reduces the animator's burden. They ad-

    dressed three issues of automatic processing: the extraction, correspondence establishment,

    and grouping of the feature points on source and target face models.

    Multi-linear model: Vlasic et al. [22] proposed a method based on a multi-linear hu-

    man face model to map video-recorded performances of one individual to facial animations

    of another. This method is a generalization of blend shape approach and thus can be trivially

    adapted to facial expression cloning. As a general data-driven tool, a reasonable multi-linear

    model requires a large number of face models with different attributes. Moreover, the multi-

    linear model is not quite adequate to address specific issues in facial expression cloning such

    as asymmetric facial gestures and topological independence between source and target face

    models.

Mesh deformation transfer: Sumner and Popović [21] proposed to transfer the defor-

    mation of a triangular mesh to another triangular mesh. This method can also be applied to

    facial expression cloning. Unlike blend shape approaches [20, 13, 15, 16], the method does

    not require any key models besides a source and a target face mesh. Instead, the animator

    manually provides facial feature points and their correspondence between source and target

models. Without using key face models, however, it is hard to incorporate the animator's

    intention into the output animation. Another limitation is that the source and target models

    should share the same topology although their meshes may be different in both vertex count

    and connectivity.


    3. Region Segmentation

    In this chapter, we explain in detail our new method to segment the source and target faces

    into corresponding regions which will be synthesized individually at runtime. We first estab-

    lish a many-to-many correspondence between source and target feature points by analyzing

    their movement coherence. Then we extract super nodes from the relationship graph of the

source and target feature points. Next we segment the source face into regions by grouping source super nodes. Since the source and target super nodes have a one-to-one correspondence, the grouping result is transferred from the source face to the target face. Finally, every vertex of the face mesh is classified into one or more regions according to its movement

    coherence with the regions. In general, a face model is symmetrical with respect to the ver-

    tical bisecting plane. Assuming that both halves of the face model have similar deformation

capability, we only perform the analysis on one half of the face and reflect the regions to the other half.

    3.1 Many-to-many Feature Point Matching

    We extract feature points on the source and target faces by using the method in [16]. Figure

    3.1 shows the feature points extracted on the source and target faces. We want to find their

    correspondence so that corresponding feature points move similarly when the source and

    target faces show the same expression.
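The exact feature point extraction criterion is the one of [16]; to make the idea concrete, the following is a rough C++ sketch under our own simplifying assumption that a vertex's spring potential energy is the squared change in length of its incident edges, summed over all key models relative to the neutral mesh, and that feature points are the vertices whose energy is a strict local maximum over their one-ring neighbors. All names (pickFeaturePoints, Vec3, and so on) are illustrative, not from the thesis or [16].

// Rough sketch of spring-potential-energy-based feature point picking.
// Assumption (ours, see [16] for the actual formulation): a vertex's energy is the
// squared change in length of its incident edges, summed over all key models
// relative to the neutral mesh (key model 0); feature points are vertices whose
// energy is a strict local maximum over their one-ring neighbors.
#include <array>
#include <cmath>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;

static double edgeLength(const Vec3& a, const Vec3& b) {
    double s = 0.0;
    for (int c = 0; c < 3; ++c) s += (a[c] - b[c]) * (a[c] - b[c]);
    return std::sqrt(s);
}

// keyModels[i][v]: position of vertex v in key model i (key model 0 = neutral).
// edges: unique mesh edges as vertex index pairs.
std::vector<int> pickFeaturePoints(const std::vector<std::vector<Vec3>>& keyModels,
                                   const std::vector<std::pair<int, int>>& edges)
{
    const std::size_t numVerts = keyModels[0].size();
    std::vector<double> energy(numVerts, 0.0);
    std::vector<std::vector<int>> neighbors(numVerts);

    for (const auto& e : edges) {
        neighbors[e.first].push_back(e.second);
        neighbors[e.second].push_back(e.first);
        const double rest = edgeLength(keyModels[0][e.first], keyModels[0][e.second]);
        for (std::size_t i = 1; i < keyModels.size(); ++i) {
            const double cur = edgeLength(keyModels[i][e.first], keyModels[i][e.second]);
            const double stretch = cur - rest;          // deviation from the neutral length
            energy[e.first]  += stretch * stretch;      // charge both endpoints
            energy[e.second] += stretch * stretch;
        }
    }

    std::vector<int> features;
    for (std::size_t v = 0; v < numVerts; ++v) {
        bool localMax = !neighbors[v].empty();
        for (int n : neighbors[v])
            if (energy[n] >= energy[v]) { localMax = false; break; }
        if (localMax) features.push_back(static_cast<int>(v));
    }
    return features;
}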

From a single source feature point's view, it does not necessarily move similarly to only one target feature point. Rather, it moves similarly to a small group of target feature points. The situation is the same for a target feature point. So we set up this many-to-many

    correspondence in two steps. In the first step, we find corresponding target feature points

    for every source feature point. In the second step, we find corresponding source feature

    points for every target feature point. The two steps are symmetric. We mainly describe the

    first step as follows.

    We use the equation proposed in [16] to measure the movement coherency cjk for a

    source feature point vj and a target feature point vk.


    Figure 3.1: Extracted feature points on the source and target faces.

c_{jk} = \left( \frac{1}{N} \sum_{i=0}^{N-1} s^i_{jk} \right)^{w_1}
         \left( \frac{1}{N} \sum_{i=0}^{N-1} \theta^i_{jk} \right)^{w_2}
         \left( d^0_{jk} \right)^{w_3}     (3.1)

where

s^i_{jk} = \begin{cases}
  1 & \text{if } v^i_j = v^0_j \text{ and } v^i_k = v^0_k \\[4pt]
  1 - \dfrac{\bigl|\, \|v^i_j - v^0_j\| - \|v^i_k - v^0_k\| \,\bigr|}{\max\{\|v^i_j - v^0_j\|,\ \|v^i_k - v^0_k\|\}} & \text{otherwise}
\end{cases}

\theta^i_{jk} = \begin{cases}
  1 & \text{if } v^i_j = v^0_j \text{ and } v^i_k = v^0_k \\[4pt]
  0 & \text{if } v^i_j = v^0_j \text{ or } v^i_k = v^0_k \text{ (but not both)} \\[4pt]
  \max\left\{ \dfrac{v^i_j - v^0_j}{\|v^i_j - v^0_j\|} \cdot \dfrac{v^i_k - v^0_k}{\|v^i_k - v^0_k\|},\ 0 \right\} & \text{otherwise}
\end{cases}

d^0_{jk} = \max\left\{ 1 - \frac{\|v^0_k - v^0_j\|}{D},\ 0 \right\}.


Here, N is the number of source key models, and v^i_j denotes the 3D position of vertex j in key model i. Key model 0 is the neutral face, which is regarded as the base model. w_l, l = 1, 2, 3, is the weight of each multiplicative term. The user can adjust these parameters; we empirically set w_1 = 2^1, w_2 = 2^0, and w_3 = 2^2. We define D = max{D_S, D_T}, where D_S is the minimum Euclidean distance such that the source vertices form a single connected component when we connect every pair of vertices whose Euclidean distance is not greater than D_S, and D_T is defined analogously for the target vertices. Each can be obtained by binary search, starting from the maximum distance over all pairs of vertices.
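The following sketch illustrates one way such a distance can be computed: it binary-searches the sorted pairwise distances and tests connectivity with a union-find structure. This is our reading of the binary search described above, not code from the thesis, and the identifiers are ours.

// Sketch: smallest distance D_S such that connecting every vertex pair whose distance
// is not greater than D_S yields one connected component. Binary search over the sorted
// pairwise distances, with union-find connectivity tests. Illustrative code only.
#include <algorithm>
#include <array>
#include <cmath>
#include <numeric>
#include <vector>

using Vec3 = std::array<double, 3>;

struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

static double dist(const Vec3& a, const Vec3& b) {
    double s = 0.0;
    for (int i = 0; i < 3; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

static bool connectedWithin(const std::vector<Vec3>& v, double d) {
    UnionFind uf(static_cast<int>(v.size()));
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (dist(v[i], v[j]) <= d) uf.unite(static_cast<int>(i), static_cast<int>(j));
    const int root = uf.find(0);
    for (std::size_t i = 1; i < v.size(); ++i)
        if (uf.find(static_cast<int>(i)) != root) return false;
    return true;
}

// Binary search over the sorted list of pairwise distances; the maximum pairwise
// distance is the initial upper bound, as in the text.
double minConnectingDistance(const std::vector<Vec3>& v) {
    if (v.size() < 2) return 0.0;
    std::vector<double> ds;
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j) ds.push_back(dist(v[i], v[j]));
    std::sort(ds.begin(), ds.end());
    std::size_t lo = 0, hi = ds.size() - 1;        // ds[hi] always connects everything
    while (lo < hi) {
        const std::size_t mid = (lo + hi) / 2;
        if (connectedWithin(v, ds[mid])) hi = mid; else lo = mid + 1;
    }
    return ds[lo];
}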

Intuitively, s^i_{jk} measures the similarity of the moving speeds of the vertices v^i_j and v^i_k. \theta^i_{jk} gives the similarity of their moving directions. d^0_{jk} measures the geometric proximity of the pair of vertices v^0_j and v^0_k in the base face model (key model 0) that correspond to v^i_j and v^i_k, respectively. Note that every term takes on a value between zero and one, inclusive. Thus, the movement coherency c_{jk} also takes on a value in the same range.
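For concreteness, the sketch below evaluates Equation 3.1 for one source/target feature point pair, assuming each feature point is given by its positions in the N key models (index 0 being the neutral base model). The function and helper names are our own; the default weights follow the empirical values quoted above.

// Minimal sketch of the movement coherency c_jk of Equation 3.1.
// vj[i], vk[i]: 3D positions of a source and a target feature point in key model i
// (key model 0 is the neutral base model). D = max{D_S, D_T}. Illustrative names only.
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3   sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
static double len(const Vec3& a)                { return std::sqrt(dot(a, a)); }

double movementCoherency(const std::vector<Vec3>& vj, const std::vector<Vec3>& vk,
                         double D,
                         double w1 = 2.0, double w2 = 1.0, double w3 = 4.0)  // 2^1, 2^0, 2^2
{
    const std::size_t N = vj.size();                 // number of key models
    double speedSum = 0.0, dirSum = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        const Vec3 dj = sub(vj[i], vj[0]);           // displacement from the neutral pose
        const Vec3 dk = sub(vk[i], vk[0]);
        const double lj = len(dj), lk = len(dk);
        if (lj == 0.0 && lk == 0.0) {                // neither point moves in key model i
            speedSum += 1.0;
            dirSum   += 1.0;
        } else {
            speedSum += 1.0 - std::fabs(lj - lk) / std::max(lj, lk);   // s^i_jk
            dirSum   += (lj == 0.0 || lk == 0.0)                       // theta^i_jk
                        ? 0.0
                        : std::max(dot(dj, dk) / (lj * lk), 0.0);
        }
    }
    const double proximity = std::max(1.0 - len(sub(vk[0], vj[0])) / D, 0.0);  // d^0_jk
    return std::pow(speedSum / N, w1) * std::pow(dirSum / N, w2) * std::pow(proximity, w3);
}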

    For every source (target) feature point, a preference list of target (source) feature points

    is made by interpreting the movement coherency value cjk as the preference that source

    feature point j and target feature point k have for each other. Now the problem of finding

    corresponding target feature points for every source feature point can be reduced to the

    hospitals/residents problem with ties [7]. Here we consider every source feature point as

    a hospital and every target feature point as a resident. We set the number of available

    posts of every hospital to be 5% of the total number of residents. The algorithm [7] can

    determine whether a given instance of hospitals/residents problem with ties admits a super-

    stable matching, and construct such a matching if it exists. Let m and n be the number of

    hospitals and residents respectively. The algorithm is O(mn) time - linear in the size of the

    problem instance. If the algorithm reports there is not a super stable matching, we break

the ties according to the feature points' geometric distance d_{ij} and run the algorithm again to find a weakly stable matching.

    In the second step, we treat target feature points as hospitals and source feature points as

    residents. After running the algorithm again, every target feature point gets its correspond-

    ing source feature points.
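The sketch below illustrates this two-direction matching. The thesis relies on the super-stable hospitals/residents-with-ties algorithm of [7]; as a simplified stand-in, this sketch breaks ties implicitly through the sort order and runs the classic resident-proposing deferred acceptance, which yields a weakly stable many-to-one matching in each direction. The union of the two edge sets is the many-to-many correspondence. All identifiers are our own.

// Sketch of the two-direction matching step (simplified stand-in for [7]).
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;   // coherency[j][k] for pair (j, k)

// One hospitals/residents pass: rows of `pref` are hospitals, columns residents.
// Returns (hospital, resident) pairs; `quota` = posts per hospital (5% of residents here).
static std::vector<std::pair<int, int>> deferredAcceptance(const Matrix& pref, int quota)
{
    const int numH = static_cast<int>(pref.size());
    const int numR = static_cast<int>(pref[0].size());
    // Each resident ranks the hospitals by decreasing preference (coherency).
    std::vector<std::vector<int>> order(numR, std::vector<int>(numH));
    for (int r = 0; r < numR; ++r) {
        for (int h = 0; h < numH; ++h) order[r][h] = h;
        std::sort(order[r].begin(), order[r].end(),
                  [&](int a, int b) { return pref[a][r] > pref[b][r]; });
    }
    std::vector<std::vector<int>> accepted(numH);   // residents currently held by each hospital
    std::vector<int> next(numR, 0);                 // next hospital each resident proposes to
    std::vector<int> freeRes(numR);
    for (int r = 0; r < numR; ++r) freeRes[r] = r;
    while (!freeRes.empty()) {
        const int r = freeRes.back(); freeRes.pop_back();
        if (next[r] >= numH) continue;              // list exhausted: the point stays unmatched
        const int h = order[r][next[r]++];
        accepted[h].push_back(r);
        if (static_cast<int>(accepted[h].size()) > quota) {
            // Over capacity: reject the resident this hospital likes least.
            auto worst = std::min_element(accepted[h].begin(), accepted[h].end(),
                                          [&](int a, int b) { return pref[h][a] < pref[h][b]; });
            freeRes.push_back(*worst);
            accepted[h].erase(worst);
        }
    }
    std::vector<std::pair<int, int>> edges;
    for (int h = 0; h < numH; ++h)
        for (int r : accepted[h]) edges.emplace_back(h, r);
    return edges;
}

// coherency[j][k]: movement coherency between source feature point j and target point k.
std::vector<std::pair<int, int>> manyToManyMatching(const Matrix& coherency)
{
    const int numS = static_cast<int>(coherency.size());
    const int numT = static_cast<int>(coherency[0].size());
    Matrix transposed(numT, std::vector<double>(numS));
    for (int j = 0; j < numS; ++j)
        for (int k = 0; k < numT; ++k) transposed[k][j] = coherency[j][k];

    const int quotaS = std::max(1, static_cast<int>(std::ceil(0.05 * numT)));  // 5% of residents
    const int quotaT = std::max(1, static_cast<int>(std::ceil(0.05 * numS)));

    // Step 1: source feature points act as hospitals, target feature points as residents.
    std::vector<std::pair<int, int>> edges = deferredAcceptance(coherency, quotaS);
    // Step 2: the roles are swapped; flip the pairs back to (source, target) order.
    for (const auto& e : deferredAcceptance(transposed, quotaT))
        edges.emplace_back(e.second, e.first);
    return edges;   // duplicates simply become a single edge of the relationship graph
}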

We construct an undirected bipartite graph G = (V_F, E_F). The vertex set V_F consists of



    Figure 3.2: Feature point relationship graph. Each connected component in this graph has

    a pair of corresponding source and target super nodes.


source and target feature points. A source feature point and a target feature point are connected by an edge if they are matched in either the first step or the second step. If a feature point is

    not matched to any other feature points, we do not include it in the graph. We mark it as

    unmatched and deal with it in a later processing stage. The bipartite graph in Figure 3.2

    embodies the many-to-many correspondence between source and target feature points.

    3.2 Super Node Extraction

From the definition of the movement coherency in Equation 3.1, we can say that if several source feature points move coherently with one target feature point, then these source feature points must also have high movement coherencies with each other.

    bipartite graph G. This problem can be solved in O(|V|+ |E|) time using a standard graph

    algorithm [12]. Each connected component has a group of source feature points and a

    group of target feature points. We define the two groups as corresponding super nodes on

the source and target face models, as shown in Figure 3.2. They have two properties: first, all the

    feature points in a super node move coherently; second, the feature points in a source super

    node move coherently with the feature points in the corresponding target super node. By

    extracting super nodes from the relationship graph, we convert the many-to-many feature

    point matching to the one-to-one super node matching. We compare our matching result

with that in Park's previous work in Figure 3.3. In [16], many feature points are unmatched

    on the source face. Hence a large amount of information about the source mesh detail is

    lost.
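A minimal sketch of this step follows: number the source feature points 0..numS-1 and the target feature points numS..numS+numT-1, build the adjacency lists of the bipartite relationship graph, and collect each connected component into a pair of corresponding super nodes. The types and names are illustrative only.

// Sketch of super node extraction: connected components of the bipartite relationship graph.
#include <utility>
#include <vector>

struct SuperNodePair {
    std::vector<int> sourcePoints;   // indices of source feature points
    std::vector<int> targetPoints;   // indices of target feature points
};

std::vector<SuperNodePair> extractSuperNodes(int numS, int numT,
                                             const std::vector<std::pair<int, int>>& edges)
{
    const int n = numS + numT;
    std::vector<std::vector<int>> adj(n);
    for (const auto& e : edges) {                    // e = (source index, target index)
        adj[e.first].push_back(numS + e.second);
        adj[numS + e.second].push_back(e.first);
    }
    std::vector<int> comp(n, -1);
    std::vector<SuperNodePair> result;
    for (int start = 0; start < n; ++start) {
        if (comp[start] != -1 || adj[start].empty()) continue;   // skip visited / unmatched
        SuperNodePair pair;
        std::vector<int> stack = {start};
        comp[start] = static_cast<int>(result.size());
        while (!stack.empty()) {                     // depth-first traversal of one component
            const int v = stack.back(); stack.pop_back();
            if (v < numS) pair.sourcePoints.push_back(v);
            else          pair.targetPoints.push_back(v - numS);
            for (int w : adj[v])
                if (comp[w] == -1) { comp[w] = comp[start]; stack.push_back(w); }
        }
        result.push_back(pair);
    }
    return result;
}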

    3.3 Source Super Node Grouping

    Similar to the idea of feature point grouping in [16], we group source super nodes into

    regions. The underlying idea is to partition the face mesh into meaningful regions such that

each region contains a facial feature like an eye, the mouth, or a cheek. By using Equation 3.1 with slight modifications, we can compute the movement coherency cjk of two vertices

    vj and vk on the same face.

    Then we define the movement coherency between two source super nodes sp and sq as



    Figure 3.3: Feature point matching results comparison. Black dots are unmatched feature

    points. (a) one-to-one matching result in [16], (b) our many-to-many matching result.


below:

C_{pq} = \frac{1}{|I_p|\,|I_q|} \sum_{j \in I_p} \sum_{k \in I_q} c_{jk}     (3.2)

Here I_p and I_q are the index sets of feature points in the source super nodes s_p and s_q: I_p = {j | feature point v_j belongs to super node s_p} and I_q = {k | feature point v_k belongs to super node s_q}. |I_p| and |I_q| are the numbers of feature points in super nodes s_p and s_q.

    Our assumption is that if two super nodes have a high movement coherency, they belong

to the same region. We construct an undirected graph G_S = (V_S, E_S), where V_S is the source super node set. A pair of super nodes are connected by an edge in E_S if their movement coherency is greater than or equal to a given threshold. The problem of source super node grouping is reduced to finding connected components in the graph G_S. The user can change the threshold value to control the number of connected components until she thinks the grouping result is reasonable. There might be some regions which have only one or two feature points. They are not sampled adequately and will cause artifacts in the output animation. We remove these outliers by merging their super nodes into other surviving regions.
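A sketch of the grouping step follows: it evaluates C_pq of Equation 3.2 from a precomputed coherency matrix, links super nodes whose coherency reaches the threshold, and reads the regions off the connected components. The merge of one- or two-point regions is left as a note; all names are assumptions of ours.

// Sketch of source super node grouping via Equation 3.2 and a threshold graph.
#include <vector>

// coherency[j][k]: movement coherency c_jk between source feature points j and k
// (Equation 3.1 applied to two points of the same face).
static double superNodeCoherency(const std::vector<int>& Ip, const std::vector<int>& Iq,
                                 const std::vector<std::vector<double>>& coherency)
{
    double sum = 0.0;
    for (int j : Ip)
        for (int k : Iq) sum += coherency[j][k];
    return sum / (static_cast<double>(Ip.size()) * static_cast<double>(Iq.size()));
}

// superNodes[p]: indices of the source feature points in super node p.
// Returns a region index per super node; super nodes with C_pq >= threshold share a region.
std::vector<int> groupSuperNodes(const std::vector<std::vector<int>>& superNodes,
                                 const std::vector<std::vector<double>>& coherency,
                                 double threshold)
{
    const int n = static_cast<int>(superNodes.size());
    std::vector<std::vector<int>> adj(n);
    for (int p = 0; p < n; ++p)
        for (int q = p + 1; q < n; ++q)
            if (superNodeCoherency(superNodes[p], superNodes[q], coherency) >= threshold) {
                adj[p].push_back(q);
                adj[q].push_back(p);
            }
    std::vector<int> region(n, -1);
    int numRegions = 0;
    for (int start = 0; start < n; ++start) {
        if (region[start] != -1) continue;
        std::vector<int> stack = {start};
        region[start] = numRegions;
        while (!stack.empty()) {
            const int p = stack.back(); stack.pop_back();
            for (int q : adj[p])
                if (region[q] == -1) { region[q] = numRegions; stack.push_back(q); }
        }
        ++numRegions;
    }
    return region;   // regions with only one or two feature points would then be merged away
}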

    3.4 Region Transfer

Given the one-to-one correspondence of source and target super nodes, transferring the super node grouping result from the source face to the target face is trivial. Suppose source region SR_i (i is the region index) has the source super nodes s_k, k \in I_{R_i}, where I_{R_i} is the index set of super nodes that belong to SR_i. Then its counterpart target region TR_i consists of the target super nodes corresponding to s_k, k \in I_{R_i}. Here, we want to emphasize several points again. First, source and target

    faces have the same number of regions. Second, each source region has its corresponding

    target region. Third, a pair of corresponding source and target regions have the same num-

    ber of source and target super nodes respectively. Last, a pair of corresponding source and

target regions do not necessarily have the same number of source and target feature points because corresponding source and target super nodes don't necessarily have the same number of feature

    points. This is the key strength of our segmentation method presented in this thesis. Since

    each region is sampled adaptively by a varying number of feature points, the characteristic

    of the region is preserved as much as possible. Remember that a few feature points are


    unmatched after running the hospitals/residents algorithm. Now we classify each of them

    into the region which has the largest coherency with it. Figure 3.4 compares our region seg-

    mentation result with that in the previous work [16]. In the previous work, corresponding

    regions are sampled with the same number of feature points.
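Region transfer itself is just a relabeling: each target super node inherits the region index of its corresponding source super node. The only remaining feature point level step is attaching the unmatched feature points, sketched below under our assumption that "largest coherency with a region" means the maximum coherency with any feature point of that region, mirroring the vertex-region coherency of Section 3.5. Names are illustrative.

// Sketch: attach an unmatched feature point to the region it is most coherent with.
#include <vector>

// regionPoints[r]: feature point indices already assigned to region r;
// coherency[j][k]: movement coherency between feature points j and k on the same face.
int assignUnmatchedPoint(int point,
                         const std::vector<std::vector<int>>& regionPoints,
                         const std::vector<std::vector<double>>& coherency)
{
    int best = 0;
    double bestValue = -1.0;
    for (std::size_t r = 0; r < regionPoints.size(); ++r)
        for (int k : regionPoints[r])
            if (coherency[point][k] > bestValue) {
                bestValue = coherency[point][k];     // coherency with the closest-moving point
                best = static_cast<int>(r);
            }
    return best;
}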

    3.5 Vertex Classification

    Now we are ready to classify all vertices of the face mesh into (possibly overlapping) regions

    by exploiting the movement coherency of one vertex with respect to one region. Specifi-

cally, we choose the vertex-region coherency c_{jF_l} as the maximum of the coherencies between the vertex v_j and the feature points v_k contained in the region F_l (see Figure 3.5). That is,

c_{jF_l} = \max_{k \in I_{F_l}} \{ c_{jk} \},     (3.3)

where I_{F_l} is the index set of the feature points in F_l.

A vertex is classified into a region if their coherency is greater than or equal to a threshold value. Note that each vertex can be classified into two or more regions. It is necessary to have regions overlap on the boundary to get a seamless output animation on the target face. The user can tune the threshold value to control how much the regions overlap. Figure 3.6 gives

    the vertex classification result.
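The classification rule can be summarized by the short sketch below, which computes c_{jF_l} of Equation 3.3 for every vertex and region from a precomputed vertex-to-feature-point coherency matrix and collects every region whose coherency reaches the threshold. The names and data layout are our own.

// Sketch of vertex classification (Equation 3.3): a vertex joins every region whose
// vertex-region coherency reaches the threshold, so regions may overlap on boundaries.
#include <algorithm>
#include <vector>

// coherency[v][k]: movement coherency between mesh vertex v and feature point k.
// regionFeaturePoints[l]: indices of the feature points contained in region F_l.
std::vector<std::vector<int>> classifyVertices(
        const std::vector<std::vector<double>>& coherency,
        const std::vector<std::vector<int>>& regionFeaturePoints,
        double threshold)
{
    const std::size_t numVerts = coherency.size();
    std::vector<std::vector<int>> regionsOfVertex(numVerts);
    for (std::size_t v = 0; v < numVerts; ++v) {
        for (std::size_t l = 0; l < regionFeaturePoints.size(); ++l) {
            double cVR = 0.0;                            // c_{jF_l} of Equation 3.3
            for (int k : regionFeaturePoints[l])
                cVR = std::max(cVR, coherency[v][k]);
            if (cVR >= threshold)
                regionsOfVertex[v].push_back(static_cast<int>(l));  // possibly several regions
        }
    }
    return regionsOfVertex;
}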



Figure 3.4: Region segmentation result compared with [16]. (a) Park's result (b) Our result



    Figure 3.5: Vertex-region coherency is defined as the maximum of vertex-feature point

    coherencies.



    Figure 3.6: (a) Partition of feature points (b) Vertex classification result


    4. Experimental Results

    We carried out several sets of experiments to verify the new region segmentation method

    proposed in this thesis. The facial expression cloning system was implemented with C++

    and OpenGL. We performed our experiments on an Intel Pentium PC (P4 3.0GHz proces-

    sor, 2GB RAM, and NVIDIA GeForce FX 5950 Ultra). All the computation was done on

    CPU. Experimental details and data are shown and analyzed in this chapter.

    4.1 Key Model Specification

To show the versatility of our region-based approach, we use various face models with different geometric properties, topologies, and dimensions. Figure 4.1 shows the four face models we used. The numbers of vertices and polygons in each face model, together with the number of key models used, are listed in Table 4.1. Note that the Cartoon face model is a 2D

    mesh while others are 3D meshes.

    As illustrated in Figure 4.2, we used fourteen key models for expression cloning: one

    face model with a neutral expression, six key models for emotional expressions, and seven

    key models for verbal expressions. Each source key model has a corresponding target key

    model. The neutral source and target face models are also called the source and target base


    Figure 4.1: Face models. (a) Man (b) Roney (c) Gorilla (d) Cartoon


    Table 4.1: Key model specification

            appearance        # vertices   # polygons   # key models
Man         Figure 4.1 (a)    1839         3534         14
Roney       Figure 4.1 (b)    5614         10728        14
Gorilla     Figure 4.1 (c)    4160         8266         7
Cartoon     Figure 4.1 (d)    902          1728         7

Figure 4.2: 14 key models. The non-verbal ones are neutral, joy, surprise, anger, sadness, disgust, and sleepiness.


    face models respectively. The source face models share the same mesh configuration. So

    do the target face models. However, the source and target face models, in general, may have

different mesh configurations and even different topologies. For Gorilla and Cartoon, we use even fewer key models and can still get satisfactory results.

    4.2 Key Model Analysis

In this section, we show our key model analysis results. We used the same threshold value for feature point correspondence establishment, super node grouping, and vertex classification. For the movement coherency, we set w_1 = 2^1, w_2 = 2^0, and w_3 = 2^2 in Equation 3.1.

Figure 4.3 compares our feature point matching results with Park et al.'s [16]. Our method makes use of a larger percentage of the original feature points. Region segmentation results on more face models are shown in Figure 4.4. By using different threshold values, the user can control the number of regions. We get reasonable region segmentations by trial and error.

    4.3 Cloning Errors

We perform two sets of experiments and compare the results with those of the previous work [16] to verify that our segmentation method achieves higher accuracy. To measure the accuracy,

    self-cloning was done for the face model Man in two ways, direct self-cloning (from Man

    to Man) and indirect self-cloning (first from Man to X and then from X back to Man). The

    cloning error is measured as follows:

\epsilon = \frac{\sum_{j=1}^{n} \| x_j - x'_j \|}{\sum_{j=1}^{n} \| x_j \|},     (4.1)

where x_j and x'_j are the original and cloned 3D positions of a vertex v_j of the face model Man, and n is the number of vertices in the model. For comparison with Park's work [16],

    we use the same sets of models. Roney, Gorilla, and Cartoon were used as the intermediate

    models. The input animation for Man consists of 1389 frames. The results were collected

    in Table 4.2 and Table 4.3.
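For reference, the error of Equation 4.1 amounts to the following small routine (our own illustrative code), applied to the original and cloned vertex positions of the Man model; multiplying the result by 100 gives the percentages reported in the tables.

// Sketch of the self-cloning error of Equation 4.1: summed displacement between
// original and cloned vertex positions, normalized by the summed magnitude of the
// original positions. Illustrative code only.
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

static double norm3(const Vec3& a) { return std::sqrt(a[0]*a[0] + a[1]*a[1] + a[2]*a[2]); }

double cloningError(const std::vector<Vec3>& original, const std::vector<Vec3>& cloned)
{
    double num = 0.0, den = 0.0;
    for (std::size_t j = 0; j < original.size(); ++j) {
        const Vec3 d = {original[j][0] - cloned[j][0],
                        original[j][1] - cloned[j][1],
                        original[j][2] - cloned[j][2]};
        num += norm3(d);            // || x_j - x'_j ||
        den += norm3(original[j]);  // || x_j ||
    }
    return num / den;
}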

    The next set of experiments was conducted to demonstrate the visual quality of cloned

    animations, where the transfer of asymmetric facial gestures was emphasized. The results


Man to Roney
    # feature points    one-to-one            many-to-many
    original            123 / 127             123 / 127
    matched             99 / 99               122 / 126
    percentage          80.488% / 77.593%     99.187% / 99.213%

Man to Gorilla
    # feature points    one-to-one            many-to-many
    original            185 / 127             185 / 127
    matched             78 / 78               143 / 113
    percentage          42.1628% / 61.417%    77.297% / 88.976%

Man to Cartoon
    # feature points    one-to-one            many-to-many
    original            63 / 127              63 / 127
    matched             47 / 47               60 / 93
    percentage          74.603% / 37.008%     95.238% / 73.228%

Figure 4.3: Comparison of feature point matching results.


Figure 4.4: Region segmentation results on other face models (Man to Gorilla and Man to Cartoon).


Table 4.2: Self-cloning errors for Man

    Type                      Intermediate face model    Errors (%)
                                                         Park et al.'s approach [16]    Our method
    direct self-cloning       -                          0.080                          0.060
    indirect self-cloning     Roney                      0.223                          0.195
                              Gorilla                    0.250                          0.214
                              Cartoon                    0.167                          0.150

Table 4.3: Performance comparison with Park et al.'s approach

    Type                       Approach         Errors    Time
                                                          Key model analysis    Run-time transfer
    direct cloning             Ours             0.060%    3.02 sec.             0.98 msec. (1020)
    (Man to Man)               Park et al.'s    0.080%    3.11 sec.             0.96 msec. (1041)
    indirect cloning           Ours             0.195%    6.86 sec.             1.38 msec. (725)
    (Man to Roney to Man)      Park et al.'s    0.223%    7.06 sec.             1.36 msec. (735)

    ( ): average frames per second


    Figure 4.5: Expression cloning results

are given in the accompanying movie file. Figure 4.5 shows snapshots from the animation.


    5. Conclusion

In this thesis, we present a new region segmentation method which can be integrated into Park et al.'s region-based facial expression cloning system [16]. We first establish a natural many-to-many feature point correspondence across source and target face models. Then we extract coherently moving groups of feature points as super nodes to convert the many-to-many feature point correspondence to a one-to-one super node correspondence. We segment the source face model by grouping strongly related source super nodes and reflect the result onto the target face. Our region segmentation method adaptively samples facial regions

    with varying numbers of feature points. Hence the region segmentation result can preserve

    more mesh deformation information.

One limitation of this work is that users have to adjust the threshold value for different face models. In the future, we also want to incorporate physics to synthesize more realistic facial details.

    The facial expression cloning system only focuses on the geometry of face models. There

    are many other issues to consider about facial expression cloning, like head motions, gaze

    directions, different textures for each key model and so on.


Summary (in Korean)


    References

[1] C. Bregler, L. Loeb, E. Chuang, and H. Deshpande. Turning to the masters: motion capturing cartoons. In Proc. of ACM SIGGRAPH, pages 399-407, 2002.

[2] I. Buck, A. Finkelstein, and C. Jacobs. Performance-driven hand-drawn animation. In Symposium on Non-Photorealistic Animation and Rendering, 2000.

[3] Jin Chai, Jing Xiao, and Jessica Hodgins. Vision-based control of 3D facial animation. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 193-206, 2003.

[4] Byoungwon Choe, Hanook Lee, and Hyeongseok Ko. Performance-driven muscle-based facial animation. Journal of Visualization and Computer Animation, 12(2):67-79, 2001.

[5] Erika Chuang and Chris Bregler. Performance driven facial animation using blendshape interpolation. Stanford University Computer Science Technical Report CS-TR-2002-02, 2002.

[6] Douglas Fidaleo, Junyong Noh, Taeyong Kim, Reyes Enciso, and Ulrich Neumann. Classification and volume morphing for performance-driven facial animation. In International Workshop on Digital and Computational Video, 2000.

[7] Robert W. Irving, David Manlove, and Sandy Scott. The hospitals/residents problem with ties. In SWAT '00: Proceedings of the 7th Scandinavian Workshop on Algorithm Theory, pages 259-271, London, UK, 2000. Springer-Verlag.

[8] Pushkar Joshi, Wen C. Tien, Mathieu Desbrun, and F. Pighin. Learning controls for blend shape based realistic facial animation. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 187-192, 2003.

[9] J. Kleiser. A fast, efficient, accurate way to represent the human face. ACM SIGGRAPH '89 Course #22 Notes, 1989.


[10] Cyriaque Kouadio, Pierre Poulin, and Pierre Lachapelle. Real-time facial animation based upon a bank of 3D facial expressions. In Computer Animation, pages 128-136, 1998.

[11] I-Chen Lin, Jeng-Sheng Yeh, and Ming Ouhyoung. Realistic 3D facial animation parameters from mirror-reflected multi-view video. In IEEE Computer Animation, pages 241-250, 2001.

[12] K. Mehlhorn. Data Structures and Algorithms, volume 1-3. Springer Publishing Company, 1984.

[13] K. Na and M. Jung. Hierarchical retargetting of fine facial motions. Computer Graphics Forum, 23(3):687-695, 2004.

[14] J. Noh and U. Neumann. Expression cloning. In ACM SIGGRAPH, pages 277-288, 2001.

[15] Bongcheol Park, Heejin Chung, Tomoyuki Nishita, and Sung Yong Shin. A feature-based approach to facial expression cloning. Computer Animation and Virtual Worlds, 16(3-4):291-303, 2005.

[16] Bongcheol Park and Sung Yong Shin. A region-based facial expression cloning. Technical Report CS/TR-2006-256, Korea Advanced Institute of Science and Technology, 2006.

[17] F. I. Parke. Computer generated animation of faces. In ACM National Conference, pages 451-457, 1972.

[18] Frederic I. Parke and Keith Waters. Computer Facial Animation. A. K. Peters, Ltd., Natick, MA, USA, 1996.

[19] F. Pighin, R. Szeliski, and D. H. Salesin. Resynthesizing facial animation through 3D model-based tracking. In IEEE International Conference on Computer Vision, pages 143-150, 1999.


[20] H. Pyun, Y. Kim, W. Chae, H. Y. Kang, and S. Y. Shin. An example-based approach for facial expression cloning. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 167-176, 2003.

[21] Robert W. Sumner and Jovan Popović. Deformation transfer for triangle meshes. In ACM SIGGRAPH, pages 399-405, 2004.

[22] Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popović. Face transfer with multilinear models. In ACM SIGGRAPH, pages 426-433, 2005.

[23] L. Williams. Performance-driven facial animation. Computer Graphics (Proceedings of SIGGRAPH 90), 24:235-242, 1990.


Acknowledgements

    I am grateful to numerous people who have helped in many ways throughout this research.

    I would like to thank my advisor, Professor Sung Yong Shin for his valuable support,

feedback, and guidance throughout every stage of this research. I appreciate his encourage-

    ment for independence while simultaneously providing valuable guidance.

I would also like to thank the master's thesis committee (Dr. Shin, Dr. Cheong, and Dr. Cordier) for their valuable insights and helpful reviews.

    Special thanks go to my mentor, Bongcheol Park for his time, direct guidance, and

    consistent patience as I floundered my way through this process.

I would like to express my gratitude to the Institute of Information Technology Assessment for generously sponsoring this research through the Korean Government IT Scholarship Program.

    I am thankful to all members of TC Lab and all my friends for their help, friendship,

    and the comfortable working atmosphere.

    Last, but far from least, I must thank my parents for their unending support, uncondi-

tional love, and forever blessing. Although I couldn't see them in the last two years, chatting with them over the phone is the best way to cheer me up.