Xing Qing- Many-to-many feature point correspondence establishment for region-based facial expression cloning



    Many-to-many feature point correspondence

    establishment for region-based facial

    expression cloning

    Advisor : Professor Shin, Sung Yong

    by

    Xing, Qing

    Department of Electrical Engineering and Computer Science

    Division of Computer Science

    Korea Advanced Institute of Science and Technology

A thesis submitted to the faculty of the Korea Advanced Institute of Science and Technology in partial fulfillment of the requirements for the degree of Master of Engineering in the Department of Electrical Engineering and Computer Science, Division of Computer Science

    Daejeon, Korea

    2006. 6. 16.

    Approved by

    Professor Shin, Sung Yong

    Advisor


MCS 20044365    Xing, Qing. Many-to-many feature point correspondence establishment for region-based facial expression cloning. Department of Electrical Engineering and Computer Science, Division of Computer Science. 2006. 30 p. Advisor: Prof. Shin, Sung Yong. Text in English.

    Abstract

In this thesis, we propose a method to establish a many-to-many feature point correspondence for region-based facial expression cloning. By exploiting the movement coherency of feature points, we first construct a many-to-many feature point matching across the source and target face models. Then we extract super nodes from the relationship of source and target feature points. Source super nodes that show a strong movement coherency are grouped into concrete regions. The source face region segmentation result is transferred to the target face via the one-to-one super node correspondence. After we obtain corresponding regions on the source and target faces, we classify the face mesh vertices into different regions for later facial animation cloning. Since our method reveals the natural many-to-many feature point correspondence between the source and target faces, each region is adaptively sampled by a varying number of feature points. Hence the region segmentation result can preserve more mesh deformation information.


    Contents

Abstract

Contents

List of Tables

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Overview

2 Related Works

3 Region Segmentation
  3.1 Many-to-many Feature Point Matching
  3.2 Super Node Extraction
  3.3 Source Super Node Grouping
  3.4 Region Transfer
  3.5 Vertex Classification

4 Experimental Results
  4.1 Key Model Specification
  4.2 Key Model Analysis
  4.3 Cloning Errors

5 Conclusion

Summary (in Korean)

References


    List of Tables

4.1 Key model specification
4.2 Self-cloning errors for Man
4.3 Performance comparison with Park et al.'s approach


    List of Figures

1.1 Source and target regions may have different densities of feature points. The red dots on the source mouth region are matched with the four dots on the target mouth region.
1.2 Facial expression cloning system overview
1.3 Region segmentation operations
3.1 Extracted feature points on the source and target faces
3.2 Feature point relationship graph. Each connected component in this graph has a pair of corresponding source and target super nodes.
3.3 Feature point matching results comparison. Black dots are unmatched feature points. (a) one-to-one matching result in [16], (b) our many-to-many matching result.
3.4 Region segmentation result compared with [16]. (a) Park's result (b) Our result
3.5 Vertex-region coherency is defined as the maximum of vertex-feature point coherencies.
3.6 (a) Partition of feature points (b) Vertex classification result
4.1 Face models. (a) Man (b) Roney (c) Gorilla (d) Cartoon
4.2 14 key models
4.3 Comparison of feature point matching results
4.4 Region segmentation results on other face models
4.5 Expression cloning results

    1. Introduction

    1.1 Motivation

    Computer animated characters are now indispensable components of computer games, movies,

web pages, and various human-computer interface designs. To make these animated virtual characters lively and convincing, people need to produce realistic facial expressions, which play the most important role in delivering emotions. Traditionally, facial animation has been produced largely by keyframe techniques. Skilled artists manually sculpt keyframe faces every two or three frames for an animation consisting of tens of thousands of frames.

    Furthermore, they have to repeat a similar operation on different faces. Although it guaran-

    tees the best quality animation, this process is painstaking and tedious for artists and costly

    for animation producers. While large studios or production houses can afford to hire hun-

    dreds of animators to make feature films, it is not feasible for low budget or interactive

    applications.

    As facial animation libraries are becoming rich, the reuse of animation data has been a

recurring issue. Noh and Neumann [14] presented a data-driven approach to facial animation known as expression cloning, which transferred a source model's facial expressions in an

    input animation to a target face model. They first computed displacement vectors for source

    vertices. Then they modified the vectors based on 3D morphing between the source and

    target face meshes. The modified vectors were applied to target vertices to deform the

    target neutral face mesh. This method works well only when the source and target faces

    share similar topology and the displacement of vertices from the neutral mesh is small.

    Pyun et al. [20] proposed a blend shape approach based on scattered data interpolation.

    Given source key models and their corresponding target key models, the face model at each

    frame of an input animation is expressed as a weighted sum of source key models, and the

    weight values of source key models are applied to blend the corresponding target key mod-

    els to obtain the face model at the same frame of the output animation. This approach is

    computationally stable and efficient, as pointed out in [13]. What makes it more attractive is


that the blend shape approach doesn't require the source and target face models to have a similar topology, or the source animation to have small deformations from the neutral face. This versatility cannot be achieved by physics-based or morph-based approaches. By providing source

    and target key models and assigning the correspondence, the animator can incorporate her

    creativity into the output animation. However, in [20] they blended each key model as a

    whole entity. So the number of key models grows combinatorially as the number of facial

    aspects such as emotions, phonemes and facial gestures increases.

Park et al.'s feature-based approach [15] is an extension of Pyun et al.'s work and pushes the utilization of example data to the extreme. Later, their region-based approach [16] automated the process of region segmentation, which greatly reduced an artist's workload. By analyzing source and target key models, their system automatically segments the source and

    target faces into corresponding regions. They apply the blend shape scheme [20] to each of

    the regions separately and composite the results to get a seamless output face model. The

    resulting animation preserves the facial expressions of the source face as well as the charac-

    teristic features of the target examples. With a small number of key models, the region-based

    approach can generate diverse facial expressions like asymmetric gestures while enjoying

    the inherent advantages of blend shape approach.

In the region-based approach [16], it's critical to segment source and target face models

    into coherently moving regions. The corresponding source and target regions must have a

    strong correlation such that similar source and target expressions can be generated by using

    the same set of blending weights. In the previous work [16], they assumed the cross-model

    feature point correspondence is a one-to-one mapping. They transferred the source feature

    point grouping result to the target face via this one-to-one source and target feature point

    mapping. Thus, the corresponding source and target regions were sampled by the same

    number of feature points. However, source and target face models may have significantly

different geometric characteristics or mesh resolutions. So for regions containing the same facial feature, the density of feature points varies dramatically.

As illustrated in Figure 1.1, let's assume the source face has a big mouth or the mesh in the mouth region has many fine details. Hence, we extract a lot of feature points in the source mouth region. On the other hand, the target face doesn't have many mesh details in

    the mouth region. We extract only four feature points. Under the assumption of one-to-one


Figure 1.1: Source and target regions may have different densities of feature points. The red dots on the source mouth region are matched with the four dots on the target mouth region.

    correspondence, only four source feature points are matched with the four target feature

    points. Thus the source mouth region is sampled by only four feature points. A great deal

of information is lost. It's the same case for the left eye region on the source and target

    faces.

Our observation is that one source (target) feature point doesn't necessarily move coherently with only one target (source) feature point. It actually moves similarly with a

    small group of feature points. We propose to establish a many-to-many cross-model feature

    point correspondence. Under this correspondence, corresponding source and target regions

are adaptively sampled by different numbers of feature points. Our segmentation method

    maximizes the correlation between the source and target regions.

    1.2 Overview

    Following the framework of region-based approach [16], our facial expression cloning sys-

    tem consists of two parts, analysis and synthesis, as illustrated in Figure 1.2. Our contribu-

    tion lies in the region segmentation part. Figure 1.3 shows the sequence of operations in the

    region segmentation module.

    The analysis part is preprocessing which is performed only once. Regarding a face

    mesh as a mass-spring network, the system automatically picks feature points on source

    and target faces. Feature points are defined as vertices which have local maximum spring



    Figure 1.2: Facial expression cloning system overview


    Figure 1.3: Region segmentation operations


    potential energies. We establish a many-to-many correspondence between source and tar-

    get feature points through running the hospitals/residents algorithm [7] in two directions.

    Since a small group of source feature points move coherently with a small group of target

    feature points, we name coherently moving source and target feature point groups as super

    nodes. The one-to-one super node correspondence is set up by finding connected compo-

    nents in a graph which embodies the source and target feature point relationship. Similar

to Park's [16] feature point grouping, we group source super nodes into concrete regions.

    The source region segmentation result is easily transferred to the target face via the super

    node correspondence. Now we are ready to classify every vertex to regions according to the

    vertex-region correlation. The last preprocessing step is to place each region in the param-

    eter space, which is a standard technique for blend shape-based facial expression cloning.

Readers can refer to Pyun's paper [20] for technical details.

    The task in the synthesis part is to transfer expressions from the source face to the target

    face at runtime. There are three steps in this part, parameter extraction, key shape blending,

and region composition. Park's work [16] has treated this part rather well. Hence, we use

    their techniques directly.

    The remainder of this thesis is organized as follows: In Chapter 2, we review related

    works. Chapter 3 describes our region segmentation method in detail. We show experi-

    mental results in Chapter 4. Finally, we conclude our work and suggest future research in

    Chapter 5.


    2. Related Works

    Realistic facial animation remains a fundamental challenge in computer graphics. Begin-

ning with Parke's pioneering work [17], extensive research has been dedicated to this field.

    Williams [23] first proposed a performance-driven facial animation. Noh and Neumann

    [14] addressed the problem of facial expression cloning to reuse facial animation data. In

fact, performance-driven animation can be regarded as a type of expression cloning from an input image sequence to a 3D face model. We focus on recent results closely related to facial expression cloning besides those already mentioned in Chapter 1. A comprehensive overview can be found in the well-known facial animation book by Parke and Waters [18].

Blend shape scheme: Following Williams' work, there have been many approaches

    in performance-driven animation [10, 19, 2, 6, 4, 11, 1, 5, 3]. For our purposes, the most

    notable are blend shape approaches [10, 19, 2, 6, 1, 5, 3], in which a set of example models

    are blended to obtain an output model. In general, the blending weights are computed by

    least squares fitting [10, 19, 2, 6, 1, 5, 3]. From the observation that the deformation space of

    a face model is well approximated by a low-dimensional linear space, a series of research

    results on facial expression cloning have been presented based on a blend shape scheme

    with scattered data interpolation [20, 13, 15, 16]. The favorable advantages are stated in

    Chapter 1.

    Region segmentation: While being robust and efficient, the main difficulty of blend

    shape approaches is an exponential growth rate of the number of key models with respect

    to the number of facial attributes. Kleiser [9] applied a blend shape scheme to manually-

segmented regions and then combined the results to synthesize a facial animation. Joshi et

    al. [8] automatically segmented a single face model based on a deformation map. Inspired

    by these approaches, Park et al. [15] proposed a method for segmenting a face model into

    a predefined number of regions, provided with a set of feature points manually specified on

    each face feature. The idea was to classify vertices into regions, each containing a face fea-

    ture, according to the movement coherency of each vertex with respect to the feature points

    in each region. Park et al. [16] further explored this idea to automate the whole process of


feature-based expression cloning, which greatly reduces the animator's burden. They ad-

    dressed three issues of automatic processing: the extraction, correspondence establishment,

    and grouping of the feature points on source and target face models.

    Multi-linear model: Vlasic et al. [22] proposed a method based on a multi-linear hu-

    man face model to map video-recorded performances of one individual to facial animations

    of another. This method is a generalization of blend shape approach and thus can be trivially

    adapted to facial expression cloning. As a general data-driven tool, a reasonable multi-linear

    model requires a large number of face models with different attributes. Moreover, the multi-

    linear model is not quite adequate to address specific issues in facial expression cloning such

    as asymmetric facial gestures and topological independence between source and target face

    models.

Mesh deformation transfer: Sumner and Popović [21] proposed to transfer the defor-

    mation of a triangular mesh to another triangular mesh. This method can also be applied to

    facial expression cloning. Unlike blend shape approaches [20, 13, 15, 16], the method does

    not require any key models besides a source and a target face mesh. Instead, the animator

    manually provides facial feature points and their correspondence between source and target

models. Without using key face models, however, it is hard to incorporate the animator's

    intention into the output animation. Another limitation is that the source and target models

    should share the same topology although their meshes may be different in both vertex count

    and connectivity.


    3. Region Segmentation

    In this chapter, we explain in detail our new method to segment the source and target faces

    into corresponding regions which will be synthesized individually at runtime. We first estab-

    lish a many-to-many correspondence between source and target feature points by analyzing

    their movement coherence. Then we extract super nodes from the relationship graph of the

source and target feature points. Next we segment the source face into regions by grouping source super nodes. Since the source and target super nodes have a one-to-one correspondence, the grouping result is transferred from the source face to the target face. Finally, every vertex of the face mesh is classified into one or more regions according to its movement

    coherence with the regions. In general, a face model is symmetrical with respect to the ver-

    tical bisecting plane. Assuming that both halves of the face model have similar deformation

capability, we only perform the analysis on one half of the face and reflect the regions to the other half.

    3.1 Many-to-many Feature Point Matching

    We extract feature points on the source and target faces by using the method in [16]. Figure

    3.1 shows the feature points extracted on the source and target faces. We want to find their

    correspondence so that corresponding feature points move similarly when the source and

    target faces show the same expression.
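The exact feature point extraction criterion is the one of [16]; to make the idea concrete, the following is a rough C++ sketch under our own simplifying assumption that a vertex's spring potential energy is the squared change in length of its incident edges, summed over all key models relative to the neutral mesh, and that feature points are the vertices whose energy is a strict local maximum over their one-ring neighbors. All names (pickFeaturePoints, Vec3, and so on) are illustrative, not from the thesis or [16].

// Rough sketch of spring-potential-energy-based feature point picking.
// Assumption (ours, see [16] for the actual formulation): a vertex's energy is the
// squared change in length of its incident edges, summed over all key models
// relative to the neutral mesh (key model 0); feature points are vertices whose
// energy is a strict local maximum over their one-ring neighbors.
#include <array>
#include <cmath>
#include <utility>
#include <vector>

using Vec3 = std::array<double, 3>;

static double edgeLength(const Vec3& a, const Vec3& b) {
    double s = 0.0;
    for (int c = 0; c < 3; ++c) s += (a[c] - b[c]) * (a[c] - b[c]);
    return std::sqrt(s);
}

// keyModels[i][v]: position of vertex v in key model i (key model 0 = neutral).
// edges: unique mesh edges as vertex index pairs.
std::vector<int> pickFeaturePoints(const std::vector<std::vector<Vec3>>& keyModels,
                                   const std::vector<std::pair<int, int>>& edges)
{
    const std::size_t numVerts = keyModels[0].size();
    std::vector<double> energy(numVerts, 0.0);
    std::vector<std::vector<int>> neighbors(numVerts);

    for (const auto& e : edges) {
        neighbors[e.first].push_back(e.second);
        neighbors[e.second].push_back(e.first);
        const double rest = edgeLength(keyModels[0][e.first], keyModels[0][e.second]);
        for (std::size_t i = 1; i < keyModels.size(); ++i) {
            const double cur = edgeLength(keyModels[i][e.first], keyModels[i][e.second]);
            const double stretch = cur - rest;          // deviation from the neutral length
            energy[e.first]  += stretch * stretch;      // charge both endpoints
            energy[e.second] += stretch * stretch;
        }
    }

    std::vector<int> features;
    for (std::size_t v = 0; v < numVerts; ++v) {
        bool localMax = !neighbors[v].empty();
        for (int n : neighbors[v])
            if (energy[n] >= energy[v]) { localMax = false; break; }
        if (localMax) features.push_back(static_cast<int>(v));
    }
    return features;
}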

From a single source feature point's view, it does not necessarily move similarly to only one target feature point. Rather, it moves similarly to a small group of target feature points. The situation is the same for a target feature point. So we set up this many-to-many

    correspondence in two steps. In the first step, we find corresponding target feature points

    for every source feature point. In the second step, we find corresponding source feature

    points for every target feature point. The two steps are symmetric. We mainly describe the

    first step as follows.

    We use the equation proposed in [16] to measure the movement coherency cjk for a

    source feature point vj and a target feature point vk.


    Figure 3.1: Extracted feature points on the source and target faces.

c_{jk} = \left( \frac{1}{N} \sum_{i=0}^{N-1} s^i_{jk} \right)^{w_1}
         \left( \frac{1}{N} \sum_{i=0}^{N-1} \theta^i_{jk} \right)^{w_2}
         \left( d^0_{jk} \right)^{w_3}     (3.1)

where

s^i_{jk} = \begin{cases}
  1 & \text{if } v^i_j = v^0_j \text{ and } v^i_k = v^0_k \\[4pt]
  1 - \dfrac{\bigl|\, \|v^i_j - v^0_j\| - \|v^i_k - v^0_k\| \,\bigr|}{\max\{\|v^i_j - v^0_j\|,\ \|v^i_k - v^0_k\|\}} & \text{otherwise}
\end{cases}

\theta^i_{jk} = \begin{cases}
  1 & \text{if } v^i_j = v^0_j \text{ and } v^i_k = v^0_k \\[4pt]
  0 & \text{if } v^i_j = v^0_j \text{ or } v^i_k = v^0_k \text{ (but not both)} \\[4pt]
  \max\left\{ \dfrac{v^i_j - v^0_j}{\|v^i_j - v^0_j\|} \cdot \dfrac{v^i_k - v^0_k}{\|v^i_k - v^0_k\|},\ 0 \right\} & \text{otherwise}
\end{cases}

d^0_{jk} = \max\left\{ 1 - \frac{\|v^0_k - v^0_j\|}{D},\ 0 \right\}.


Here, N is the number of source key models, and v^i_j denotes the 3D position of vertex j in key model i. Key model 0 is the neutral face, which is regarded as the base model. w_l, l = 1, 2, 3, is the weight of each multiplicative term. The user can adjust these parameters; we empirically set w_1 = 2^1, w_2 = 2^0, and w_3 = 2^2. We define D = max{D_S, D_T}, where D_S is the minimum Euclidean distance such that the source vertices form a single connected component when we connect every pair of vertices whose Euclidean distance is not greater than D_S, and D_T is defined analogously for the target vertices. Each can be obtained by binary search, starting from the maximum distance over all pairs of vertices.
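The following sketch illustrates one way such a distance can be computed: it binary-searches the sorted pairwise distances and tests connectivity with a union-find structure. This is our reading of the binary search described above, not code from the thesis, and the identifiers are ours.

// Sketch: smallest distance D_S such that connecting every vertex pair whose distance
// is not greater than D_S yields one connected component. Binary search over the sorted
// pairwise distances, with union-find connectivity tests. Illustrative code only.
#include <algorithm>
#include <array>
#include <cmath>
#include <numeric>
#include <vector>

using Vec3 = std::array<double, 3>;

struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

static double dist(const Vec3& a, const Vec3& b) {
    double s = 0.0;
    for (int i = 0; i < 3; ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

static bool connectedWithin(const std::vector<Vec3>& v, double d) {
    UnionFind uf(static_cast<int>(v.size()));
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j)
            if (dist(v[i], v[j]) <= d) uf.unite(static_cast<int>(i), static_cast<int>(j));
    const int root = uf.find(0);
    for (std::size_t i = 1; i < v.size(); ++i)
        if (uf.find(static_cast<int>(i)) != root) return false;
    return true;
}

// Binary search over the sorted list of pairwise distances; the maximum pairwise
// distance is the initial upper bound, as in the text.
double minConnectingDistance(const std::vector<Vec3>& v) {
    if (v.size() < 2) return 0.0;
    std::vector<double> ds;
    for (std::size_t i = 0; i < v.size(); ++i)
        for (std::size_t j = i + 1; j < v.size(); ++j) ds.push_back(dist(v[i], v[j]));
    std::sort(ds.begin(), ds.end());
    std::size_t lo = 0, hi = ds.size() - 1;        // ds[hi] always connects everything
    while (lo < hi) {
        const std::size_t mid = (lo + hi) / 2;
        if (connectedWithin(v, ds[mid])) hi = mid; else lo = mid + 1;
    }
    return ds[lo];
}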

Intuitively, s^i_{jk} measures the similarity of the moving speeds of the vertices v^i_j and v^i_k. \theta^i_{jk} gives the similarity of their moving directions. d^0_{jk} measures the geometric proximity of the pair of vertices v^0_j and v^0_k in the base face model (key model 0) that correspond to v^i_j and v^i_k, respectively. Note that every term takes on a value between zero and one, inclusive. Thus, the movement coherency c_{jk} also takes on a value in the same range.
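For concreteness, the sketch below evaluates Equation 3.1 for one source/target feature point pair, assuming each feature point is given by its positions in the N key models (index 0 being the neutral base model). The function and helper names are our own; the default weights follow the empirical values quoted above.

// Minimal sketch of the movement coherency c_jk of Equation 3.1.
// vj[i], vk[i]: 3D positions of a source and a target feature point in key model i
// (key model 0 is the neutral base model). D = max{D_S, D_T}. Illustrative names only.
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

static Vec3   sub(const Vec3& a, const Vec3& b) { return {a[0]-b[0], a[1]-b[1], a[2]-b[2]}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }
static double len(const Vec3& a)                { return std::sqrt(dot(a, a)); }

double movementCoherency(const std::vector<Vec3>& vj, const std::vector<Vec3>& vk,
                         double D,
                         double w1 = 2.0, double w2 = 1.0, double w3 = 4.0)  // 2^1, 2^0, 2^2
{
    const std::size_t N = vj.size();                 // number of key models
    double speedSum = 0.0, dirSum = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        const Vec3 dj = sub(vj[i], vj[0]);           // displacement from the neutral pose
        const Vec3 dk = sub(vk[i], vk[0]);
        const double lj = len(dj), lk = len(dk);
        if (lj == 0.0 && lk == 0.0) {                // neither point moves in key model i
            speedSum += 1.0;
            dirSum   += 1.0;
        } else {
            speedSum += 1.0 - std::fabs(lj - lk) / std::max(lj, lk);   // s^i_jk
            dirSum   += (lj == 0.0 || lk == 0.0)                       // theta^i_jk
                        ? 0.0
                        : std::max(dot(dj, dk) / (lj * lk), 0.0);
        }
    }
    const double proximity = std::max(1.0 - len(sub(vk[0], vj[0])) / D, 0.0);  // d^0_jk
    return std::pow(speedSum / N, w1) * std::pow(dirSum / N, w2) * std::pow(proximity, w3);
}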

    For every source (target) feature point, a preference list of target (source) feature points

    is made by interpreting the movement coherency value cjk as the preference that source

    feature point j and target feature point k have for each other. Now the problem of finding

    corresponding target feature points for every source feature point can be reduced to the

    hospitals/residents problem with ties [7]. Here we consider every source feature point as

    a hospital and every target feature point as a resident. We set the number of available

    posts of every hospital to be 5% of the total number of residents. The algorithm [7] can

    determine whether a given instance of hospitals/residents problem with ties admits a super-

    stable matching, and construct such a matching if it exists. Let m and n be the number of

    hospitals and residents respectively. The algorithm is O(mn) time - linear in the size of the

    problem instance. If the algorithm reports there is not a super stable matching, we break

the ties according to the feature points' geometric distance d_{ij} and run the algorithm again to find a weakly stable matching.

    In the second step, we treat target feature points as hospitals and source feature points as

    residents. After running the algorithm again, every target feature point gets its correspond-

    ing source feature points.
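The sketch below illustrates this two-direction matching. The thesis relies on the super-stable hospitals/residents-with-ties algorithm of [7]; as a simplified stand-in, this sketch breaks ties implicitly through the sort order and runs the classic resident-proposing deferred acceptance, which yields a weakly stable many-to-one matching in each direction. The union of the two edge sets is the many-to-many correspondence. All identifiers are our own.

// Sketch of the two-direction matching step (simplified stand-in for [7]).
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;   // coherency[j][k] for pair (j, k)

// One hospitals/residents pass: rows of `pref` are hospitals, columns residents.
// Returns (hospital, resident) pairs; `quota` = posts per hospital (5% of residents here).
static std::vector<std::pair<int, int>> deferredAcceptance(const Matrix& pref, int quota)
{
    const int numH = static_cast<int>(pref.size());
    const int numR = static_cast<int>(pref[0].size());
    // Each resident ranks the hospitals by decreasing preference (coherency).
    std::vector<std::vector<int>> order(numR, std::vector<int>(numH));
    for (int r = 0; r < numR; ++r) {
        for (int h = 0; h < numH; ++h) order[r][h] = h;
        std::sort(order[r].begin(), order[r].end(),
                  [&](int a, int b) { return pref[a][r] > pref[b][r]; });
    }
    std::vector<std::vector<int>> accepted(numH);   // residents currently held by each hospital
    std::vector<int> next(numR, 0);                 // next hospital each resident proposes to
    std::vector<int> freeRes(numR);
    for (int r = 0; r < numR; ++r) freeRes[r] = r;
    while (!freeRes.empty()) {
        const int r = freeRes.back(); freeRes.pop_back();
        if (next[r] >= numH) continue;              // list exhausted: the point stays unmatched
        const int h = order[r][next[r]++];
        accepted[h].push_back(r);
        if (static_cast<int>(accepted[h].size()) > quota) {
            // Over capacity: reject the resident this hospital likes least.
            auto worst = std::min_element(accepted[h].begin(), accepted[h].end(),
                                          [&](int a, int b) { return pref[h][a] < pref[h][b]; });
            freeRes.push_back(*worst);
            accepted[h].erase(worst);
        }
    }
    std::vector<std::pair<int, int>> edges;
    for (int h = 0; h < numH; ++h)
        for (int r : accepted[h]) edges.emplace_back(h, r);
    return edges;
}

// coherency[j][k]: movement coherency between source feature point j and target point k.
std::vector<std::pair<int, int>> manyToManyMatching(const Matrix& coherency)
{
    const int numS = static_cast<int>(coherency.size());
    const int numT = static_cast<int>(coherency[0].size());
    Matrix transposed(numT, std::vector<double>(numS));
    for (int j = 0; j < numS; ++j)
        for (int k = 0; k < numT; ++k) transposed[k][j] = coherency[j][k];

    const int quotaS = std::max(1, static_cast<int>(std::ceil(0.05 * numT)));  // 5% of residents
    const int quotaT = std::max(1, static_cast<int>(std::ceil(0.05 * numS)));

    // Step 1: source feature points act as hospitals, target feature points as residents.
    std::vector<std::pair<int, int>> edges = deferredAcceptance(coherency, quotaS);
    // Step 2: the roles are swapped; flip the pairs back to (source, target) order.
    for (const auto& e : deferredAcceptance(transposed, quotaT))
        edges.emplace_back(e.second, e.first);
    return edges;   // duplicates simply become a single edge of the relationship graph
}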

We construct an undirected bipartite graph G = (V_F, E_F). The vertex set V_F consists of



    Figure 3.2: Feature point relationship graph. Each connected component in this graph has

    a pair of corresponding source and target super nodes.


source and target feature points. A source feature point and a target feature point are connected by an edge if they are matched in either the first step or the second step. If a feature point is

    not matched to any other feature points, we do not include it in the graph. We mark it as

    unmatched and deal with it in a later processing stage. The bipartite graph in Figure 3.2

    embodies the many-to-many correspondence between source and target feature points.

    3.2 Super Node Extraction

From the definition of the movement coherency in Equation 3.1, we can say that if several source feature points move coherently with one target feature point, then these source feature points must also have high movement coherencies with each other.

    bipartite graph G. This problem can be solved in O(|V|+ |E|) time using a standard graph

    algorithm [12]. Each connected component has a group of source feature points and a

    group of target feature points. We define the two groups as corresponding super nodes on

the source and target face models, as shown in Figure 3.2. They have two properties: first, all the

    feature points in a super node move coherently; second, the feature points in a source super

    node move coherently with the feature points in the corresponding target super node. By

    extracting super nodes from the relationship graph, we convert the many-to-many feature

    point matching to the one-to-one super node matching. We compare our matching result

with that in Park's previous work in Figure 3.3. In [16], many feature points are unmatched

    on the source face. Hence a large amount of information about the source mesh detail is

    lost.
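A minimal sketch of this step follows: number the source feature points 0..numS-1 and the target feature points numS..numS+numT-1, build the adjacency lists of the bipartite relationship graph, and collect each connected component into a pair of corresponding super nodes. The types and names are illustrative only.

// Sketch of super node extraction: connected components of the bipartite relationship graph.
#include <utility>
#include <vector>

struct SuperNodePair {
    std::vector<int> sourcePoints;   // indices of source feature points
    std::vector<int> targetPoints;   // indices of target feature points
};

std::vector<SuperNodePair> extractSuperNodes(int numS, int numT,
                                             const std::vector<std::pair<int, int>>& edges)
{
    const int n = numS + numT;
    std::vector<std::vector<int>> adj(n);
    for (const auto& e : edges) {                    // e = (source index, target index)
        adj[e.first].push_back(numS + e.second);
        adj[numS + e.second].push_back(e.first);
    }
    std::vector<int> comp(n, -1);
    std::vector<SuperNodePair> result;
    for (int start = 0; start < n; ++start) {
        if (comp[start] != -1 || adj[start].empty()) continue;   // skip visited / unmatched
        SuperNodePair pair;
        std::vector<int> stack = {start};
        comp[start] = static_cast<int>(result.size());
        while (!stack.empty()) {                     // depth-first traversal of one component
            const int v = stack.back(); stack.pop_back();
            if (v < numS) pair.sourcePoints.push_back(v);
            else          pair.targetPoints.push_back(v - numS);
            for (int w : adj[v])
                if (comp[w] == -1) { comp[w] = comp[start]; stack.push_back(w); }
        }
        result.push_back(pair);
    }
    return result;
}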

    3.3 Source Super Node Grouping

    Similar to the idea of feature point grouping in [16], we group source super nodes into

    regions. The underlying idea is to partition the face mesh into meaningful regions such that

each region contains a facial feature like an eye, the mouth, or a cheek. By using Equation 3.1 with slight modifications, we can compute the movement coherency cjk of two vertices

    vj and vk on the same face.

    Then we define the movement coherency between two source super nodes sp and sq as



    Figure 3.3: Feature point matching results comparison. Black dots are unmatched feature

    points. (a) one-to-one matching result in [16], (b) our many-to-many matching result.


below:

C_{pq} = \frac{1}{|I_p|\,|I_q|} \sum_{j \in I_p} \sum_{k \in I_q} c_{jk}     (3.2)

Here I_p and I_q are the index sets of feature points in the source super nodes s_p and s_q: I_p = {j | feature point v_j belongs to super node s_p} and I_q = {k | feature point v_k belongs to super node s_q}. |I_p| and |I_q| are the numbers of feature points in super nodes s_p and s_q.

    Our assumption is that if two super nodes have a high movement coherency, they belong

to the same region. We construct an undirected graph G_S = (V_S, E_S), where V_S is the source super node set. A pair of super nodes are connected by an edge in E_S if their movement coherency is greater than or equal to a given threshold. The problem of source super node grouping is reduced to finding connected components in the graph G_S. The user can change the threshold value to control the number of connected components until she thinks the grouping result is reasonable. There might be some regions which have only one or two feature points. They are not sampled adequately and will cause artifacts in the output animation. We remove these outliers by merging their super nodes into other surviving regions.
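A sketch of the grouping step follows: it evaluates C_pq of Equation 3.2 from a precomputed coherency matrix, links super nodes whose coherency reaches the threshold, and reads the regions off the connected components. The merge of one- or two-point regions is left as a note; all names are assumptions of ours.

// Sketch of source super node grouping via Equation 3.2 and a threshold graph.
#include <vector>

// coherency[j][k]: movement coherency c_jk between source feature points j and k
// (Equation 3.1 applied to two points of the same face).
static double superNodeCoherency(const std::vector<int>& Ip, const std::vector<int>& Iq,
                                 const std::vector<std::vector<double>>& coherency)
{
    double sum = 0.0;
    for (int j : Ip)
        for (int k : Iq) sum += coherency[j][k];
    return sum / (static_cast<double>(Ip.size()) * static_cast<double>(Iq.size()));
}

// superNodes[p]: indices of the source feature points in super node p.
// Returns a region index per super node; super nodes with C_pq >= threshold share a region.
std::vector<int> groupSuperNodes(const std::vector<std::vector<int>>& superNodes,
                                 const std::vector<std::vector<double>>& coherency,
                                 double threshold)
{
    const int n = static_cast<int>(superNodes.size());
    std::vector<std::vector<int>> adj(n);
    for (int p = 0; p < n; ++p)
        for (int q = p + 1; q < n; ++q)
            if (superNodeCoherency(superNodes[p], superNodes[q], coherency) >= threshold) {
                adj[p].push_back(q);
                adj[q].push_back(p);
            }
    std::vector<int> region(n, -1);
    int numRegions = 0;
    for (int start = 0; start < n; ++start) {
        if (region[start] != -1) continue;
        std::vector<int> stack = {start};
        region[start] = numRegions;
        while (!stack.empty()) {
            const int p = stack.back(); stack.pop_back();
            for (int q : adj[p])
                if (region[q] == -1) { region[q] = numRegions; stack.push_back(q); }
        }
        ++numRegions;
    }
    return region;   // regions with only one or two feature points would then be merged away
}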

    3.4 Region Transfer

Given the one-to-one correspondence of source and target super nodes, transferring the super node grouping result from the source face to the target face is trivial. Suppose source region SR_i (i is the region index) has the source super nodes s_k, k \in I_{R_i}, where I_{R_i} is the index set of super nodes that belong to SR_i. Then its counterpart target region TR_i consists of the target super nodes corresponding to s_k, k \in I_{R_i}. Here, we want to emphasize several points again. First, source and target

    faces have the same number of regions. Second, each source region has its corresponding

    target region. Third, a pair of corresponding source and target regions have the same num-

    ber of source and target super nodes respectively. Last, a pair of corresponding source and

target regions do not necessarily have the same number of source and target feature points because corresponding source and target super nodes don't necessarily have the same number of feature

    points. This is the key strength of our segmentation method presented in this thesis. Since

    each region is sampled adaptively by a varying number of feature points, the characteristic

    of the region is preserved as much as possible. Remember that a few feature points are


    unmatched after running the hospitals/residents algorithm. Now we classify each of them

    into the region which has the largest coherency with it. Figure 3.4 compares our region seg-

    mentation result with that in the previous work [16]. In the previous work, corresponding

    regions are sampled with the same number of feature points.
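Region transfer itself is just a relabeling: each target super node inherits the region index of its corresponding source super node. The only remaining feature point level step is attaching the unmatched feature points, sketched below under our assumption that "largest coherency with a region" means the maximum coherency with any feature point of that region, mirroring the vertex-region coherency of Section 3.5. Names are illustrative.

// Sketch: attach an unmatched feature point to the region it is most coherent with.
#include <vector>

// regionPoints[r]: feature point indices already assigned to region r;
// coherency[j][k]: movement coherency between feature points j and k on the same face.
int assignUnmatchedPoint(int point,
                         const std::vector<std::vector<int>>& regionPoints,
                         const std::vector<std::vector<double>>& coherency)
{
    int best = 0;
    double bestValue = -1.0;
    for (std::size_t r = 0; r < regionPoints.size(); ++r)
        for (int k : regionPoints[r])
            if (coherency[point][k] > bestValue) {
                bestValue = coherency[point][k];     // coherency with the closest-moving point
                best = static_cast<int>(r);
            }
    return best;
}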

    3.5 Vertex Classification

    Now we are ready to classify all vertices of the face mesh into (possibly overlapping) regions

    by exploiting the movement coherency of one vertex with respect to one region. Specifi-

cally, we choose the vertex-region coherency c_{jF_l} as the maximum of the coherencies between the vertex v_j and the feature points v_k contained in the region F_l (see Figure 3.5). That is,

c_{jF_l} = \max_{k \in I_{F_l}} \{ c_{jk} \},     (3.3)

where I_{F_l} is the index set of the feature points in F_l.

A vertex is classified into a region if their coherency is greater than or equal to a threshold value. Note that each vertex can be classified into two or more regions. It is necessary to have regions overlap on the boundary to get a seamless output animation on the target face. The user can tune the threshold value to control how much the regions overlap. Figure 3.6 gives

    the vertex classification result.
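The classification rule can be summarized by the short sketch below, which computes c_{jF_l} of Equation 3.3 for every vertex and region from a precomputed vertex-to-feature-point coherency matrix and collects every region whose coherency reaches the threshold. The names and data layout are our own.

// Sketch of vertex classification (Equation 3.3): a vertex joins every region whose
// vertex-region coherency reaches the threshold, so regions may overlap on boundaries.
#include <algorithm>
#include <vector>

// coherency[v][k]: movement coherency between mesh vertex v and feature point k.
// regionFeaturePoints[l]: indices of the feature points contained in region F_l.
std::vector<std::vector<int>> classifyVertices(
        const std::vector<std::vector<double>>& coherency,
        const std::vector<std::vector<int>>& regionFeaturePoints,
        double threshold)
{
    const std::size_t numVerts = coherency.size();
    std::vector<std::vector<int>> regionsOfVertex(numVerts);
    for (std::size_t v = 0; v < numVerts; ++v) {
        for (std::size_t l = 0; l < regionFeaturePoints.size(); ++l) {
            double cVR = 0.0;                            // c_{jF_l} of Equation 3.3
            for (int k : regionFeaturePoints[l])
                cVR = std::max(cVR, coherency[v][k]);
            if (cVR >= threshold)
                regionsOfVertex[v].push_back(static_cast<int>(l));  // possibly several regions
        }
    }
    return regionsOfVertex;
}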



Figure 3.4: Region segmentation result compared with [16]. (a) Park's result (b) Our result



    Figure 3.5: Vertex-region coherency is defined as the maximum of vertex-feature point

    coherencies.



    Figure 3.6: (a) Partition of feature points (b) Vertex classification result


    4. Experimental Results

    We carried out several sets of experiments to verify the new region segmentation method

    proposed in this thesis. The facial expression cloning system was implemented with C++

    and OpenGL. We performed our experiments on an Intel Pentium PC (P4 3.0GHz proces-

    sor, 2GB RAM, and NVIDIA GeForce FX 5950 Ultra). All the computation was done on

    CPU. Experimental details and data are shown and analyzed in this chapter.

    4.1 Key Model Specification

To show the versatility of our region-based approach, we use various face models with different geometric properties, topologies, and dimensions. Figure 4.1 shows the four face models we used. The numbers of vertices and polygons in each face model, together with the number of key models used, are listed in Table 4.1. Note that the Cartoon face model is a 2D

    mesh while others are 3D meshes.

    As illustrated in Figure 4.2, we used fourteen key models for expression cloning: one

    face model with a neutral expression, six key models for emotional expressions, and seven

    key models for verbal expressions. Each source key model has a corresponding target key

    model. The neutral source and target face models are also called the source and target base


    Figure 4.1: Face models. (a) Man (b) Roney (c) Gorilla (d) Cartoon


    Table 4.1: Key model specification

            appearance        # vertices   # polygons   # key models
Man         Figure 4.1 (a)    1839         3534         14
Roney       Figure 4.1 (b)    5614         10728        14
Gorilla     Figure 4.1 (c)    4160         8266         7
Cartoon     Figure 4.1 (d)    902          1728         7

Figure 4.2: 14 key models. The non-verbal ones are neutral, joy, surprise, anger, sadness, disgust, and sleepiness.


    face models respectively. The source face models share the same mesh configuration. So

    do the target face models. However, the source and target face models, in general, may have

different mesh configurations and even different topologies. For Gorilla and Cartoon, we use even fewer key models and can still get satisfactory results.

    4.2 Key Model Analysis

In this section, we show our key model analysis results. We used the same threshold value for feature point correspondence establishment, super node grouping, and vertex classification. For the movement coherency, we set w_1 = 2^1, w_2 = 2^0, and w_3 = 2^2 in Equation 3.1.

Figure 4.3 compares our feature point matching results with Park et al.'s [16]. Our method makes use of a larger percentage of the original feature points. Region segmentation results on more face models are shown in Figure 4.4. By using different threshold values, the user can control the number of regions. We get reasonable region segmentations by trial and error.

    4.3 Cloning Errors

We perform two sets of experiments and compare the results with those of the previous work [16] to verify that our segmentation method achieves higher accuracy. To measure the accuracy,

    self-cloning was done for the face model Man in two ways, direct self-cloning (from Man

    to Man) and indirect self-cloning (first from Man to X and then from X back to Man). The

    cloning error is measured as follows:

\epsilon = \frac{\sum_{j=1}^{n} \| x_j - x'_j \|}{\sum_{j=1}^{n} \| x_j \|},     (4.1)

where x_j and x'_j are the original and cloned 3D positions of a vertex v_j of the face model Man, and n is the number of vertices in the model. For comparison with Park's work [16],

    we use the same sets of models. Roney, Gorilla, and Cartoon were used as the intermediate

    models. The input animation for Man consists of 1389 frames. The results were collected

    in Table 4.2 and Table 4.3.
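For reference, the error of Equation 4.1 amounts to the following small routine (our own illustrative code), applied to the original and cloned vertex positions of the Man model; multiplying the result by 100 gives the percentages reported in the tables.

// Sketch of the self-cloning error of Equation 4.1: summed displacement between
// original and cloned vertex positions, normalized by the summed magnitude of the
// original positions. Illustrative code only.
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

static double norm3(const Vec3& a) { return std::sqrt(a[0]*a[0] + a[1]*a[1] + a[2]*a[2]); }

double cloningError(const std::vector<Vec3>& original, const std::vector<Vec3>& cloned)
{
    double num = 0.0, den = 0.0;
    for (std::size_t j = 0; j < original.size(); ++j) {
        const Vec3 d = {original[j][0] - cloned[j][0],
                        original[j][1] - cloned[j][1],
                        original[j][2] - cloned[j][2]};
        num += norm3(d);            // || x_j - x'_j ||
        den += norm3(original[j]);  // || x_j ||
    }
    return num / den;
}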

    The next set of experiments was conducted to demonstrate the visual quality of cloned

    animations, where the transfer of asymmetric facial gestures was emphasized. The results


Man to Roney
    # feature points    one-to-one            many-to-many
    original            123 / 127             123 / 127
    matched             99 / 99               122 / 126
    percentage          80.488% / 77.593%     99.187% / 99.213%

Man to Gorilla
    # feature points    one-to-one            many-to-many
    original            185 / 127             185 / 127
    matched             78 / 78               143 / 113
    percentage          42.1628% / 61.417%    77.297% / 88.976%

Man to Cartoon
    # feature points    one-to-one            many-to-many
    original            63 / 127              63 / 127
    matched             47 / 47               60 / 93
    percentage          74.603% / 37.008%     95.238% / 73.228%

Figure 4.3: Comparison of feature point matching results.


Figure 4.4: Region segmentation results on other face models (Man to Gorilla and Man to Cartoon).


Table 4.2: Self-cloning errors for Man

    Type                      Intermediate face model    Errors (%)
                                                         Park et al.'s approach [16]    Our method
    direct self-cloning       -                          0.080                          0.060
    indirect self-cloning     Roney                      0.223                          0.195
                              Gorilla                    0.250                          0.214
                              Cartoon                    0.167                          0.150

Table 4.3: Performance comparison with Park et al.'s approach

    Type                       Approach         Errors    Time
                                                          Key model analysis    Run-time transfer
    direct cloning             Ours             0.060%    3.02 sec.             0.98 msec. (1020)
    (Man to Man)               Park et al.'s    0.080%    3.11 sec.             0.96 msec. (1041)
    indirect cloning           Ours             0.195%    6.86 sec.             1.38 msec. (725)
    (Man to Roney to Man)      Park et al.'s    0.223%    7.06 sec.             1.36 msec. (735)

    ( ): average frames per second


    Figure 4.5: Expression cloning results

are given in the accompanying movie file. Figure 4.5 shows snapshots from the animation.


    5. Conclusion

In this thesis, we present a new region segmentation method which can be integrated into Park et al.'s region-based facial expression cloning system [16]. We first establish a natural many-to-many feature point correspondence across source and target face models. Then we extract coherently moving groups of feature points as super nodes to convert the many-to-many feature point correspondence to a one-to-one super node correspondence. We segment the source face model by grouping strongly related source super nodes and reflect the result onto the target face. Our region segmentation method adaptively samples facial regions

    with varying numbers of feature points. Hence the region segmentation result can preserve

    more mesh deformation information.

One limitation of this work is that users have to adjust the threshold value for different face models. In the future, we also want to incorporate physics to synthesize more realistic facial details.

    The facial expression cloning system only focuses on the geometry of face models. There

    are many other issues to consider about facial expression cloning, like head motions, gaze

    directions, different textures for each key model and so on.


Summary (in Korean)


    References

[1] C. Bregler, L. Loeb, E. Chuang, and H. Deshpande. Turning to the masters: motion capturing cartoons. In Proc. of ACM SIGGRAPH, pages 399-407, 2002.

[2] I. Buck, A. Finkelstein, and C. Jacobs. Performance-driven hand-drawn animation. In Symposium on Non-Photorealistic Animation and Rendering, 2000.

[3] Jin Chai, Jing Xiao, and Jessica Hodgins. Vision-based control of 3D facial animation. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 193-206, 2003.

[4] Byoungwon Choe, Hanook Lee, and Hyeongseok Ko. Performance-driven muscle-based facial animation. Journal of Visualization and Computer Animation, 12(2):67-79, 2001.

[5] Erika Chuang and Chris Bregler. Performance driven facial animation using blendshape interpolation. Stanford University Computer Science Technical Report CS-TR-2002-02, 2002.

[6] Douglas Fidaleo, Junyong Noh, Taeyong Kim, Reyes Enciso, and Ulrich Neumann. Classification and volume morphing for performance-driven facial animation. In International Workshop on Digital and Computational Video, 2000.

[7] Robert W. Irving, David Manlove, and Sandy Scott. The hospitals/residents problem with ties. In SWAT '00: Proceedings of the 7th Scandinavian Workshop on Algorithm Theory, pages 259-271, London, UK, 2000. Springer-Verlag.

[8] Pushkar Joshi, Wen C. Tien, Mathieu Desbrun, and F. Pighin. Learning controls for blend shape based realistic facial animation. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 187-192, 2003.

[9] J. Kleiser. A fast, efficient, accurate way to represent the human face. ACM SIGGRAPH '89 Course #22 Notes, 1989.


[10] Cyriaque Kouadio, Pierre Poulin, and Pierre Lachapelle. Real-time facial animation based upon a bank of 3D facial expressions. In Computer Animation, pages 128-136, 1998.

[11] I-Chen Lin, Jeng-Sheng Yeh, and Ming Ouhyoung. Realistic 3D facial animation parameters from mirror-reflected multi-view video. In IEEE Computer Animation, pages 241-250, 2001.

[12] K. Mehlhorn. Data Structures and Algorithms, volume 1-3. Springer Publishing Company, 1984.

[13] K. Na and M. Jung. Hierarchical retargetting of fine facial motions. Computer Graphics Forum, 23(3):687-695, 2004.

[14] J. Noh and U. Neumann. Expression cloning. In ACM SIGGRAPH, pages 277-288, 2001.

[15] Bongcheol Park, Heejin Chung, Tomoyuki Nishita, and Sung Yong Shin. A feature-based approach to facial expression cloning. Computer Animation and Virtual Worlds, 16(3-4):291-303, 2005.

[16] Bongcheol Park and Sung Yong Shin. A region-based facial expression cloning. Technical Report CS/TR-2006-256, Korea Advanced Institute of Science and Technology, 2006.

[17] F. I. Parke. Computer generated animation of faces. In ACM National Conference, pages 451-457, 1972.

[18] Frederic I. Parke and Keith Waters. Computer Facial Animation. A. K. Peters, Ltd., Natick, MA, USA, 1996.

[19] F. Pighin, R. Szeliski, and D. H. Salesin. Resynthesizing facial animation through 3D model-based tracking. In IEEE International Conference on Computer Vision, pages 143-150, 1999.


[20] H. Pyun, Y. Kim, W. Chae, H. Y. Kang, and S. Y. Shin. An example-based approach for facial expression cloning. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 167-176, 2003.

[21] Robert W. Sumner and Jovan Popović. Deformation transfer for triangle meshes. In ACM SIGGRAPH, pages 399-405, 2004.

[22] Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popović. Face transfer with multilinear models. In ACM SIGGRAPH, pages 426-433, 2005.

[23] L. Williams. Performance-driven facial animation. Computer Graphics (Proceedings of SIGGRAPH 90), 24:235-242, 1990.


Acknowledgements

    I am grateful to numerous people who have helped in many ways throughout this research.

    I would like to thank my advisor, Professor Sung Yong Shin for his valuable support,

feedback, and guidance throughout every stage of this research. I appreciate his encourage-

    ment for independence while simultaneously providing valuable guidance.

I would also like to thank the master's thesis committee (Dr. Shin, Dr. Cheong, and Dr. Cordier) for their valuable insights and helpful reviews.

    Special thanks go to my mentor, Bongcheol Park for his time, direct guidance, and

    consistent patience as I floundered my way through this process.

I would like to express my gratitude to the Institute of Information Technology Assessment for generously sponsoring this research through the Korean Government IT Scholarship Program.

    I am thankful to all members of TC Lab and all my friends for their help, friendship,

    and the comfortable working atmosphere.

    Last, but far from least, I must thank my parents for their unending support, uncondi-

tional love, and forever blessing. Although I couldn't see them in the last two years, chatting with them over the phone is the best way to cheer me up.