Final Report Bachelor Thesis
Non-rigid registration of 3D face scans

Carlota Soler Arsasanz

Bachelor Thesis

Degree in Audiovisual Communications Engineering

Escola Tècnica d'Enginyeria de Telecomunicació de Barcelona

Universitat Politècnica de Catalunya

And

École Polytechnique Fédérale de Lausanne

Advisor: Prof. Jean-Philippe Thiran

Co-advisor: PhD Student Gabriel Cuendet & Dr. Hua Gao

Co-advisor: Prof. Ferran Marques

January 2015


The context of this thesis is the construction of a model capable of recognizing human facial expressions while being robust to identity. To recognize these two aspects, the model has to be trained in advance. For this purpose, a set of 3D scans has been recorded beforehand: 120 subjects performing 36 facial expressions.

These scans (Fig.0.1) present a problem: they cannot be compared directly. They have different numbers of vertices and they are not aligned. Training the model requires a statistical analysis of the scans, in which the variability between different examples is studied, so that each facial expression can be represented by its own set of parameters.

This thesis therefore presents a common representation of the scans. It solves the problems of vertex count and alignment so that the statistical analysis can be performed. We have implemented this solution following the methodology presented in FaceWarehouse [1], focusing only on the representation of the neutral face.

Fig.0.1. – Scans recorded with Kinect Camera


The context of this thesis is the creation of a model capable of recognizing facial expressions and that is also robust to identity. To recognize these two aspects, the model has to be trained beforehand. For this purpose, several 3D scans were recorded in advance: 120 subjects performing 36 facial expressions.

These scans (Fig.0.2) present one main problem: they cannot be compared directly, since they have different numbers of vertices and, in addition, they are not aligned. Training the model involves a statistical analysis of the scans, and therefore studying the variability between different examples. Finally, each facial expression is described by a set of parameters.

This thesis presents a common representation of the scans. This solves the problem stated above and allows, once the model is built, the statistical analysis of the scans to be performed. The solution presented follows the methodology described in FaceWarehouse [1]. In this project only the representation of the neutral facial expression has been considered.

Fig.0.2. – Scans recorded with Kinect Camera


The context of this thesis is the creation of a model capable of recognizing facial expressions and that is also robust with respect to identity. To recognize these two aspects, the model first has to be trained. For this purpose, several 3D scans were recorded in advance: 120 subjects performing 36 facial expressions.

These scans (Fig.0.3) present one main problem: they cannot be compared directly, since they have different numbers of vertices and, in addition, they are not aligned. Training a model involves a statistical analysis of these scans, in which the variability between different examples is studied. In the end, each facial expression is described by a set of parameters.

This thesis thus presents a common representation of the scans. This solves the problem stated above, so that the statistical analysis of the scans can be carried out later. The solution presented follows the methodology described in FaceWarehouse [1]. In this project only the neutral facial expression has been considered.

Fig.0.3. – Scans recorded with Kinect Camera


Acknowledgements

First of all, thanks to my family for making an experience like this possible. I will remember this period very fondly. Thank you for the support, and thank you for investing in me.

Thank you, Nil, for the support at every moment and in every respect. You always manage to make everything easier.

Thanks especially to my supervisor Gabriel Cuendet for guiding me through this project, and for all the support at any time, the advice and the knowledge.

Thanks to Prof. Jean-Philippe Thiran and Dr. Hua Gao for welcoming me at LTS5 and letting me learn and participate in this project.

Thanks also to Ferran Marquès for his support and for recommending EPFL as a destination. It has been a very good experience.

Thanks to the Erasmus people for bringing Lausanne closer to home.

Thanks to all the LTS5 team for hosting me during these months.


Revision history and approval record

Revision Date Purpose

0 30/12/2014 Document creation

1 28/01/2015 Document revision

DOCUMENT DISTRIBUTION LIST

Name e-mail

Carlota Soler Arasanz [email protected]

Ferran Marquès Acosta [email protected]

Jean-Philippe Thiran [email protected]

Gabriel Cuendet [email protected]

Written by: Carlota Soler Arasanz (Project Author), 01/01/2015

Reviewed and approved by: Ferran Marquès Acosta (Project Supervisor), 28/01/2015


Table of contents

Acknowledgements

Revision history and approval record

Table of contents

List of Figures

List of Tables

1. Introduction

1.1. Statement of purpose

1.2. Methods and procedures

2. State of the art of the technology used or applied in this thesis

2.1. Face model databases

2.2. Face manipulation applications

3. Methodologies and project development

3.1. Mesh Deformation

3.1.1. Landmark correspondence

3.1.2. First deformation

3.1.2.1. Morphable Model

3.1.2.2. Constraints for deformation

3.1.2.3. Alignment and correspondence model-scan

3.1.2.4. Optimization problem: Least Squares

3.1.2.5. Energy balance

3.1.3. Second deformation

4. Results

4.1. Landmark Correspondence

4.2. First deformation

4.2.1. Alignment Model-scan

4.2.2. Correspondence model-scan

4.2.3. Energy balance

4.2.3.1. Magnitude Balance

4.2.3.2. Weight balance

4.2.4. Best reconstruction at the moment

4.2.5. Problems on reconstructing some subjects

5. Conclusions and future development

Bibliography

Appendix

A.1 Data Capture: What is Kinect Fusion? [2] [3]


List of Figures

Fig.0.1. – Scans recorded with Kinect Camera
Fig.0.2. – Scans recorded with Kinect Camera
Fig.0.3. – Scans recorded with Kinect Camera

1. Introduction
Fig. 1.1. – Diagram of Bilinear Model
Fig. 1.2. – Correspondence between scans
Fig. 1.3. – FaceWarehouse paper's overview
Fig. 1.4. – Stages of FaceWarehouse in detail

2. State of the art of the technology used or applied in this thesis
Fig. 2.1. – Image extracted from [10]. Example of their database
Fig. 2.2. – Extracted from [5]. Scheme showing their model
Fig. 2.3. – Extracted from [11]. Scheme showing their model
Fig. 2.4. – Extracted from [13]. Example of their implementation
Fig. 2.5. – Extracted from [14]. Result after applying their method
Fig. 2.6. – Extracted from [15]. Result after applying their method
Fig. 2.7. – Extracted from [16]. Example of application of their method

3. Methodologies and project development
Fig. 3.1. – Mesh Deformation stage from FaceWarehouse
Fig. 3.2. – Detailed steps for the Mesh Deformation stage from FaceWarehouse
Fig.3.3. – Projection of 2D landmarks to 3D world
Fig.3.4. – Coordinates of 2D image and landmarks
Fig.3.5. – Change from 3D world coordinates to camera coordinates
Fig.3.6. – Different results of the Morphable Model by varying the PCA's coefficients
Fig.3.7. – Example of k-nearest neighbours search
Fig.3.7. – Dimensions of the PCA variables
Fig.3.8. – A vertex xi and its adjacent faces (a), and one term of its curvature normal formula (b)

4. Results
Fig.4.1. – Results after running the 2D landmark tracking software over some images
Fig.4.2. – Results of 2D landmarks projected on the 3D scan: 3D landmarks
Fig.4.3. – Differentiation of facial landmarks (blue for external and black for internal)
Fig.4.4. – Alignment between the scan (orange) and the mean shape of the morphable model (blue)
Fig.4.5. – Projected landmarks on image plane. Demonstrates the alignment between meshes
Fig.4.6. – Example of finding the correspondent vertices to the model on the scan
Fig.4.7.a – Epos according to ωpos and ωfea
Fig.4.7.b – View of Fig.4.7.a with respect to ωfea
Fig.4.8.a – Efea according to ωpos and ωfea
Fig.4.8.b – View of Fig.4.8.a with respect to ωfea
Fig.4.8.c – View of Fig.4.8.a from the top
Fig.4.9 – Surfaces representing Epos and Efea
Fig.4.10 – Different reconstructions for determining the magnitude of the weights
Fig.4.11.a – Plot with Epos, Efea and Ecoef for deciding the final value of the weights
Fig.4.11.b – View of Fig.4.11.a with respect to ωpos (left) and with respect to ωfea (right)
Fig.4.12 – Different reconstructions for determining more accurately the weights
Fig.4.13.a – Resulting reconstruction
Fig.4.13.b – Scan (orange) and reconstruction (blue) overlapped
Fig.4.13.c – Scan (grey surface) and reconstruction (blue) overlapped
Fig.4.13.d – Scan (grey surface) and reconstruction (blue) overlapped
Fig.4.14.a – Resulting reconstruction with 2D landmarks
Fig.4.14.b – Scan (orange) and reconstruction (light blue) overlapped
Fig.4.14.c – Scan (grey surface) and reconstruction (light blue vertices) overlapped
Fig.4.15.a – 2D landmark reconstruction. Distances to the scan according to 4.15.c
Fig.4.15.b – 3D landmark and scan reconstruction. Distances to the scan according to 4.15.c
Fig.4.15.c – Color scale in meters
Fig.4.16 – Result of correspondent vertex to the model on the scan. Left: 2D landmark reconstruction and right: 3D landmarks and scan's shape
Fig.4.17.a – Projection of landmarks to image plane. Pink (scan), black (model), blue (2D landmarks)
Fig.4.17.b – Same projection as Fig.4.17.a but without external landmarks (chin)
Fig.4.18 – 2D landmark reconstruction. Distances to the scan according to 4.15.c

5. Conclusions and future development

Appendix
Fig. A.1.1. – Scanning of a scene with Kinect
Fig. A.1.2. – Kinect's Fusion pipeline
Fig. A.1.3. – Kinect's Fusion diagram


List of Tables

1. Introduction

2. State of the art of the technology used or applied in this thesis

3. Methodologies and project development
Tab.3.1. – Table that clarifies the objective of each constraint
Tab.3.2. – Summary of each cost function and its derivative

4. Results

5. Conclusions and future development

Appendix


1. Introduction

1.1. Statement of purpose

Context

The context of this project is the creation of a 3D model for recognizing human facial expressions. 3D data gives more information about the face when there is head pose variation.

What does this model include?

The model is linear with respect to two aspects, as depicted in Fig.1.1:

· The expression made

· Who makes the expression

The model has to be trained to be capable of recognizing these aspects. Many individuals have been recorded making the considered expressions. These 3D scans have been recorded with Microsoft Kinect's RGBD camera.

Problem statement

The recorded scans cannot be used directly to train the model. Training requires a statistical analysis of the scans corresponding to each expression. A direct comparison such as the one depicted in Fig.1.2 cannot be performed between scans for the following reasons:

· They have different numbers of vertices

· They are not aligned

· We do not know where the facial landmarks are located, so we cannot compare how they vary

Fig. 1.1. - Diagram of Bilinear Model

Fig. 1.2. - Correspondence between scans


Objectives

The purpose of this project is to resolve the differences that the recorded data presents. Finding a common representation is essential to build the aforementioned model.

The common representation found in this project ensures that:

· The scans have the same number of vertices

· The facial landmarks are located on the scan

· The scans are aligned with each other

· The scans are represented without holes or missing data

These properties guarantee that the scans can be compared. Therefore the solution found in this project makes the data useful for building the model at a later stage.

1.2. Methods and procedures

The whole project is based on a paper published in March 2014 titled FaceWarehouse: a 3D facial expression database for visual computing [1].

This paper specifies techniques that address the problems and objectives of the thesis, following the general pipeline described in Fig.1.3.

The project focuses on the central stage: Mesh Deformation generates a common representation for the 3D scans, which is the objective of this project.

Fig. 1.3. - FaceWarehouse paper’s overview


Fig.1.4 gives more detail of the paper's pipeline. It provides the reader with context on the steps before and after our implementation.

The Mesh Deformation stage takes the 3D scans as input. These are obtained with Microsoft's Kinect Fusion algorithm [2] by fusing the RGBD information (Appendix A.1). After our implementation, we obtain a common representation for the scans. Note that at this stage we only consider the neutral face; the rest of the facial expressions are left for future work.

Fig. 1.4. - Stages of FaceWarehouse in detail


2. State of the art of the technology used or applied in this thesis

Apart from the FaceWarehouse paper, there is more related work on 2D and 3D face model databases. Some of them focus on neutral expression models, while others contain multiple expressions for dynamic applications. There are also applications involving face and head modelling, including face transfer, reanimation, and performance tracking in images and video.

2.1. Face model databases

Face databases are of great value in face-related research areas, both for modelling and for validating methods. For this reason, many researchers have built their own face databases for specific applications.

Some of these face databases are composed only of 2D face images, while others

contain 3D shape information. Some of them contain only one static expression (the

neutral face), which is mostly used in applications involving only static faces, e.g., face

recognition. Other face databases also contain other expressions, which can be used in

face motion-based applications, e.g., face expression recognition and tracking.

In computer vision, 2D face databases have been widely used as training and/or

validation data in face detection and recognition, and in facial expression analysis. Yin et al.

[10] mentioned that although some systems have been successful, performance

degradation remains when handling expressions with large head pose rotation, occlusion

and lighting variations.

To address the issue, they created a 3D facial expression

database (Fig.2.1). They used a 3D face imaging system to

acquire the 3D face surface geometry and surface texture of

each subject with seven emotion-related universal expressions.

Such a database with few expressions for each person works

fine for their face expression recognition and analysis, but may

fall short of the diversity required in some applications of visual

computing, such as those shown in the FaceWarehouse [1].

Fig. 2.1. – Image extracted from [10]. Example of their database


The method of Blanz and Vetter [5] models

the variations in facial attributes using

dense surface geometry and color

data. They built the morphable face

model to describe the procedure of

constructing a surface that represents

the 3D face from a single image.

Furthermore, they supplied a set of

controls for intuitive manipulation of

appearance attributes (thin/fat,

feminine/masculine) (see Fig.2.2).

Vlasic et al. [11] proposed multilinear models for facial expression tracking and transfer. They estimate a model from a

data set of three-dimensional face scans

that varies in expression, viseme, and

identity (Fig.2.3). This multilinear model

decouples the variation into several modes

(e.g., identity, expression, viseme) and

encodes them consistently.

All of these databases contain face data with different identities, and some even with

different expressions. However, they may still be inadequate for facial expression

parameterization or rigging. Due to their excellent performance, facial rigs based on

blendshape models are particularly popular in expressing facial behaviors.

Ekman’s Facial Action Coding System [12] helps decompose facial behavior into 46 basic

action units.

Li et al. [13] developed an

example-based facial rig, which

allows transferring expressions

from a generic prior to create a

blendshape model of a virtual

character. This blendshape model

can be successively fine-tuned

toward the specific geometry and

motion characteristics of the

character by providing more

training data in the form of

additional expression poses.

Fig. 2.2. – Extracted from [5]. Scheme showing their model

Fig. 2.3. – Extracted from [11]. Scheme showing their model

Fig. 2.4. – Extracted from [13]. Example of their implementation


2.2. Face manipulation applications.

Face manipulation is a convenient tool for artists and animators to create new facial

images or animations from existing materials, and hence of great research interest in

computer animation.

In the 2D domain, Leyvand et al. [14] enhance the attractiveness of human faces in

frontal photographs (Fig.2.5). Given a face image as the target, Bitouk et al. [15] find the

most similar face in a large 2D face database and use it to replace the face in the target

image to conceal the identity of the face (Fig.2.6).

Recently, 3D face models have become increasingly popular

in complex face manipulation. Blanz et al. [16] reanimate the

face in a still image or video by transferring mouth

movements and expression based on their 3D morphable

model [5] and a common expression representation (Fig.2.7).

They also exchange the face between images across large

differences in viewpoint and illumination [17].

Given a photo of person A, Shlizerman et al. [18] seek a

photo of person B with similar pose and expression from a

large database of images on the Internet.

Yang et al. [19] derive an expression flow and alignment flow

from the 3D morphable model between source and target

photos, capable of transferring face components between the

images naturally and seamlessly.

Shlizerman et al. [20] generate face animations from large

image collections.

Dale et al. [21] replace the face in a portrait video sequence through a 3D multilinear

model [11].

Fig. 2.5. – Extracted from [14]. Result after applying their method

Fig. 2.6. – Extracted from [15]. Result after applying their method

Fig. 2.7. – Extracted from [16]. Example of application of their method


3. Methodologies and project development

In this project we follow the same methodology as FaceWarehouse [1] (see section 1.2 for a general overview of this paper). We do not implement the whole paper, only the Mesh Deformation stage, as depicted in Fig.3.1.

The techniques proposed there address the objective of the thesis: to find a common representation of the scans. This solution makes it possible to compare the scans with each other, so that at a later stage they can be used to build a model.

Note that in this stage we only consider the neutral face; the remaining facial expressions are not considered in this thesis.

3.1. Mesh Deformation

The pipeline described in FaceWarehouse for this stage follows the steps in Fig.3.2.

Fig. 3.1. – Mesh Deformation stage from

FaceWarehouse

Fig. 3.2. – Detailed steps for the Mesh Deformation stage from FaceWarehouse


The pipeline in Fig.3.2 is applied only to the neutral facial expression.

First of all, we need the landmark correspondence between the 2D image and the 3D scans. Then, a generic model is deformed according to some constraints using the landmarks' positions.

The goal of the deformation is for the model to resemble the 3D scan captured with the Kinect camera. The purpose of this deformation is to obtain a mesh with the properties of the model but with the shape of the scan. The structure of the model guarantees that all the scans have a common representation. Having all the scans share the same properties is the main goal of the thesis, as stated in section 1.1.

The deformation of the model is followed by a second deformation, which consists of moving the vertices of the previous reconstruction. With this, the final reconstruction looks more similar to the scan according to some constraints.

3.1.1. Landmark correspondence

The position of the facial landmarks is essential to guide the deformation. They have to remain close to their positions when the model is deformed. Also, having the landmarks located on the final reconstruction will make it possible to compare the different 3D meshes and to study their variability (when building the model at a later stage).

To find the 2D landmarks we used a previously implemented algorithm developed at this research laboratory. It should be run on a frontal face to give good results. As Kinect Fusion handles images from multiple viewpoints, a frontal frame has to be selected to find the landmarks properly.

Having the coordinates of the 2D landmarks, the corresponding 3D landmarks can be found on the scan. The procedure consists of projecting the 2D coordinates into the 3D world. This is an affine projection that uses the calibration parameters of the Kinect camera ($f_x$, $f_y$ are the focal lengths):

$$x = \frac{u}{f_x}, \qquad y = \frac{v}{f_y}$$

Regarding the z coordinate, the mesh vertex nearest to the projected ray is searched for.

Fig.3.3 depicts the procedure. The orange ray corresponds to the projection of a 2D coordinate into the 3D world using the camera's calibration parameters.

Fig.3.3. – Projection of 2D landmarks to 3D world


This method can be applied only if the image and the scan are well aligned with the camera coordinates. As they are not aligned initially, the camera's calibration parameters are used for the alignment.

a) RGB Image alignment

The facial landmarks obtained from the landmark detection software are given in pixels, with the origin at the upper-left corner. In contrast, the camera coordinates take the origin at the center of the image (see Fig.3.4). $(C_x, C_y)$ are the camera's calibration parameters used to compute this equivalence:

$$u = \text{pixel coordinate } x - C_x, \qquad v = \text{pixel coordinate } y - C_y$$
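As an illustration only, a minimal Matlab sketch of this lifting step could look as follows (the names fx, fy, Cx, Cy, landmarks2D and scanVertices are assumptions, not the actual implementation; scanVertices is assumed to be already expressed in the RGB camera frame, see b) below):

```matlab
% Sketch: lift 2D landmarks (pixel coordinates) to 3D landmarks on the scan.
u = landmarks2D(:, 1) - Cx;                    % shift origin to the image centre
v = landmarks2D(:, 2) - Cy;
dirs = [u ./ fx, v ./ fy, ones(size(u))];      % one ray direction per landmark
dirs = bsxfun(@rdivide, dirs, sqrt(sum(dirs.^2, 2)));   % normalise each ray
landmarks3D = zeros(size(landmarks2D, 1), 3);
for i = 1:size(dirs, 1)
    % distance from every scan vertex to the i-th ray through the camera origin
    proj = scanVertices * dirs(i, :)';                          % N x 1 projections
    dist = sqrt(sum((scanVertices - proj * dirs(i, :)).^2, 2)); % point-to-ray distances
    [~, idx] = min(dist);                                       % nearest scan vertex
    landmarks3D(i, :) = scanVertices(idx, :);
end
```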

b) Scan alignment

Fig.3.5 summarizes the matrices used to align the 3D world (where the scan is) to the camera coordinates. We define:

$$T_{cam} = \begin{bmatrix} R_{cam} & t_{cam} \\ 0\;\;0\;\;0 & 1 \end{bmatrix}$$

where $R_{cam} \approx I_3$ and $t_{cam} \approx (v_{sz}/2,\; v_{sz}/2,\; -v_{sz})^T$. $R_{cam}$ corresponds to the camera rotation, $t_{cam}$ is always defined in this way, and $v_{sz}$ is the size of the volume (scan).

To go from the IR camera to the RGB camera coordinates, we define:

$$t_{ir2rgb} = [t_x, 0, 0]^T$$

where $t_x$ is the distance between the cameras along the x axis and there is no rotation.

Fig.3.4. – Coordinates of 2D image and landmarks

Fig.3.5. – Change from 3D world coordinates to camera coordinates
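For illustration, a minimal sketch of applying these transforms to the scan vertices (Rcam, vsz, tx and scanVertices are assumed inputs; the exact composition used in the implementation may differ):

```matlab
% Sketch: bring scan vertices (3D world coordinates) into the RGB camera frame
% by applying T_cam followed by the IR-to-RGB translation.
Rcam = eye(3);                                    % R_cam ~ I_3
tcam = [vsz/2; vsz/2; -vsz];                      % t_cam as defined above
Tcam = [Rcam, tcam; 0 0 0 1];                     % 4x4 rigid transform
Tir2rgb = [eye(3), [tx; 0; 0]; 0 0 0 1];          % no rotation, x-axis offset only
scanHom = [scanVertices, ones(size(scanVertices, 1), 1)]';   % 4 x N homogeneous
scanInRgb = (Tir2rgb * Tcam * scanHom)';                      % N x 4
scanVerticesRgb = scanInRgb(:, 1:3);              % back to N x 3 Cartesian coordinates
```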


3.1.2. First deformation

The objective of this deformation is to take a mesh with certain properties and deform it so that it fits the scan. The final result has a surface close to the scan but still preserves the properties of the initial mesh. This initial mesh corresponds to the common representation that we are looking for: the morphable model.

The deformation is guided by some constraints. In our case, the first constraint is that the reconstruction should fit the surface of the scan. The second constraint is that the 3D landmarks' locations on the scan have to be respected. The last constraint is a regularization term, which prevents deforming the model too much, so that at the end it still has the shape of a face with a smooth surface.

3.1.2.1. Morphable Model

The morphable model used for implementing this task is the one defined by Blanz and

Vetter [5]. It performs Principal Component Analysis (PCA) on 200 neutral face models.

With this model any neutral face can be approximated as a linear combination of the average face, $\bar{V}$, and the $l$ leading PCA vectors:

$$V = \bar{V} + \sum_{i=1}^{l} \alpha_i F_i \qquad (4)$$

where $\bar{V}$ is the average face and $F_i$ is the i-th PCA eigenvector.

In our implementation, we compute the PCA coefficients $\alpha_i$ that best respect the constraints guiding the deformation.

The morphable model we use is the one implemented by the University of Basel in 2009, the Basel Face Model [4]; its implementation is in Matlab.

Fig.3.6. – Different results of the Morphable Model by varying the PCA's coefficients


Fig.3.6 shows some results of the Basel Face Model obtained by changing the PCA coefficients.

The Basel Face Model also includes the coordinates of the facial landmarks. This is very important because one of the constraints guiding the deformation requires the landmarks' coordinates.
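As a minimal illustration of Eq. (4), a reconstruction can be computed from a coefficient vector as follows (meanShape and pcaBasis are placeholder names, not the actual fields of the Basel Face Model):

```matlab
% Sketch: evaluate V = meanShape + sum_i alpha_i * F_i for a 3n-dimensional shape
% vector. meanShape is 3n x 1, pcaBasis is 3n x l (one eigenvector per column),
% and alpha is l x 1.
V = meanShape + pcaBasis * alpha;    % Eq. (4) as a single matrix-vector product
mesh = reshape(V, 3, [])';           % back to an n x 3 list of vertex coordinates
```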

3.1.2.2. Constraints for deformation

According to the methodology in FaceWarehouse [1], the PCA coefficients are found according to some constraints, which essentially amount to minimizing the energy term:

$$E_1 = \omega_1 E_{fea} + \omega_2 E_{pos} + \omega_3 E_{coef} \qquad (1)$$

$E_1$ is a combination of energy terms. Each term regulates an aspect that should be optimized in the deformation process, and the weights regulate the importance given to each of them. The weights are discussed in sections 3.1.2.5 and 4.2.3.

The terms in (1) correspond to:

Fitting the landmarks

$$E_{fea} = \sum_{j=1}^{m_l} \left\| v_{i_j} - c_j \right\|^2$$

where $v$ is the PCA reconstructed mesh, $c$ is the scan and $m_l$ is the number of landmarks. This term sums the differences between vertices of the PCA reconstruction and vertices of the scan, taking into account only the vertices that correspond to landmarks. The energy therefore represents the error between landmarks on the scan and on the reconstruction.

Similarity to the scan's surface

$$E_{pos} = \sum_{j=1}^{n_d} \left\| v_{d_j} - p_j \right\|^2$$

where $p$ is the scan surface and $v$ is the PCA reconstructed mesh (only the $n_d$ corresponding points). This term also sums the differences between vertices of the PCA reconstruction and vertices of the scan, but here all the vertices of the model are taken into account. Note that the scan has many more vertices than the model. For this step we look for the points on the scan that correspond to the model's vertices (see section 3.1.2.3); the criterion is to look for the closest vertex on the scan, following the FaceWarehouse [1] methodology. Summing the error between the two surfaces amounts to fitting the surface of the model to the scan's surface.

Regularization term

$$E_{coef} = \frac{1}{2}\, \alpha^T \Lambda\, \alpha, \qquad \Lambda = \mathrm{diag}\!\left(\frac{1}{\sigma_1^2}, \frac{1}{\sigma_2^2}, \dots, \frac{1}{\sigma_l^2}\right)$$

where $\alpha$ are the PCA coefficients. This energy term keeps the PCA coefficients from becoming too large. The higher the coefficients, the more the reconstruction differs from the mean shape. In the end, the reconstruction has to be a face with a smooth surface; an extreme deformation could cause the result to no longer look like a face.

3.1.2.3. Alignment and correspondence model-scan

The difference between vertices of the model and the scan cannot be computed directly. First of all we have to find, for each vertex of the model, the corresponding point on the scan. This correspondence is essential for computing $E_{pos}$ in the previous section.

Scaling and alignment between scan and model

- The scale is completely different

- If they are not aligned it does not make sense to compute the difference

The scan has many more vertices than the model

- Find, for each vertex of the model, the corresponding point on the scan

- Criterion: for each model vertex, find the nearest vertex on the scan

a) Alignment

Having the 3D landmarks on both meshes, we align them by making these vertices coincide. The method is based on [6]; it obtains the optimal rigid transformation between the landmarks in the least-squares sense. We seek a rotation, translation and scale such that:

$$(s, R, t) = \underset{s, R, t}{\mathrm{argmin}} \sum_{i=1}^{n} w_i \left\| (s R\, \boldsymbol{p}_i + \boldsymbol{t}) - \boldsymbol{q}_i \right\|^2$$

where $w_i > 0$ are the weights for each pair, and $P = \{p_1, p_2, \dots, p_n\}$ and $Q = \{q_1, q_2, \dots, q_n\}$ are the corresponding landmarks of the two meshes.




After some formulation on [6], the alignment is summarized into:

(1) Compute the weighted centroids of both point sets:

$$\bar{p} = \frac{\sum_{i=1}^{n} w_i p_i}{\sum_{i=1}^{n} w_i}, \qquad \bar{q} = \frac{\sum_{i=1}^{n} w_i q_i}{\sum_{i=1}^{n} w_i}$$

(2) Compute the centred vectors:

$$x_i = p_i - \bar{p}, \quad y_i = q_i - \bar{q}, \quad i = 1, 2, \dots, n$$

(3) Compute the $d \times d$ covariance matrix $S = X W Y^T$, where $X$ and $Y$ are the $d \times n$ matrices that have $x_i$ and $y_i$ as their columns, respectively, and $W = \mathrm{diag}(w_1, w_2, \dots, w_n)$.

(4) Compute the singular value decomposition $S = U \Sigma V^T$. The rotation and scale we are looking for are then:

$$R = V \begin{pmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & \det(V U^T) \end{pmatrix} U^T, \qquad s = \frac{\sum_{i=1}^{n} y_i^T R\, x_i}{\sum_{i=1}^{n} x_i^T x_i}$$

(5) Compute the optimal translation as:

$$\boldsymbol{t} = \bar{q} - R \bar{p}$$
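A compact Matlab sketch of these five steps, assuming P and Q are 3 x n matrices with the corresponding landmarks as columns and w is an n x 1 weight vector (names are illustrative, not the thesis code):

```matlab
% Sketch: weighted alignment (steps 1-5) mapping the landmark set P onto Q.
n = numel(w);
pbar = (P * w) / sum(w);                      % (1) weighted centroids
qbar = (Q * w) / sum(w);
X = P - pbar * ones(1, n);                    % (2) centred vectors as columns
Y = Q - qbar * ones(1, n);
S = X * diag(w) * Y';                         % (3) 3x3 covariance matrix
[U, ~, V] = svd(S);                           % (4) singular value decomposition
R = V * diag([1, 1, det(V * U')]) * U';       % rotation (reflection-safe)
s = trace(Y' * R * X) / trace(X' * X);        % scale
t = qbar - R * pbar;                          % (5) translation, as defined above
alignedP = s * R * P + t * ones(1, n);        % apply the estimated transform
```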

b) Correspondence

The scan has around 650,000 vertices whereas the model has around 53,000.

To find the nearest point on the scan for each vertex of the model, a possible solution would be to compute the distances between all pairs of vertices and then choose the smallest one. This method is simple but exhaustive, so another technique has been used: the vertices of the scan are stored in a KD-tree search object. Search objects organize the data in a convenient way, so that finding a point of interest is faster.

The tool chosen for implementing this search object is Matlab's KD-tree searcher, which includes a method for computing the nearest neighbour to an input point in the previously created KD-tree (knnsearch). As seen in Fig.3.7, the data is organized in nodes based on coordinates.

To find the k nearest neighbours to a given query point, knnsearch first determines the node to which the query point belongs. Then it finds the closest k points within that node and their distances to the query point. Finally, it examines every other node whose area lies within any of the distances computed before. In the example (Fig.3.7) the closest point is marked with a red square.

Fig.3.7. – Example of k-nearest neighbours search
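A minimal Matlab sketch of this correspondence search (scanVertices and modelVertices are assumed to be N x 3 and n x 3 vertex lists; only the knnsearch call is illustrated):

```matlab
% Sketch: for every model vertex, find the index of the closest scan vertex.
tree = KDTreeSearcher(scanVertices);             % build the KD-tree once
[idx, dist] = knnsearch(tree, modelVertices);    % nearest scan vertex per query point
correspondences = scanVertices(idx, :);          % n x 3 corresponding points on the scan
```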

3.1.2.4. Optimization problem: Least Squares

In the FaceWarehouse paper [1] the energy term $E_1$ is minimized with the least-squares optimization method. As the equations correspond to a linear model, the optimal solution can be found directly.

Here is a generic solution to an optimization problem using the least-squares method. Consider an overdetermined system

$$y_i = \sum_{j=1}^{n} x_{ij} \beta_j, \qquad i = 1, 2, \dots, m$$

of $m$ linear equations with $n$ unknown coefficients $\beta_1, \beta_2, \dots, \beta_n$, with $m > n$, whose matrix form $\boldsymbol{y} = \boldsymbol{X\beta}$ is:

$$\boldsymbol{X} = \begin{bmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{bmatrix}, \qquad \boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_n \end{bmatrix}, \qquad \boldsymbol{Y} = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}$$

The objective is to find the coefficients $\boldsymbol{\beta}$ that fit the equations best in the sense of solving the quadratic minimization problem $\hat{\boldsymbol{\beta}} = \mathrm{argmin}_{\boldsymbol{\beta}}\, J(\boldsymbol{\beta})$, where $J$ is the cost function given by:

$$J(\boldsymbol{\beta}) = \sum_{i=1}^{m} \Big| \sum_{j=1}^{n} x_{ij}\beta_j - y_i \Big|^2 = \left\| \boldsymbol{X\beta} - \boldsymbol{Y} \right\|^2 \qquad (2)$$

To minimize the cost function $J$, its partial derivative is set to zero, $\frac{\partial J}{\partial \boldsymbol{\beta}} = 0$, and then:

$$\frac{\partial J}{\partial \boldsymbol{\beta}} = 2\boldsymbol{X}^T(\boldsymbol{X\beta} - \boldsymbol{Y}); \qquad \boldsymbol{X}^T\boldsymbol{X\beta} - \boldsymbol{X}^T\boldsymbol{Y} = \boldsymbol{0} \qquad (3)$$

$$\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T\boldsymbol{Y} \qquad (\hat{\boldsymbol{\beta}}: \text{coefficients that minimize the error})$$
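In Matlab, such a system is typically solved without forming the inverse explicitly; as a small illustration (X and Y being any overdetermined system as above):

```matlab
% Sketch: least-squares solution of X*beta = Y with m > n.
beta = (X' * X) \ (X' * Y);   % normal equations, as in Eq. (3)
beta = X \ Y;                 % equivalent and numerically preferable in Matlab
```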


We have used the same procedure as in (3) to find the optimal PCA coefficients according to the energy constraints in (1), section 3.1.2.2.

The total cost function of our problem corresponds to the sum of the cost functions of the individual energy terms in (1). As in (3), the derivative of the total cost function is set to zero; equivalently, the sum of the derivatives of each cost function with respect to $\alpha$ is set to zero:

$$\frac{\partial E_1}{\partial \alpha} = \omega_1 \frac{\partial E_{fea}}{\partial \alpha} + \omega_2 \frac{\partial E_{pos}}{\partial \alpha} + \omega_3 \frac{\partial E_{coef}}{\partial \alpha}, \qquad \frac{\partial E_1}{\partial \alpha} = 0, \quad \alpha \text{ being the PCA coefficients}$$

In order to implement the partial derivative of each cost function, one thing should be clarified: what exactly does each cost function represent?

For $E_{fea}$ and $E_{pos}$, the energy term has the form $\|\boldsymbol{X\beta} - \boldsymbol{Y}\|^2$, that is:

$$\text{cost function } (J) = \text{reconstruction } (\boldsymbol{X\beta}) - \text{ideal objective of the reconstruction } (\boldsymbol{y})$$

As the constraints $E_{fea}$ and $E_{pos}$ deform the model in different ways, each cost function is different. These aspects are summarized in Tab.3.1.

Reconstruction (both constraints): $V = \bar{V} + \sum_{i=1}^{l} \alpha_i F_i$

Objective of the reconstruction:

· $E_{fea}$: the landmark vertices of the model should match the landmarks on the scan

· $E_{pos}$: all the vertices of the model should fit the whole shape of the scan

Tab.3.1. – Table that clarifies the objective of each constraint

Observing Tab.3.1, the constraint related to $E_{fea}$ tries to deform the model so that its landmark vertices fit the landmarks on the scan. In the case of $E_{pos}$, what is fitted is all the vertices of the model to the scan's surface.

A different number of vertices is taken into account when computing each cost function: for $E_{fea}$ it is the number of landmarks, and for $E_{pos}$ it is all the vertices of the model.


Fig.3.7. – Dimensions of the PCA variables (mean shape, eigenvectors and PCA coefficients)

The number of vertices considered in each cost function also affects the dimension of the eigenvectors. As depicted in Fig.3.7, the same number of eigenvectors is considered (rows: the same as the number of PCA coefficients), but their dimension depends on the number of vertices (columns: n). The variables in Fig.3.7 correspond to the ones in equation (4), the equation of the morphable model.

Another point to take into account when optimizing $E_{fea}$ and $E_{pos}$ is to remove the mean shape from the objective vertices. In the PCA reconstruction $V = \bar{V} + \sum_{i=1}^{l} \alpha_i F_i$ the mean shape and the eigenvectors are summed; the eigenvectors therefore only measure the deviation of the reconstruction from the mean shape. We are looking for the value of $\alpha$, which only affects the eigenvectors and does not involve the mean shape:

$$J = V - \text{objective vertices}$$

$$J = (V - \text{mean shape}) - (\text{objective vertices} - \text{mean shape})$$

$$J = (\text{mean shape} + \boldsymbol{\alpha F} - \text{mean shape}) - (\text{objective vertices} - \text{mean shape})$$

$$J = \boldsymbol{\alpha F} - y, \quad \text{where } y = \text{objective vertices} - \text{mean shape}$$

Regarding the constraint $E_{coef}$ in (1), the previous indications do not apply: it is a regularization term and its value depends only on the coefficients $\alpha$.

Tab.3.2 summarizes, for each constraint in (1), its cost function $J(\alpha)$ and its derivative $\frac{\partial J(\alpha)}{\partial \alpha}$:

$E_{fea}$: $J_{fea}(\alpha) = \boldsymbol{F}_{fea}\,\alpha - \boldsymbol{y}_{fea}$, with derivative $\boldsymbol{F}_{fea}^T \boldsymbol{F}_{fea}\,\alpha - \boldsymbol{F}_{fea}^T \boldsymbol{y}_{fea}$

$E_{pos}$: $J_{pos}(\alpha) = \boldsymbol{F}_{pos}\,\alpha - \boldsymbol{y}_{pos}$, with derivative $\boldsymbol{F}_{pos}^T \boldsymbol{F}_{pos}\,\alpha - \boldsymbol{F}_{pos}^T \boldsymbol{y}_{pos}$

$E_{coef}$: $J_{coef}(\alpha) = \frac{1}{2}\,\alpha^T \boldsymbol{\Lambda}\, \alpha$, with derivative $\boldsymbol{\Lambda}\alpha$

where $\boldsymbol{F}$ are the PCA eigenvectors, $\alpha$ the PCA coefficients, $\boldsymbol{y}_{fea}$ the 3D landmarks on the scan minus the mean shape, and $\boldsymbol{y}_{pos}$ the 3D vertices on the scan minus the mean shape. Note that the eigenvectors for each energy term have different dimensions $n$ (Fig.3.7).

Tab.3.2. – Summary of each cost function and its derivative


Combining all the partial derivatives leads to the optimal PCA coefficients:

$$0 = \omega_1 \frac{\partial E_{fea}}{\partial \alpha} + \omega_2 \frac{\partial E_{pos}}{\partial \alpha} + \omega_3 \frac{\partial E_{coef}}{\partial \alpha}$$

$$0 = \omega_1 \left(\boldsymbol{F}_{fea}^T \boldsymbol{F}_{fea}\,\boldsymbol{\alpha} - \boldsymbol{F}_{fea}^T \boldsymbol{y}_{fea}\right) + \omega_2 \left(\boldsymbol{F}_{pos}^T \boldsymbol{F}_{pos}\,\boldsymbol{\alpha} - \boldsymbol{F}_{pos}^T \boldsymbol{y}_{pos}\right) + \omega_3\, \boldsymbol{\Lambda\alpha}$$

$$\hat{\boldsymbol{\alpha}} = \left(\omega_1 \boldsymbol{F}_{fea}^T \boldsymbol{F}_{fea} + \omega_2 \boldsymbol{F}_{pos}^T \boldsymbol{F}_{pos} + \omega_3 \boldsymbol{\Lambda}\right)^{-1} \left(\omega_1 \boldsymbol{F}_{fea}^T \boldsymbol{y}_{fea} + \omega_2 \boldsymbol{F}_{pos}^T \boldsymbol{y}_{pos}\right)$$

The values of the weights $\omega_1$, $\omega_2$ and $\omega_3$ are discussed in section 3.1.2.5.

After finding the optimal coefficients, the final reconstruction can be computed:

$$\text{Best reconstruction} = \bar{V} + \sum_{i=1}^{l} \hat{\alpha}_i F_i$$
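A minimal Matlab sketch of this closed-form solution, reusing the placeholder names meanShape and pcaBasis from section 3.1.2.1 and assuming Ffea, Fpos, yfea, ypos, Lambda and the weights w1, w2, w3 have been built as described above:

```matlab
% Sketch: solve the weighted normal equations for the optimal PCA coefficients.
A = w1 * (Ffea' * Ffea) + w2 * (Fpos' * Fpos) + w3 * Lambda;   % l x l system matrix
b = w1 * (Ffea' * yfea) + w2 * (Fpos' * ypos);                 % l x 1 right-hand side
alphaHat = A \ b;                                              % optimal coefficients
bestReconstruction = meanShape + pcaBasis * alphaHat;          % Eq. (4) with alphaHat
```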

3.1.2.5. Energy balance

Optimizing $E_1$ in (1) requires summing different energy terms. A different number of vertices is summed when computing each cost function, which causes the terms to have different magnitudes.

The easy solution would be to normalize each energy term by the number of vertices considered, for instance dividing $E_{fea}$ by the number of landmarks and $E_{pos}$ by the number of vertices of the model. After implementing this approach, we observe that the magnitudes of $E_{fea}$ and $E_{pos}$ become the same. However, after computing the coefficients and evaluating $E_{coef}$, its magnitude is much smaller than that of the other terms. As a consequence, we cannot simply sum all the terms in equation (1), and because of this difference in magnitude we cannot use the weights specified in FaceWarehouse [1].

Section 4.2.3.1 discusses the magnitude of each weight. This regulates the difference between the terms and makes it possible to sum them in equation (1). Section 4.2.3.2 goes beyond the magnitude and tries to find a more accurate weight for each term.


3.1.3. Second deformation

After obtaining the first deformation of the mesh (section 3.1.2), a refinement is carried out. It should improve the quality of the mesh and bring it closer to some of the previous constraints: the similarity with the surface of the scan and the feature fitting.

This deformation minimizes another energy:

$$E_2 = \omega_1' E_{fea} + \omega_2' E_{pos} + \omega_3' E_{lap} \qquad (7)$$

The weights will be revised when implementing the optimization problem.

A major difference between the two optimization problems is the parameters being optimized. In (1), the goal was to obtain the best PCA coefficients; this made the problem linear, as the parameters entered through a linear function. In (7), the mesh vertices themselves are the parameters to be optimized. As the vertex positions can move freely, this is no longer a linear optimization problem. Such problems are solved by iterative methods, taking the best solution at convergence.

Another major difference between (1) and (7) is the regularization term. In (1) the regularization term constrained the PCA parameters (those being optimized). The regularization term in (7) is defined in (8); the reason for using a Laplacian constraint is to ensure that neighbouring vertices remain close, which guarantees a smooth surface on the reconstruction, similar to the original.

The regularization term in (7) is a Laplacian energy:

$$E_{lap} = \sum_{i=1}^{n} \left\| L v_i - \frac{\delta_i}{|L v_i|}\, L v_i \right\|^2 \qquad (8)$$

$E_{lap}$ enforces that the Laplacian of the reconstruction ($v$) stays close to the original Laplacian (before the deformation). The Laplacian operator computed on a mesh gives the Laplacian coordinates: $L(X) = \delta$.

$L$: discrete Laplacian operator (Eq. 9)

$\delta$: magnitude of the original Laplacian coordinate before deformation

$n$: number of vertices of the mesh


Computing the Laplacian operator on a mesh

According to [9], the Laplacian of the surface at a vertex has both a normal and a tangential component. The curvature flow technique smooths the surface by moving it along the surface normal $\boldsymbol{n}$ with a speed equal to the mean curvature $\bar{k}$: $\frac{\partial x_i}{\partial t} = -\bar{k}\,\boldsymbol{n}_i$. As this technique uses only intrinsic properties and does not parameterize the curve, it does not add noise to the smoothing process.

After some derivation (Appendix of [9]), the approximation of the curvature normal is given by:

$$-\bar{k}\,\boldsymbol{n} = \frac{1}{4A} \sum_{j \in N_1(i)} (\cot \alpha_j + \cot \beta_j)(x_j - x_i) \qquad (9)$$

where $\alpha_j$ and $\beta_j$ are the two angles opposite to the edge $e_{ij}$ in the two triangles sharing it (Fig.3.8 (b)), and $N_1(i)$ are the 1-ring neighbours of the vertex $x_i$. $A$ is the sum of the areas of the triangles having $x_i$ as a common vertex, $A = \sum_n A_n$, with $A_n = \frac{1}{2}\left\| \overrightarrow{PQ_n} \times \overrightarrow{PQ_{n+1}} \right\|$ and $\overrightarrow{PQ_n} = Q_n - P$ the vector from point $P$ ($x_i$) to a neighbouring point $Q_n$ ($x_j$).

After obtaining the Laplacian coordinates, the problem is solved using a nonlinear Gauss-Newton optimizer. FaceWarehouse [1] mentions that the approach for solving the optimization problem is the one in [8].

Fig.3.8. – A vertex xi and its adjacent faces (a), and one term of its curvature normal formula (b)
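For illustration only, a minimal Matlab sketch of Eq. (9) for a single vertex, assuming its ordered 1-ring neighbours are available (the data layout and names are assumptions):

```matlab
% Sketch: curvature normal (Eq. 9) at a vertex xi (1x3), given its 1-ring
% neighbours as the rows of ring (k x 3), ordered around xi.
function Lxi = curvatureNormal(xi, ring)
    cotang = @(a, b) dot(a, b) / norm(cross(a, b));   % cotangent of the angle between a and b
    k = size(ring, 1);
    Lxi = zeros(1, 3);                                % accumulates the sum in Eq. (9)
    A = 0;                                            % total area of the adjacent triangles
    for j = 1:k
        xj   = ring(j, :);
        xnxt = ring(mod(j, k) + 1, :);                % x_{j+1} (wraps around the ring)
        xprv = ring(mod(j - 2, k) + 1, :);            % x_{j-1}
        % alpha_j and beta_j: angles opposite to the edge (x_i, x_j)
        cotAlpha = cotang(xi - xnxt, xj - xnxt);
        cotBeta  = cotang(xi - xprv, xj - xprv);
        Lxi = Lxi + (cotAlpha + cotBeta) * (xj - xi);
        A = A + 0.5 * norm(cross(xj - xi, xnxt - xi));   % A_n, area of one adjacent triangle
    end
    Lxi = Lxi / (4 * A);                              % equals -k*n in Eq. (9)
end
```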


4. Results

4.1. Landmark Correspondence

For the 2D landmark correspondence the software was already implemented; it just had to be compiled on the computer where it was used. Some results are shown in Fig.4.1.

Fig.4.1. – Results after running the 2D landmark tracking software over some images

The image in Fig.4.1 with the red landmarks is a clear example of bad tracking. In the majority of cases the result was satisfactory.


Regarding the 3D landmarks, Fig.4.2 shows results after implementing the method described in section 3.1.1.

As observed in Fig.4.2, in almost all cases the internal points (black in Fig.4.3) are well projected onto the scan. In contrast, many problems appear when projecting the external landmarks, which correspond to the chin (blue in Fig.4.3). The problems found when projecting the external landmarks onto the scan are probably due to the hair; note that without long hair the result improves.

Fig.4.2. – Results of 2D landmarks projected on the 3D scan: 3D landmarks

Fig.4.3. – Differentiation of facial landmarks (blue for external and black for internal)


4.2. First deformation

4.2.1. Alignment Model-scan

Fig.4.4 depicts the result of aligning the meshes with the method described in 3.1.2.3.a. The orange mesh is the scan and the blue one is the mean shape of the model.

First, the scan has been aligned to the image plane following the method in 3.1.1.b. Then, the mean shape of the model has been aligned to the scan following 3.1.2.3.a. The purpose of this alignment is that both meshes are aligned with the camera coordinates. This makes it possible to project the landmarks of the 3D meshes onto the image plane with the camera's calibration parameters $f_x$, $f_y$, $C_x$, $C_y$, which is useful to check the alignment between meshes (see Fig.4.5).

The error of the alignment shown in Fig.4.4 has been computed (coordinates in meters):

$$E_{alignment} = \sum_{i=1}^{\#\,landmarks} \left\| alignedModel3Dcoordinates_i - scan3Dcoordinates_i \right\|^2 = 0.0062 \qquad (5)$$
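As a small illustration, this error can be computed in one line (alignedModelLm and scanLm being the m x 3 landmark sets; names illustrative):

```matlab
% Sketch: alignment error of Eq. (5), the sum of squared distances in meters.
Ealign = sum(sum((alignedModelLm - scanLm).^2, 2));
```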

Fig.4.4. – Alignment between the scan (orange) and the mean shape of the morphable model (blue)


Fig.4.5 shows the projection of the landmarks of the model (black) and of the scan (pink) onto the image plane. The blue dots correspond to the 2D landmarks on the image plane. The method for obtaining this projection is the inverse of the procedure in 3.1.1.

The fact that the landmarks of the model (black) are slightly displaced from those of the scan (pink) is consistent with the alignment error of 0.0062 found in Eq.(5).

Fig.4.5. – Projected landmarks on image plane. Demonstrates the alignment between meshes


4.2.2. Correspondence model-scan

Section 3.1.2.3.b describes the methodology for finding the points on the scan that correspond to the model. It consists of looking for the closest vertex on the scan for each vertex of the model. The scan is stored as a KD-tree search object to improve the speed and efficiency of the search (more detail in section 3.1.2.3.b).

Fig.4.6 depicts a result of the search. The orange figure shows the corresponding points on the scan. Notice that there are some areas where the nearest vertices found by the algorithm do not correspond to the true corresponding vertices of the scan.

4.2.3. Energy balance

As described in section 3.1.2.5, the magnitudes of the energy terms in equation (1) need to be similar. It does not make sense for one of the terms to have a very low magnitude, since that term would then effectively not be considered.

Section 4.2.3.1 discusses the magnitude of each weight. This regulates the difference between the terms and makes it possible to sum them in equation (1). Section 4.2.3.2 goes beyond the magnitude and tries to find a more accurate weight for each term.

Fig.4.6. – Example of finding the correspondent vertices to the model on the scan.


Fig.4.7.a – Epos according to ωpos and ωfea
Fig.4.7.b – View of Fig.4.7.a with respect to ωfea

4.2.3.1. Magnitude Balance

This section discusses the magnitude of each weight. For this, we plot the final value of the energies $E_{pos}$ and $E_{fea}$ while changing the values of the weights. Recall that:

· $E_{pos}$: energy error for the surface fitting between model and scan

· $E_{fea}$: energy error for the landmark fitting between model and scan

Observing the results, if the value of $E_{pos}$ or $E_{fea}$ is high it means that the corresponding constraint is not respected at all. The objective of this approach is to find the weights that best respect both constraints ($E_{pos}$ and $E_{fea}$).

The following plots have these properties (a sketch of the underlying parameter sweep is given below):

· Weight of $E_{coef}$ = 1

· X axis: $\omega_{pos}$, weight for $E_{pos}$, ranging between $10^{-6}$ and $10^{6}$

· Y axis: $\omega_{fea}$, weight for $E_{fea}$, ranging between $10^{-6}$ and $10^{6}$

· X and Y axes: logarithmic scale; z axis: linear scale

· The energy value (z axis) is not scaled by the weight (as the magnitudes are very different, the plot would otherwise show the weight rather than the energy value)
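A minimal sketch of this parameter sweep, reusing the closed-form solution of section 3.1.2.4 (the helpers solveAlpha and evalEnergies are hypothetical wrappers around that computation):

```matlab
% Sketch: grid search over the weight magnitudes, logging the final Epos and Efea.
wPosRange = 10 .^ (-6:6);                      % candidate magnitudes for w_pos
wFeaRange = 10 .^ (-6:6);                      % candidate magnitudes for w_fea
Epos = zeros(numel(wPosRange), numel(wFeaRange));
Efea = zeros(numel(wPosRange), numel(wFeaRange));
for a = 1:numel(wPosRange)
    for b = 1:numel(wFeaRange)
        alphaHat = solveAlpha(wFeaRange(b), wPosRange(a), 1);   % w_coef fixed to 1
        [Efea(a, b), Epos(a, b)] = evalEnergies(alphaHat);      % final energy values
    end
end
surf(log10(wPosRange), log10(wFeaRange), Epos');   % surface plots such as Fig.4.7.a
```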

Fig.4.7 (a) and (b) depict the value of $E_{pos}$ with respect to $\omega_{pos}$ and $\omega_{fea}$. A higher value of $E_{pos}$ means a higher error with respect to this constraint, i.e. the constraint is less respected. Observing Fig.4.7.b we conclude that the magnitude of $\omega_{fea}$ has to be lower than $10^{4}$ for the constraint $E_{pos}$ to be respected.


Fig.4.8.a – Efea according to ωpos and ωfea
Fig.4.8.b – View of Fig.4.8.a with respect to ωfea
Fig.4.8.c – View of Fig.4.8.a from the top

Fig.4.8 (a), (b) and (c) depict the value of $E_{fea}$ with respect to $\omega_{pos}$ and $\omega_{fea}$. Observing them, we conclude that the constraint $E_{fea}$ is better respected when:

· $\omega_{pos}$ is between $10^{-5}$ and $10^{2}$

· $\omega_{fea}$ is between $10^{2}$ and $10^{5}$

which are the areas of the plots that are not dark red.

After observing figures 4.7 and 4.8, we conclude:

· We do not have to optimize over the weights themselves (the optimization problem in (1) is with respect to the PCA coefficients, not the weights of the energies).

· A trade-off between the magnitudes ensures that both constraints are respected.


Fig.4.9 – Surfaces representing Epos and Efea

Fig.4.9 shows:

· $\omega_{pos}$ between $10^{-4}$ and $10^{2}$ (values discussed with Fig.4.7)

· $\omega_{fea}$ between $10^{2}$ and $10^{4}$ (values discussed with Fig.4.8)

· All axes on logarithmic scale: $E_{pos}$ and $E_{fea}$ are very different in magnitude

· Top surface: $E_{pos}$; surface below: $E_{fea}$

We conclude that the best values for the weights are:

· $\omega_{pos} \in [0.1, 1]$

· $\omega_{fea} \in [10, 1000]$

These values respect the trade-off between $E_{pos}$ and $E_{fea}$.

Fig.4.10 depicts the different reconstructions obtained with the weights listed above. The best trade-off for the weights is determined from these reconstructions. We conclude:

· $\omega_{pos} \in [0.1, 1]$

· $\omega_{fea} \approx 100$

Fig.4.10 – Different reconstructions for determining the magnitude of the weights


Observing Fig.4.10 we conclude that a very high $\omega_{fea}$ overfits the reconstruction to the landmark constraint, while a very low $\omega_{fea}$ does not consider the landmark constraint at all (see the eyes or mouth). Regarding $\omega_{pos}$, the difference between the selected weights is not so large; we conclude that the value has to lie between the selected ones.

4.2.3.2. Weight balance

In this section we try to find a more accurate weight for each term, within the magnitudes found in section 4.2.3.1.

Fig.4.11 shows:

· The intersection of the energies considered in (1): $E_{pos}$, $E_{fea}$ and $E_{coef}$

· $\omega_{pos} \in [0.1, 1]$

· $\omega_{fea} \in [50, 400]$

· All axes in linear scale

· For the z axis, the energies are multiplied by their weight

The circled area in Fig.4.11.a is where both energies are lower. We conclude:

· $\omega_{pos} \in [0.3, 0.4]$

· $\omega_{fea} \in [90, 200]$

Fig.4.11.a – Plot with Epos, 𝐸𝑓𝑒𝑎 and 𝐸coef for deciding the final value of the weights


Fig.4.12 – Different reconstructions for determining more accurately the weights

Fig.4.12 depicts the different reconstructions obtained with the weights listed above.

Observing Fig.4.12 we note that 𝜔𝑝𝑜𝑠 does not make a big difference between those

values. Regarding to 𝜔𝑓𝑒𝑎, the reconstructions for 200 are over fitted to the landmark

constraint. Contrarily, the results for 90 and 100 are very close.

In the end we conclude that the best values for the weights, referred to hereafter as (6), are:

· 𝜔𝑝𝑜𝑠 = 0.3

· 𝜔𝑓𝑒𝑎 = 100

· 𝜔𝑐𝑜𝑒𝑓 = 1 (set at the beginning of section 4.2.3)

Fig.4.11.b depicts side views of Fig.4.11.a; the concluded weights are also marked on them.

Fig.4.11.b – View of Fig.4.11.a with respect to 𝜔𝑝𝑜𝑠 (left) and with respect to 𝜔𝑓𝑒𝑎 (right)


4.2.4. Best reconstruction at the moment

The objective of the thesis is to find a common way to represent the scans. The solution described in section 3.1.2 guarantees that the scans will have the same number of vertices and that they will be aligned.

Fig.4.13.a is the result of the reconstruction with the parameters in (6). Fig.4.13.b, c and d depict both meshes overlapped (scan and reconstruction with the parameters in (6)).

Fig.4.13.a – Resulting reconstruction
Fig.4.13.b – Scan (orange) and reconstruction (blue) overlapped
Fig.4.13.c – Scan (grey surface) and reconstruction (blue) overlapped
Fig.4.13.d – Scan (grey surface) and reconstruction (blue) overlapped


Fig.4.14.a – Resulting reconstruction with 2D landmarks
Fig.4.14.b – Scan (orange) and reconstruction (light blue) overlapped

In this section we compute the reconstruction using only the 2D landmarks, i.e. the morphable model is fitted to the 2D landmarks detected on the image. The approach is the same as the one described in section 3.1.2.4, except that 𝐸𝑓𝑒𝑎 is built from the 2D landmarks instead of the 3D landmarks. This approach was implemented before this thesis. A hedged sketch of such a 2D landmark term is given below.
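The following is a minimal sketch of what such a 2D landmark term could look like: the model's landmark vertices are projected to the image plane with an assumed 3×4 camera matrix P and compared to the tracked 2D landmarks. The function and variable names are illustrative, not the implementation used before this thesis.

```python
# Hedged sketch of a 2D-landmark feature energy: squared image-plane distance
# between projected model landmark vertices and the tracked 2D landmarks.
import numpy as np

def efea_2d(landmark_vertices, landmarks_2d, P):
    """landmark_vertices: (K, 3); landmarks_2d: (K, 2); P: assumed 3x4 camera matrix."""
    homog = np.hstack([landmark_vertices, np.ones((len(landmark_vertices), 1))])
    proj = (P @ homog.T).T                     # homogeneous image coordinates
    proj = proj[:, :2] / proj[:, 2:3]          # perspective divide
    return np.sum((proj - landmarks_2d) ** 2)  # E_fea (2D variant)
```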

Fig.4.14.a corresponds to the morphable model deformed with the 2D landmarks as deformation constraint. Fig.4.14.b depicts the scan (orange) and the result of the reconstruction (light blue) overlapped.

Fig.4.14.c shows both meshes overlapped from the frontal view: the scan is the grey surface and the result of the reconstruction is shown with light blue vertices.

The reconstruction guided by the 2D landmarks is better from the frontal view (Fig.4.14.c) than from the side (Fig.4.14.b), since the 2D landmarks are tracked on the frontal face (Fig.4.1).

Fig.4.14.c – Scan (grey surface) and reconstruction (light blue vertices) overlapped


Fig.4.16 – Result of finding, for each model vertex, the corresponding vertex on the scan. Left: 2D landmark reconstruction; right: 3D landmarks and scan's shape.

Fig.4.15.a and Fig.4.15.b compare how much the reconstruction improves in each case: using only the 2D landmarks (a) and using the scan together with the 3D landmarks (b). They visualize the distance from each vertex of the reconstruction to its corresponding point on the scan. The colors follow the scale in Fig.4.15.c, given in meters. The reconstruction using only 2D information has an average error per vertex of 0.045 mm; for the reconstruction using 3D information, the error is 0.028 mm.

As described in section 4.2.2, the search for the nearest point can introduce errors in some areas; Fig.4.16 shows the result of finding the corresponding vertices in this case. Although the correspondence search can introduce some error, Fig.4.15 (a) and (b) remain a good measure of which areas are better reconstructed with respect to the scan's surface. A small sketch of this per-vertex error computation is given below.
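The per-vertex distances of Fig.4.15 could be computed along the following lines; the function and variable names are illustrative only, and the nearest-point query is the same kind of correspondence search discussed above.

```python
# Hedged sketch of the per-vertex error map: for every vertex of the
# reconstruction, find the nearest scan point and measure the distance.
import numpy as np
from scipy.spatial import cKDTree

def per_vertex_error(recon_vertices, scan_vertices):
    """Both inputs are (N, 3) arrays of 3D points, in meters."""
    tree = cKDTree(scan_vertices)
    distances, _ = tree.query(recon_vertices)  # nearest scan point per vertex
    return distances, distances.mean()         # color map values and mean error
```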

Mean error per vertex: 0.045 mm (2D landmark reconstruction), 0.028 mm (3D landmark and scan reconstruction)

Fig.4.15.c – Color scale in meters (0, 0.0012, 0.0048, 0.0096, 0.0193)
Fig.4.15.a – 2D landmark reconstruction; distances to the scan according to Fig.4.15.c
Fig.4.15.b – 3D landmark and scan reconstruction; distances to the scan according to Fig.4.15.c


Fig.4.17.a – Projection of landmarks onto the image plane: pink (scan), black (model), blue (2D landmarks)
Fig.4.17.b – Same projection as Fig.4.17.a but without the external landmarks (chin)
Fig.4.18 – 2D landmark reconstruction; distances to the scan according to Fig.4.15.c

4.2.5. Problems when reconstructing some subjects

After verifying that the common representation fits the scan very accurately, we have used it to represent other scans. In some cases there were problems with the alignment between the model and the scan (the step described in section 3.1.2.3.a). This problem appears when the 3D landmarks are not projected correctly onto the scan: as discussed in section 4.1 for Fig.4.2 (central subject), the landmarks of the chin are projected onto the hair.

For this reason the alignment is computed with error (see Fig.4.17.a). Discarding the chin landmarks, the alignment seems better (Fig.4.17.b), but when observing the result of the alignment (Fig.4.18) we realize that it is not correct. What has happened in this case is that two 3D meshes have been aligned using only landmarks lying almost on a 2D plane (the one corresponding to the inner landmarks), which leaves the alignment poorly constrained. A sketch of the landmark-based rigid alignment is given below.
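As a reference, the landmark-based rigid alignment can be sketched as the least-squares rigid motion of [6]; the function below is illustrative, not the exact implementation. When the landmarks used are nearly coplanar, the data only weakly constrain the rotation about the plane's normal, which is consistent with the failure seen in Fig.4.18.

```python
# Hedged sketch of rigid alignment from landmark correspondences (SVD-based
# least-squares rigid motion, in the spirit of [6]). Names are illustrative.
import numpy as np

def rigid_align(source, target):
    """Return R, t minimizing sum ||R @ source_i + t - target_i||^2; inputs are (N, 3)."""
    src_c, tgt_c = source.mean(axis=0), target.mean(axis=0)
    H = (source - src_c).T @ (target - tgt_c)      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t
```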

Solving this problem is proposed as future work in section 5.


5. Conclusions and future development

The solution found in this project makes all the recorded scans usable: the data can now be used to build a model for recognizing facial expressions as well as identity.

Our approach consists in using the morphable model as a common representation for each scan. As described in section 3.1.2, we deform the morphable model so that its surface fits the surface of the scan.

Representing the scans with this common representation guarantees that:

· They have the same number of vertices

· They are aligned with each other

· The facial landmarks are localized

· There are no holes or missing data

These properties show that a set of points has been converted into a structured mesh. The resulting mesh fits the scan while preserving the properties listed above, which are inherited from the morphable model.

We have also compared the reconstruction obtained with 2D information against the one obtained with 3D information (Fig.4.15). In each case the constraints guiding the deformation were different: with 2D information, the deformation had to fit the 2D landmarks. We obtained better results when using both the 3D landmarks and the surface of the scan; with this approach the final reconstruction fits the scan more closely, and the scan surface is considered more reliable than the 2D landmark reference.

The common representation has been applied to different scans. One recurring problem appears when projecting the landmarks of the chin onto the 3D scan: the projection is inaccurate because these landmarks can end up projected, for instance, onto the hair (Fig.4.2). This is a real problem for aligning the model to the scan. Discarding the chin landmarks was also considered, but the alignment was again unsuccessful (Fig.4.18). Solving this problem is therefore proposed as future work.

Another task proposed as future work is to implement the second deformation (see section 3.1.3) as described in the FaceWarehouse paper [1]. This deformation uses a Laplacian operator as regularization term, and the resulting nonlinear optimization problem is solved with an inexact Gauss-Newton method.
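As a hedged illustration only (the exact term is the one defined in [1]), a Laplacian regularization of this kind typically penalizes changes of the differential coordinates of the mesh vertices:

$$E_{reg} \;=\; \sum_{i} \left\| L(\mathbf{v}_i) - \boldsymbol{\delta}_i \right\|^2$$

where L denotes the discrete Laplacian operator applied to the deformed vertices and the δᵢ are the Laplacian (differential) coordinates of the undeformed template.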

As introduced in section 1.2 (see Fig.1.4), the project has focused on implementing the Mesh Deformation stage. Therefore the immediate next step is to obtain a common representation for the rest of the facial expressions. This cannot be done exactly as for the neutral face, because the morphable model can only represent neutral faces (see section 3.1.2.1); deformation transfer [7] techniques need to be implemented to generate the facial blendshapes.

The context of this project is to build a model capable of recognizing facial expressions and robust against identity. Thanks to this thesis, the recorded scans can be represented uniformly and thereby compared. Building this model is then the last step of the future work that follows this project.


Bibliography:

[1] C. Cao, Y. Weng, S. Zhou, Y. Tong, and K. Zhou. FaceWarehouse: a 3D facial expression database for visual computing. Zhejiang University, March 2014.

[2] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In IEEE ISMAR, October 2011.

[3] S. Izadi, D. Kim, O. Hilliges, D. Molyneaux, R. Newcombe, P. Kohli, J. Shotton, S. Hodges, D. Freeman, A. Davison, and A. Fitzgibbon. KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. October 2011.

[4] P. Paysan, R. Knothe, B. Amberg, S. Romdhani, and T. Vetter. A 3D face model for pose and illumination invariant face recognition. In Advanced Video and Signal Based Surveillance (AVSS '09), Sixth IEEE International Conference, September 2009.

[5] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH '99.

[6] O. Sorkine. Least-squares rigid motion using SVD. Technical notes, February 2009.

[7] R. W. Sumner and J. Popović. Deformation transfer for triangle meshes. In SIGGRAPH 2004.

[8] J. Huang, X. Shi, X. Liu, K. Zhou, L. Wei, S. Teng, H. Bao, B. Guo, and H. Shum. Subspace gradient domain mesh deformation. Microsoft Research Asia, Zhejiang University, July 2006.

[9] M. Desbrun, M. Meyer, P. Schröder, and A. H. Barr. Implicit fairing of irregular meshes using diffusion and curvature flow. In Proceedings of ACM SIGGRAPH 1999, pp. 317-324.

[10] L. Yin, X. Wei, Y. Sun, J. Wang, and M. J. Rosato. A 3D facial expression database for facial behavior research. In Proc. IEEE Intl., 2006.

[11] D. Vlasic, M. Brand, H. Pfister, and J. Popović. Face transfer with multilinear models. July 2005.

[12] P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, 1978.

[13] H. Li, T. Weise, and M. Pauly. Example-based facial rigging. ACM Trans. Graph., vol. 29, no. 4, pp. 32:1-32:6, July 2010.

[14] T. Leyvand, D. Cohen-Or, G. Dror, and D. Lischinski. Data-driven enhancement of facial attractiveness. ACM Trans. Graph., vol. 27, no. 3, August 2008.


[15] D. Bitouk, N. Kumar, S. Dhillon, P. Belhumeur, and S. K. Nayar. Face swapping: automatically replacing faces in photographs. August 2008.

[16] V. Blanz, C. Basso, T. Poggio, and T. Vetter. Reanimating faces in images and video. Computer Graphics Forum, vol. 22, no. 3, 2003.

[17] V. Blanz, K. Scherbaum, T. Vetter, and H.-P. Seidel. Exchanging faces in images. Computer Graphics Forum, vol. 23, no. 3, pp. 669-676, 2004.

[18] I. Kemelmacher-Shlizerman, A. Sankar, E. Shechtman, and S. M. Seitz. Being John Malkovich. 2010.

[19] F. Yang, J. Wang, E. Shechtman, L. Bourdev, and D. Metaxas. Expression flow for 3D-aware face component transfer. July 2011.

[20] I. Kemelmacher-Shlizerman, E. Shechtman, R. Garg, and S. M. Seitz. Exploring photobios. ACM Trans. Graph., vol. 30, no. 4, pp. 61:1-61:10, July 2011.

[21] K. Dale, K. Sunkavalli, M. K. Johnson, D. Vlasic, W. Matusik, and H. Pfister. Video face replacement. ACM Trans. Graph., vol. 30, no. 6, pp. 130:1-130:10, December 2011.


Appendix

A.1 Data Capture: What is Kinect Fusion? [2] [3]

Fig. A.1.1 – Scanning of a scene with Kinect

Kinect Fusion is an algorithm for 3D object scanning. It reconstructs a single dense surface model by integrating the depth data from the Kinect over time, from multiple viewpoints.

The object or scene is scanned by moving the Kinect sensor around it. The quality of the registration depends on the viewpoints recorded with the sensor: the more viewpoints are covered, the more information the final reconstruction contains.

Fig.A.1.1 shows an example of how the quality of the final reconstruction varies with the trajectory of the sensor around the scene: with the first trajectory the quality is very poor, whereas in the last example it is much better.

Fig.A.1.2 shows the complete pipeline of the Kinect Fusion algorithm. Each frame corresponds to a new viewpoint, and each new frame is fused into the final reconstructed volume.

Fig. A.1.2 – Kinect Fusion pipeline


Raw depth: multiple 2D images from multiple viewpoints. Each pixel's value corresponds to a depth: the Kinect depth sensor obtains the depth value of each pixel of the image, and each frame corresponds to a viewpoint.

a) Depth Map Conversion: a normal map is obtained for each depth map, which reveals the shape of the surface (a minimal sketch of this stage follows the list).

b) Camera Tracking: the camera pose (location and orientation) is tracked as the sensor is moved; a pose is obtained for each depth map (each frame).

c) Volumetric Integration: the multiple viewpoints of the recorded scene can be fused because the pose of each frame, and how it relates to the others, is known. A single reconstruction voxel volume is obtained at the end.

d) Raycasting: the volumetric integration is converted into a single surface.
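As an illustration of stage (a) only, a depth map can be back-projected to a vertex map using the camera intrinsics, and a normal can be estimated per pixel from neighbouring points. The intrinsics (fx, fy, cx, cy) and all names below are assumptions for this sketch, not the actual KinectFusion implementation [2].

```python
# Hedged sketch of depth map conversion: back-project depth to a vertex map,
# then estimate per-pixel normals from finite differences of neighbouring points.
import numpy as np

def depth_to_vertex_and_normal_maps(depth, fx, fy, cx, cy):
    """depth: (H, W) array in meters; fx, fy, cx, cy: assumed camera intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    vertices = np.dstack([x, y, depth])                    # vertex map (H, W, 3)
    dx = np.roll(vertices, -1, axis=1) - vertices          # right neighbour
    dy = np.roll(vertices, -1, axis=0) - vertices          # lower neighbour
    normals = np.cross(dx, dy)
    norm = np.linalg.norm(normals, axis=2, keepdims=True)
    return vertices, normals / np.clip(norm, 1e-9, None)   # unit normal map
```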

Fig.A.1.3 summarizes the pipeline of the algorithm. It also includes one of our examples, obtained after using Kinect Fusion on one of our subjects.

Fig. A.1.3 – Kinect Fusion diagram