
Generating Random Photorealistic Objects

Umar Mohammed and Simon Prince
{u.mohammed,s.prince}@cs.ucl.ac.uk
Department of Computer Science, University College London, Gower Street, London, UK, WC1E 6BT

Introduction

Animating complex moving objects such as humans is a difficult task. It is typically done by creating a 3D model of the character and then animating it using motion capture data; often an animator must model many frames by hand to gain more control over the animation. Despite this complex procedure, the resulting characters look unrealistic and lack expression. The figures on the right demonstrate how state-of-the-art graphics used in computer games remain far from photorealistic.

We propose to address both problems by animating characters directly from video footage. We aim to build a generative model of human motion, trained on videos of human characters performing actions; new data can then be synthesized by generating from the model. This is a very difficult problem with many challenges, so as an intermediate goal we solve a simpler, related problem: generating photorealistic examples of static data such as faces.

A Global Generative Model For Faces

One method of generating random photorealistic faces is to build a generative model of face data; new face images are then synthesized by generating from the model. We describe observed face data using a factor analysis model: a face image x_i is assumed to have been generated from a point h_i in a lower-dimensional ‘face space’ by a noisy process. The factor analysis model is given by:

x_i = F h_i + μ + ε_i

where F is a factor matrix containing the basis vectors of the ‘face space’, μ is the mean of the training data, and ε_i is a Gaussian noise term with zero mean and diagonal covariance Σ. The figure below shows an example of a face generated from the factor analysis model by drawing a random vector h_i and applying the transform described in the equation above.

These faces are unrealistic since they exhibit only global consistency, and the local texture contains many artefacts. This is because the global factor analysis model is learnt on whole images and does not capture the local texture present within them.
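As a concrete illustration, generating a face from this model amounts to drawing a latent vector and applying the linear transform plus noise. The sketch below is illustrative only; the shapes, variable names and toy inputs are our own, not from the poster:

```python
import numpy as np

def sample_face(F, mu, Sigma_diag, rng=None):
    """Draw one sample from the factor analysis model x_i = F h_i + mu + eps_i.

    F          : (D, K) factor matrix whose columns span the 'face space'
    mu         : (D,) mean of the training data
    Sigma_diag : (D,) diagonal of the noise covariance Sigma
    """
    rng = np.random.default_rng(0) if rng is None else rng
    D, K = F.shape
    h = rng.standard_normal(K)                           # h_i ~ N(0, I)
    eps = rng.standard_normal(D) * np.sqrt(Sigma_diag)   # eps_i ~ N(0, Sigma)
    return F @ h + mu + eps

# toy example: 16-pixel 'images' and a 4-dimensional face space
F = 0.1 * np.ones((16, 4))
x = sample_face(F, mu=np.zeros(16), Sigma_diag=np.full(16, 0.01))
```

In practice F, mu and Sigma_diag would be fitted to the training faces (e.g. by EM); here they are toy values so the snippet runs standalone.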

A Local Non-Parametric Model For Faces

We can synthesize faces which have local consistency using a non-parametric method similar to [1]. We take overlapping patches from a set of training faces and build a library of patches for each location. To synthesize a face, we draw a patch from each library location, ensuring that it matches its neighbours; this process is shown in the figures below. The synthesized image has the correct local texture exhibited by faces; however, there is no global consistency.
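A minimal sketch of this patch-library synthesis, in the spirit of image quilting [1]. The greedy left-to-right scan order, the patch and overlap sizes, and the sum-of-squares match cost are our assumptions for illustration:

```python
import numpy as np

def quilt(libraries, patch=8, overlap=2, grid=3, rng=None):
    """Greedy location-by-location synthesis from per-location patch libraries.

    libraries[(r, c)] is a list of (patch x patch) candidate patches for grid
    location (r, c), cut from aligned training faces. Patches are placed
    left-to-right, top-to-bottom; at each location we keep the candidate whose
    pixels agree best with the already-placed overlapping pixels.
    """
    rng = np.random.default_rng(1) if rng is None else rng
    step = patch - overlap
    size = step * (grid - 1) + patch
    out = np.zeros((size, size))
    filled = np.zeros_like(out, dtype=bool)
    for r in range(grid):
        for c in range(grid):
            region = (slice(r * step, r * step + patch),
                      slice(c * step, c * step + patch))
            mask = filled[region]
            def cost(p):                 # SSD over pixels already placed
                return ((p - out[region]) ** 2)[mask].sum() if mask.any() else rng.random()
            out[region] = min(libraries[(r, c)], key=cost)
            filled[region] = True
    return out
```

At the first location nothing has been placed yet, so a candidate is chosen arbitrarily; every later patch is constrained by its already-placed neighbours.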

A Parametric Local Model For Faces

The Fields of Experts model [2] is a parametric model for the local texture of natural images. It models images in a products-of-experts framework [3], in which a high-dimensional probability distribution is modelled as the product of several low-dimensional experts; each expert works on a low-dimensional subspace of the data which is easy to model. Since the marginal distributions of responses of linear filters applied to natural images are highly kurtotic [4], each expert is modelled as a Student-t distribution. The probability of an image x under the Fields of Experts model is given by:

p(x) ∝ ∏_k ∏_i ( 1 + ½ (J_iᵀ x_k − μ_i)² )^(−α_i)

where J_i is the filter of the i-th expert, x_k is a patch from the image, μ_i is the mean of the i-th distribution and α_i is its sharpness. The products-of-experts model for the two-dimensional case is shown below, where 3 clusters are modelled with a product of 3 experts.

This framework is used to learn the local texture information of faces by taking 20,000 random 15×15 patches from a set of 50 training images. Overlapping 5×5 sub-patches are then taken from the larger patches, and a model with 24 distributions is learnt. The results of denoising and inpainting images with this model are shown below.
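For intuition, the (unnormalised) FoE log-probability of an image can be evaluated by filtering every overlapping patch and summing the Student-t log-responses. The function below is a sketch: the names, shapes and the dropped partition function are our simplifications, not the authors' code:

```python
import numpy as np

def foe_log_prob(x, J, alpha, mu, k=5):
    """Unnormalised log p(x) under a Fields-of-Experts model with Student-t
    experts: sum over patches x_k and experts i of
    -alpha_i * log(1 + 0.5 * (J_i . x_k - mu_i)^2).

    x     : (H, W) image
    J     : (N, k*k) filter bank, one flattened k x k filter per row
    alpha : (N,) sharpness of each expert
    mu    : (N,) mean of each expert's filter response
    """
    H, W = x.shape
    logp = 0.0
    for r in range(H - k + 1):          # every overlapping k x k patch
        for c in range(W - k + 1):
            xk = x[r:r + k, c:c + k].ravel()
            resp = J @ xk - mu          # filter responses, shifted by the means
            logp += float((-alpha * np.log1p(0.5 * resp ** 2)).sum())
    return logp
```

Denoising and inpainting then amount to maximising this quantity (plus a data term) over the unknown pixels, typically by gradient ascent.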

[Figure: training images → take overlapping patches → library of patches.]

Local Non-Parametric Model and Global Model

Faces which are both globally and locally consistent are generated by combining the global factor analysis model with the non-parametric local method. First a face is generated from the global model; local consistency is then ensured by taking overlapping patches from the library such that they are similar both at the boundaries and to the generated global face underneath. This is shown in the figure below.
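The only change from the purely local method is the patch-selection cost, which now also penalises disagreement with the globally generated face. A sketch of such a cost; the weight w and the sum-of-squares form are our assumptions:

```python
import numpy as np

def joint_cost(candidate, placed, mask, global_patch, w=0.5):
    """Cost of a candidate library patch in the joint method: squared error
    against already-placed neighbouring pixels (where mask is True), plus a
    weighted squared error against the corresponding region of the face
    generated by the global factor analysis model."""
    boundary = ((candidate - placed) ** 2)[mask].sum() if mask.any() else 0.0
    return boundary + w * ((candidate - global_patch) ** 2).sum()
```

The candidate with the lowest joint cost is placed at each grid location, exactly as in the purely local scheme.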


Local Parametric Model and Global Model

We can combine the local Fields of Experts model with the global factor analysis model to form a parametric generative model which is both globally and locally consistent. The log-likelihood of an image x under this combined model is given as:

log p(x) = log p(x | J, α, μ) + λ log p(x | F, μ, Σ)

where the first term is the log-likelihood of the image under the FoE model, the second term is the log-likelihood under the factor analysis model, and λ is an arbitrary weighting constant. The images below show the result of generating from this model with varying values of λ.
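Numerically, the combined objective is just a weighted sum of the two log-likelihoods. The sketch below assumes the FoE term has already been computed, and evaluates the factor analysis term through the marginal x ~ N(μ, F Fᵀ + Σ); the names and the default λ are illustrative, not taken from the poster:

```python
import numpy as np

def combined_log_prob(x, foe_logp, F, mu, Sigma_diag, lam=0.6):
    """log p(x) = log p_FoE(x) + lam * log p_FA(x).

    foe_logp is the (precomputed) FoE log-likelihood of x; the factor
    analysis term uses the marginal distribution x ~ N(mu, F F^T + Sigma).
    """
    C = F @ F.T + np.diag(Sigma_diag)       # marginal covariance of x
    d = x - mu
    sign, logdet = np.linalg.slogdet(2 * np.pi * C)
    fa_logp = -0.5 * (logdet + d @ np.linalg.solve(C, d))
    return foe_logp + lam * fa_logp
```

Setting lam = 0 recovers the purely local FoE model, while large lam lets the global factor analysis term dominate.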

References

[1] A. Efros and W. Freeman. Image quilting for texture synthesis and transfer. SIGGRAPH, pp. 341-346, 2001.

[2] S. Roth and M. Black. Fields of experts: A framework for learning image priors. CVPR, pp. 860-867, 2005.

[3] G. E. Hinton. Training products of experts by minimizing contrastive divergence. Technical Report GCNU TR 2000-004, Gatsby Computational Neuroscience Unit, University College London, 2000.

[4] M. Welling, G. Hinton, and S. Osindero. Learning sparse topographic representations with products of Student-t distributions. NIPS 15, pp. 1359-1366, 2003.

[Figure: non-parametric synthesis pipeline, with training and synthesizing stages over a 9×9 grid of patch locations.]
[Figure: joint method results, showing the generated global image, the image synthesized using the joint method, where the patches came from, and the closest image in the training set.]
[Figure: denoising (noisy image, original image, result) and inpainting (original image, masked image, result).]
[Figure: two-dimensional products-of-experts example, showing the 1st, 2nd and 3rd experts and their product.]
[Figure: generation from the combined model with λ = 0, 0.3, 0.6, 1, 2 and 3; captions note that the image is blurred at the edges and that artefacts appear around the eyes.]
[Figure: further synthesis examples 2-6, each shown alongside where the patches came from.]