
Human Face Shape Analysis under Spherical Harmonics Illumination Considering Self Occlusion

Jasenko Zivanov, Andreas Forster, Sandro Schönborn, Thomas Vetter
[email protected] [email protected] [email protected] [email protected]

Department of Mathematics and Computer Science, University of Basel

4056 Basel, Switzerland

Abstract

In this paper we present a novel method of considering self-occlusion when fitting faces under global illumination. The effects of self-occlusion under spherical harmonics (SH) illumination are predicted through the use of a morphable model of faces. A linear function is precomputed which maps morphable model coordinates directly to surface irradiance values.

We demonstrate that considering self-occlusion yields a clear improvement over an otherwise identical reconstruction where self-occlusion is neglected. Our method succeeds at reconstructing facial attributes that are visible primarily as a consequence of self-occlusion, such as the depth of the eye sockets and the shape of the nose.

1. Introduction

We aim to estimate the shape of a face from a single photograph under unknown arbitrary illumination and an unknown albedo while considering the effects of self-occlusion. Since this is an ill-posed problem, we apply a statistical model of human faces to resolve the arising ambiguities. In those models, a specific shape or albedo map is represented by a small number of real-valued coefficients. The small number of those coefficients allows us to estimate them in a very stable way.

The illumination is modelled as a continuous function on a sphere which describes the amount of light incoming from any given spatial direction. Since faces seen in photographs are often illuminated from many different angles, this allows us to express a wide range of illumination conditions encountered in the real world. We represent the illumination function as a linear combination of real spherical harmonics (SH) basis functions. This provides us with another low-dimensional model, this time one of incoming illumination.

Because under continuous SH illumination light can irradiate any vertex from any direction, computing the effects of self-occlusion becomes computationally very demanding. The novelty of our approach is that we propose a linear approximation of the effects of self-occlusion that is coupled with our morphable model. This makes the problem tractable on today's computing machines and allows us to consider global self-occlusion in our shape extraction.

Our illumination model is limited to soft and distant lighting, i.e. the illumination function is smooth over the incoming direction and the amount of light incoming from any given direction is the same across the entire face.

1.1. Previous Work

A first simple shape prior for facial image analysis was proposed by Blanz and Vetter in 1999. They define a low-dimensional generative statistical model of human faces and use it to perform shape from shading constrained by that model. Their method assumes a minimal single-source illumination model and consists of an analysis-by-synthesis optimization loop. In 2005, Romdhani [1] improved upon that estimation technique by introducing a separate specular highlight term and an edge correspondence term to the cost function. In 2009, Knothe [2] proposed a novel morphable model, the global-to-local model (G2L), which aims for a better spatial separation of the components of the model.

On the computer graphics side, Ramamoorthi and Hanrahan [3] were the first to propose using real-valued spherical harmonics (SH) functions to model the illumination in a scene in their seminal paper from 2001. That approach was extended by Sloan and Kautz [4] in 2002, when they presented a method of precomputing the effects of self-occlusion under SH lighting. In 2006, Zhang and Samaras [5] proposed an application of SH lighting to the machine vision context, namely to the problem of face recognition.

In 2011, Kemelmacher-Shlizerman and Basri [6] proposed an application of SH illumination to facial shape from shading. They use only one single mean shape in place of an entire morphable model. That mean shape is deformed in order to explain the shading observed in the image. Their method does not consider the effects of self-occlusion.

In 2012, Elhabian et al. [7] presented another shape-from-shading algorithm based on SH illumination which uses a more realistic Torrance-Sparrow reflection model, also without considering self-occlusion.

Most recently, Aldrian and Smith [8] have proposed a method for the inverse rendering of faces that relies on a linear model of facial self-occlusion. They do not aim to reconstruct the shape from the observed shading. Their model assumes a constant incoming light distribution, which corresponds to considering only the first SH basis function.

1.2. Contribution

In this paper, we propose a method of fitting a morphable model of human faces to a single image under omnidirectional SH illumination while considering self-occlusion. The use of a morphable model allows us to precompute the effects of self-occlusion and thus consider them in the fitting process.

Without self-occlusion, the radiance of a surface point depends only on the surface normal at that point, even under omnidirectional illumination. If self-occlusion is considered, however, the evaluation of the radiance suddenly becomes computationally very expensive, because for each surface point P and every incoming light direction ω, the entire facial geometry needs to be examined in order to determine whether direction ω is occluded at point P. That amount of computation makes determining self-occlusion intractable in any iterative shape fitting scheme.

We encode the self-occlusion information in the dimensions of the morphable model itself, and thereby avoid having to determine that information at runtime. This provides us with a linear function which maps shape model coordinates directly to surface radiance values under a given set of illumination coefficients. Inverting this linear function allows us to predict a shape that implies the self-occlusion effects observed in the input image.

Note that the actual radiance of a face under a given illumination is not a linear function of vertex displacements. We will demonstrate, however, that a linear function is a good approximation for the small amounts by which corresponding vertices shift between different human faces.

2. Theory

Shape The shapes of our faces are represented through a linear morphable model. The morphable model defines the position $s_v \in \mathbb{R}^3$ of each vertex $v$ as follows,

$$ s_v = s^\mu_v + \sum_{i=0}^{N} q^S_i \sigma^S_i s^i_v, \qquad (2.1) $$

where $s^\mu_v \in \mathbb{R}^3$ is the mean position of vertex $v$ and $s^i_v \in \mathbb{R}^3$ is its displacement as a result of each component $i$ of the morphable model. The coefficients $q^S_i \in \mathbb{R}$ are the model coordinates of the head, while the values $\sigma^S_i \in \mathbb{R}$ are the standard deviations of the corresponding model components. The superscript $S$ denotes that we are referring to shape model coordinates and standard deviations, in order to distinguish them from the albedo coordinates and standard deviations which will be introduced later.
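As a concrete illustration, the following is a minimal numpy sketch of equation 2.1; the array layout (components stored as dense displacement fields) is our assumption, not prescribed by the model:

```python
import numpy as np

def morphable_shape(s_mean, s_comp, sigma_S, q_S):
    """Eq. (2.1): vertex positions from shape model coordinates.

    s_mean  : (Nv, 3) mean vertex positions s^mu_v
    s_comp  : (N, Nv, 3) per-component displacements s^i_v
    sigma_S : (N,) component standard deviations sigma^S_i
    q_S     : (N,) shape model coordinates q^S_i
    """
    # Mean shape plus the weighted sum of displacement fields.
    return s_mean + np.einsum('i,ivk->vk', q_S * sigma_S, s_comp)
```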

Shading We assume a global illumination function which is modulated individually for each vertex. We refer to the modulation coefficients as the generalized vertex irradiances, which are predicted from model coordinates using a linear approximation, the local irradiance model.

Our illumination model is defined by an illumination function $L(\omega)$ which maps directions on a sphere $\omega \in S^2$ to RGB light values $L \in \mathrm{RGB}$. The space $\mathrm{RGB}$ is equivalent to a triplet of positive real numbers. The illumination function $L(\omega)$ is projected into SH space,

$$ L(\omega) = \sum_{l=0}^{B} \sum_{m=-l}^{l} \lambda^m_l Y^m_l(\omega) = \sum_{k=0}^{K} \lambda_k Y_k(\omega), \qquad (2.2) $$

where $Y^m_l$ are the real SH basis functions and $\lambda^m_l \in \mathrm{RGB}$ are the illumination coefficients which characterize a specific illumination environment. Throughout this project, we used three bands of SH ($B = 2$), which correspond to nine basis functions $Y^m_l(\omega)$. To improve legibility, we will refer to combinations of indices $l$ and $m$ as a combined index $k$ throughout this paper.
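For reference, the nine basis functions for $B = 2$ can be evaluated in closed form; the sketch below uses the standard real SH normalization constants, and the function names are ours:

```python
import numpy as np

def sh_basis9(w):
    """First nine real SH basis functions Y_k at unit direction(s) w.

    w : (..., 3) unit vectors (x, y, z). Returns shape (..., 9).
    """
    x, y, z = w[..., 0], w[..., 1], w[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),        # l=0
        0.488603 * y,                      # l=1, m=-1
        0.488603 * z,                      # l=1, m= 0
        0.488603 * x,                      # l=1, m=+1
        1.092548 * x * y,                  # l=2, m=-2
        1.092548 * y * z,                  # l=2, m=-1
        0.315392 * (3.0 * z**2 - 1.0),     # l=2, m= 0
        1.092548 * x * z,                  # l=2, m=+1
        0.546274 * (x**2 - y**2),          # l=2, m=+2
    ], axis=-1)

def illumination(w, lam):
    """Eq. (2.2): L(w) = sum_k lam_k Y_k(w); lam is (9, 3) RGB coefficients."""
    return sh_basis9(w) @ lam
```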

The radiance $R_v$ of a surface vertex $v$ is equal to the integral over the light contributions from all unoccluded directions $\omega$,

$$ R_v(\omega_{\mathrm{out}}) = \int_{S^2} V_v(\omega)\, \max(0, \cos(\omega, n_v))\, F(\omega, \omega_{\mathrm{out}})\, L(\omega)\, d\omega, \qquad (2.3) $$

where $R_v(\omega_{\mathrm{out}}) \in \mathrm{RGB}$ is the exitant radiance at $v$ in direction $\omega_{\mathrm{out}} \in S^2$, $V_v(\omega) \in \{0, 1\}$ is the visibility function, which is one for visible directions $\omega$ and zero for occluded ones, $F \in \mathrm{RGB}$ is the bidirectional reflectance distribution function (BRDF) and $L$ is our illumination function. This expression is equal to Kajiya's rendering equation [9], except that the effects of interreflection are neglected.

We assume Lambertian reflectance, so our BRDF is equal to a constant albedo $A_v \in \mathrm{RGB}$. By applying our definition of $L$ (eq. 2.2), we can decompose the radiance as follows,

$$ R_v = A_v \rho^\lambda_v, \qquad (2.4) $$

$$ \rho^\lambda_v = \sum_{k=0}^{K} \lambda_k \rho_{vk}, \qquad (2.5) $$


$$ \rho_{vk} = \int_{S^2} V_v(\omega)\, \max(0, \cos(\omega, n_v))\, Y_k(\omega)\, d\omega. \qquad (2.6) $$

We refer to $\rho_{vk} \in \mathbb{R}$ as the generalized vertex irradiances, which measure the light contribution by each of the SH basis functions $Y_k$ to each vertex $v$. They are independent of the illumination coefficients $\lambda$. The terms $\rho^\lambda_v \in \mathrm{RGB}$ are referred to as the specific vertex irradiances, and they describe the actual irradiance of vertex $v$ under a specific illumination $\lambda$.

In the following, we will drop the subscript $v$ when we refer to sets of quantities for all the vertices in the mesh, and the subscript $k$ when we refer to the values associated with all nine SH basis functions.

3. Setup

Local Irradiance Model We use a linear local irradiance model $\Phi(q^S)$ to predict the generalized vertex irradiances $\rho$ from morphable model coordinates $q^S \in \mathbb{R}^N$. $\Phi(q^S)$ is inverted in our algorithm to find optimal morphable model coordinates. It is defined as follows,

$$ \Phi : \mathbb{R}^N \to \mathbb{R}^{K N_v}, \qquad q^S \mapsto \rho^\mu_\Phi + \sum_{i=0}^{N} q^S_i \rho^i_\Phi \approx \rho(q^S). \qquad (3.1) $$

The mean generalized irradiance $\rho^\mu_\Phi$ is defined as the generalized irradiance of the mean shape $s^\mu$. The irradiance deformations $\rho^i_\Phi$ are defined as the irradiances of the unit shapes in the linear shape model minus $\rho^\mu_\Phi$. A unit shape is a shape generated by equation 2.1 where all entries of $q^S$ are zero, except for one which is equal to one.

The computation of the generalized irradiance of a shape is performed by evaluating the integral in equation 2.6 using Monte Carlo integration. This Monte Carlo integration consists of rendering many orthographic shadow maps of the head from many viewing directions $\omega$ in order to obtain the values of the visibility function $V_v(\omega)$, and then adding up the resulting values of the integrand.
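A minimal sketch of that Monte Carlo estimate, assuming a caller-supplied visibility oracle (e.g. backed by the rendered shadow maps) and the sh_basis9 helper from the earlier sketch:

```python
import numpy as np

def generalized_irradiance(normals, visible, n_samples=4096, rng=None):
    """Monte Carlo estimate of eq. (2.6) for all vertices.

    normals : (Nv, 3) unit vertex normals n_v
    visible : callable, (M, 3) directions -> (Nv, M) boolean V_v(w);
              assumed to be backed by precomputed shadow maps
    Returns (Nv, 9) generalized vertex irradiances rho_vk.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Uniformly distributed directions on the sphere.
    w = rng.normal(size=(n_samples, 3))
    w /= np.linalg.norm(w, axis=1, keepdims=True)
    Y = sh_basis9(w)                           # (M, 9)
    cos = np.clip(normals @ w.T, 0.0, None)    # (Nv, M) max(0, cos(w, n_v))
    integrand = visible(w) * cos               # (Nv, M)
    # Integral over S^2: sphere area 4*pi divided by the sample count.
    return (4.0 * np.pi / n_samples) * integrand @ Y
```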

At runtime, once a set of illumination coefficients $\lambda_k$ has been estimated, the local irradiance model $\Phi$ is projected into that illumination, yielding the specific local irradiance model $\Phi^\lambda$. This is done by applying equation 2.5 to $\rho^\mu_\Phi$ and all $\rho^i_\Phi$.

Albedo Model Second, we require an illumination-free albedo model. For that purpose we have determined the albedo maps of 200 heads of known shape by computing their generalized irradiance through Monte Carlo integration. We have then estimated the illumination coefficients $\lambda$ and divided the observed vertex radiances by their implied irradiances. Finally, we have constructed a PCA model of those 200 albedo maps. That model is very similar to the shape model. It is defined as follows,

$$ A_v = A^\mu_v + \sum_{i=0}^{N_A} q^A_i \sigma^A_i A^i_v, \qquad (3.2) $$

where $A^\mu_v$ is the mean albedo of vertex $v$ and $A^i_v$ are the principal components of the albedo model at $v$.

Initial Shape and Pose Our method is initialized by performing a landmark fit in order to estimate the correspondence between the vertices of the morphable model and the pixels of the image. That landmark fit consists of finding the pose of the face and an initial shape that maps a number of given landmarks to their known 2D positions in the image. The fit is performed using the L-BFGS optimization algorithm as described by Knothe [2]. No illumination model is used during this initialization step.

The pose is given by a $4 \times 4$ view matrix $W$, such that $(s_x, s_y)$ is the image position corresponding to world-space position $s$, with

$$ s_x = x_0 + \frac{\tilde{s}_x}{\tilde{s}_w}, \qquad s_y = y_0 + \frac{\tilde{s}_y}{\tilde{s}_w}, \qquad \tilde{s} = W \begin{pmatrix} s \\ 1 \end{pmatrix}. \qquad (3.3) $$
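In code, this projection might look as follows (a sketch; the homogeneous-coordinate layout is our assumption):

```python
import numpy as np

def project(W, x0, y0, s):
    """Eq. (3.3): image positions of world-space points s.

    W : (4, 4) view matrix; s : (..., 3) world-space positions.
    Returns the image coordinates (s_x, s_y).
    """
    s_h = np.concatenate([s, np.ones(s.shape[:-1] + (1,))], axis=-1)
    t = s_h @ W.T                     # t = W (s, 1)^T, homogeneous
    return x0 + t[..., 0] / t[..., 3], y0 + t[..., 1] / t[..., 3]
```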

4. Method

Our method aims to estimate a new set of shape coordinates $q^{S\prime}$ which better explain the shading in the input image than the initial shape coordinates $q^S$. For that purpose, we also require estimates of the illumination coefficients $\lambda^m_l$ and albedo coefficients $q^A_i$.

Our algorithm consists of the following three steps:

1. Illumination Fit

2. Albedo Fit

3. Shape Fit

All three steps are linear least-squares problems. They are solved by performing an alternating least-squares optimization. The illumination fit is repeated after the albedo fit, since it is far quicker than either the albedo fit or the shape fit; each iteration thus comprises four steps. We have found through our experiments that only negligible changes to the shape take place after three iterations of those four steps.

As initial values, we assume the mean albedo $A^\mu$ and the generalized irradiance corresponding to the initial shape, i.e. $\Phi(q^S)$.
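The overall loop can be summarized as follows; this is a minimal sketch, and the three solver callables are placeholders for the linear solves described in sections 4.1 to 4.3:

```python
def fit(fit_illumination, fit_albedo, fit_shape, Phi, A_mean, q_init, n_iter=3):
    """Alternating least-squares loop over the three linear solves.

    The solver callables and Phi (the local irradiance model) are
    assumptions standing in for the solves described in the text.
    """
    albedo, q = A_mean, q_init       # initial values: mean albedo, landmark fit
    rho = Phi(q)                     # generalized irradiance of initial shape
    for _ in range(n_iter):
        lam = fit_illumination(albedo, rho)
        albedo = fit_albedo(lam, rho)
        lam = fit_illumination(albedo, rho)  # repeated: far cheaper than the rest
        q = fit_shape(lam, albedo, q)
        rho = Phi(q)
    return q, albedo, lam
```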


4.1. Illumination

We begin by estimating the illumination coefficients. This is done by solving a linear system of equations in a least-squares sense.

The illumination coefficients $\lambda_k$ are determined by solving the following set of equations in a least-squares sense for all valid vertices $v$,

$$ R_v = A^\mu_v \sum_{k=0}^{K} \lambda_k \rho_{vk}, \qquad (4.1) $$

where $R_v$ are the vertex radiances observed in the image. A vertex is considered valid for the illumination fit if it is visible and if its variance in both albedo and generalized irradiance is below certain threshold values $\sigma^A_t$ and $\sigma^\rho_t$. Those variances have been determined from the same set of 200 example heads also used for the computation of the albedo model and the linear local irradiance model $\Phi$.
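Since the color channels decouple, equation 4.1 can be solved channel by channel with an ordinary least-squares solver; a sketch under our assumed array layout:

```python
import numpy as np

def fit_illumination(R, A_mean, rho, valid):
    """Least-squares solve of eq. (4.1) for the illumination coefficients.

    R      : (Nv, 3) observed vertex radiances
    A_mean : (Nv, 3) mean albedo A^mu_v
    rho    : (Nv, K) generalized vertex irradiances rho_vk
    valid  : (Nv,) boolean mask of valid vertices
    Returns (K, 3) coefficients lambda_k, one column per color channel.
    """
    K = rho.shape[1]
    lam = np.empty((K, 3))
    for c in range(3):  # the color channels decouple into separate solves
        M = A_mean[valid, c:c+1] * rho[valid]            # (n_valid, K)
        lam[:, c] = np.linalg.lstsq(M, R[valid, c], rcond=None)[0]
    return lam
```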

4.2. Albedo

Once an initial estimate of the illumination has been determined, we proceed to approximate the albedo. This is done using our albedo model. Although the albedo model is defined for all vertices of the face, the albedo fit is performed multiple times, once for each facial segment. The segments define the following facial areas: skin, eyes, eyebrows, nostrils and lips. Breaking up the albedo model in this way increases its expressiveness, since the albedo maps of those segments can then be fitted independently of each other.

The albedo estimation consists of solving the following linear least-squares problem for the albedo model coefficients $q^{A,g}_i$ in each segment $g$,

$$ w^g_v \left( R_v - A^\mu_v \rho^\lambda_v \right) = w^g_v \rho^\lambda_v \sum_{i=0}^{N_A} q^{A,g}_i \sigma^A_i A^i_v, \qquad (4.2) $$

$$ 0 = \tau^A q^A_i, \qquad (4.3) $$

where the weight $w^g_v \in \mathbb{R}$ describes the degree to which vertex $v$ belongs to segment $g$. The sum of $w^g_v$ over all segments is equal to 1 at each vertex. The first term provides an equation in RGB for each valid vertex $v$, and it aims to explain the colors observed in the image. A vertex is considered valid for the albedo fit if its variance in generalized irradiance lies below the threshold variance $\sigma^\rho_t$. This eliminates vertices of high geometric uncertainty. The second term is a regularization term which provides $N_A$ equations, one for each albedo model dimension, and is intended to enforce a solution which is considered likely by the model. The number $\tau^A$ is a regularization constant, and it allows for a trade-off between reconstruction precision and model prior.

The estimated coefficients are used to evaluate the vertex albedo values $A^g_v \in \mathrm{RGB}$ for each segment using equation 3.2. They are then interpolated using the segment weights $w^g_v$ to obtain the final vertex albedo values $A_v$.
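A sketch of one segment's solve of equations 4.2 and 4.3, stacking the data equations and the regularization rows into a single least-squares problem; the array layout is again our assumption:

```python
import numpy as np

def fit_albedo_segment(R, A_mean, A_comp, sigma_A, rho_lam, w_g, valid,
                       tau_A=20.0):
    """Regularized least squares for one segment's albedo coefficients.

    A_comp  : (NA, Nv, 3) albedo principal components A^i_v
    rho_lam : (Nv, 3) specific vertex irradiances rho^lambda_v
    w_g     : (Nv,) segment weights w^g_v
    """
    NA = A_comp.shape[0]
    m = valid & (w_g > 0)
    # One equation per valid vertex and color channel (eq. 4.2)...
    lhs = ((w_g[m, None] * rho_lam[m])[None]
           * sigma_A[:, None, None] * A_comp[:, m])
    lhs = lhs.reshape(NA, -1).T                          # (3*n_valid, NA)
    rhs = (w_g[m, None] * (R[m] - A_mean[m] * rho_lam[m])).ravel()
    # ...plus NA regularization equations 0 = tau_A * q_i (eq. 4.3).
    lhs = np.vstack([lhs, tau_A * np.eye(NA)])
    rhs = np.concatenate([rhs, np.zeros(NA)])
    return np.linalg.lstsq(lhs, rhs, rcond=None)[0]
```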

4.3. Shape

Here, we invert the local irradiance model $\Phi(q^S)$ defined in equation 3.1 in order to obtain a set of shape coordinate displacements $\delta q_i$, so that the displaced shape space coordinates $q^{S\prime} = q^S + \delta q$ correspond to a mesh that better explains the shading in the input image. At the same time, we also need to preserve the model-to-image correspondence established by the initial landmark fit. This is accomplished by requiring the image-plane shift of certain vertices to be close to zero through an additional correspondence term.

The shape fit consists of solving the following linear system of equations for $\delta q_i$,

$$ R_v - R^0_v = \sum_{i=0}^{N} \rho^i_{\Phi v}\, \delta q_i, \qquad (4.4) $$

$$ 0 = \tau^C w^C_v \sum_{i=0}^{N} \bar{W} \sigma^S_i s^i_v\, \delta q_i, \qquad (4.5) $$

$$ \tau^S q^0_i = -\tau^S \delta q_i. \qquad (4.6) $$

The first term (4.4) represents one equation in RGB for each vertex, or three equations per vertex in $\mathbb{R}$. Its purpose is to find a shape space shift $\delta q$ that better explains the observed shading. The left-hand side describes the color change required for vertex $v$, while the right-hand side describes the color change as a function of $\delta q_i$. $R^0_v$ is the vertex radiance implied by the current shape under the estimated illumination and albedo.

The second term (4.5) is referred to as the correspondence term, and it represents two further equations in $\mathbb{R}$ for each vertex. It aims to limit the amount by which vertex $v$ shifts in the image plane as a result of $\delta q_i$. Since the lateral positions of all vertices are provided only by the initial correspondence fit, we need to take care not to distort that initial correspondence. The collective correspondence weight $\tau^C$ allows us to form a trade-off between shading reconstruction accuracy and correspondence preservation. The individual weights $w^C_v$ are equal to one for all landmark vertices and zero for all others. Since the correspondence has only been established at the landmarks, all other vertices are allowed to move laterally in the image plane. The matrix $\bar{W}$ is the upper-left $2 \times 3$ submatrix of the view matrix $W$, as defined in equation 3.3.

The final term represents a set of $N$ regularization equations, and it serves to keep the resulting shape space coordinates $q^1_i = q^0_i + \delta q_i$ close to zero, in order to ensure a resulting shape that is close to the mean and thus considered likely by the morphable model.
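All three terms can be stacked into one overdetermined linear system and solved jointly. The following is a sketch under stated assumptions: the matrix B collects the per-vertex RGB radiance derivatives obtained from the specific local irradiance model $\Phi^\lambda$ and the current albedo, and the naming and layout are ours:

```python
import numpy as np

def fit_shape(dR, B, W_bar, s_comp, sigma_S, w_C, q0, tau_C=0.1, tau_S=0.5):
    """Stacked linear system of eqs. (4.4)-(4.6), solved for delta q.

    dR     : (Nv, 3) observed minus predicted radiance, R_v - R^0_v
    B      : (Nv, 3, N) radiance change per unit delta q_i
    W_bar  : (2, 3) upper-left submatrix of the view matrix
    s_comp : (N, Nv, 3) shape component displacements s^i_v
    w_C    : (Nv,) correspondence weights (1 at landmarks, else 0)
    """
    Nv, _, N = B.shape
    shading = B.reshape(3 * Nv, N)                       # eq. (4.4)
    # Eq. (4.5): two equations per vertex limiting image-plane drift.
    drift = np.einsum('v,jk,ivk->vji', tau_C * w_C, W_bar,
                      sigma_S[:, None, None] * s_comp).reshape(2 * Nv, N)
    reg = tau_S * np.eye(N)                              # eq. (4.6)
    lhs = np.vstack([shading, drift, reg])
    rhs = np.concatenate([dR.ravel(), np.zeros(2 * Nv), -tau_S * q0])
    return np.linalg.lstsq(lhs, rhs, rcond=None)[0]
```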


5. Experiments

In the following, we will describe a number of experiments performed to evaluate our method. The first part, section 5.1, serves to validate the approximation of facial irradiance as a linear function of morphable model coordinates. In the second part, section 5.2, we test the ability of the complete algorithm to recover the shape from images.

5.1. Irradiance Mapping Test

We first validate our local irradiance model $\Phi(q^S)$. For that purpose we work on a set of 60 test faces sampled from the morphable model according to its statistical distribution.

Let $q^S \in \mathbb{R}^N$ be a set of coefficients corresponding to one test face. The vertex positions $s_v \in \mathbb{R}^3$ of all heads have been computed as shown in equation 2.1. For each head, its explicit generalized irradiance $\rho^{\mathrm{ref}}$ has been determined using Monte Carlo integration.

The purpose of this first experiment is to show that the generalized irradiance $\rho$ can be approximated as a linear function of shape coordinates $q^S$ using our mapping function $\Phi$. The irradiance estimates $\rho^{\mathrm{est}}$ of all 60 test faces have been determined using $\Phi$ and compared to the explicit irradiances $\rho^{\mathrm{ref}}$ of the heads. We have defined the following reconstruction error:

$$ e(\rho) = \sum_{v=0}^{N_v} \left| \rho_v - \rho^{\mathrm{ref}}_v \right|, \qquad (5.1) $$

where $|\cdot|$ is the Euclidean distance in $\mathbb{R}^K$, the space of SH coefficients, and $N_v$ is the number of vertices in the mesh.

The reconstruction error $e(\rho^{\mathrm{est}})$ has then been compared to the error $e(\rho^\mu_\Phi)$ implied by using the mean irradiance $\rho^\mu_\Phi$ as an estimate. The error of the mean is on average 6.86 times greater than the error of the reconstruction. A histogram of the results is shown in fig. 1.
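The error metric of equation 5.1 is straightforward to compute; a minimal sketch:

```python
import numpy as np

def irradiance_error(rho, rho_ref):
    """Eq. (5.1): summed per-vertex Euclidean distance in R^K.

    rho, rho_ref : (Nv, K) generalized vertex irradiances.
    """
    return np.linalg.norm(rho - rho_ref, axis=1).sum()
```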

5.2. Shape Recovery

For the following experiment, we have generated renderings of 17 human heads under four light probes as illumination environments. The reflectance has been assumed Lambertian and self-occlusion has been considered by means of Monte Carlo integration. The 2D positions of a fixed set of anchor vertices have then been used as landmarks to determine the pose and an initial set of morphable model coefficients. The shape corresponding to those coefficients serves as the input to our algorithm.

We have also constructed two versions of the irradiance function $\Phi$, one that considers self-occlusion and one that does not. The algorithm has been applied to all input heads, once with the original $\Phi$ function, and once with a version where self-occlusion has been neglected.

[Figure 1 plot: histogram of avg. irradiance error (arbitrary units) vs. number of heads, comparing the mean shape irradiance to the linear approximation.]

Figure 1: Histograms of the generalized irradiance error over 60 heads. We compare the error of our linear irradiance model (green) to the error when assuming the irradiance of the mean head (red). Note that our linear model provides a far better approximation than the mean irradiance. The irradiance cannot be measured in any meaningful units, since it consists of arbitrary $\mathbb{R}^K$ values.

The following parameters have been used: $\tau^A = 20$, $\tau^C = 0.1$, $\tau^S = 0.5$ and $K = 9$. The colors have been defined in the range $[0, 1]$, with $(0, 0, 0)$ being black and $(1, 1, 1)$ being white. Distances were measured in µm. The face mesh was composed of $N_v = 33{,}334$ vertices. The albedo model consisted of $N_A = 200$ components and the shape model of $N = 300$ components.

The resulting shapes have then been compared to the ground truth shapes in both the depth maps and the normal maps as seen by the camera. For the depth comparison, we have centered each depth map at its mean depth, since the absolute depth cannot be inferred precisely from a photograph, and compared the mean absolute values of the per-pixel depth differences of both centered depth maps. For the orientation comparison, we have measured the mean absolute value of the angle between the normals at corresponding pixels of the normal maps. The comparison was limited to skin pixels, i.e. the eyes, nostrils, mouths and eyebrows have been neglected.
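The two error measures can be computed as follows (a sketch; mask denotes the skin pixels):

```python
import numpy as np

def depth_error(d1, d2, mask):
    """Mean absolute difference of mean-centered depth maps over skin pixels."""
    a = d1[mask] - d1[mask].mean()
    b = d2[mask] - d2[mask].mean()
    return np.abs(a - b).mean()

def normal_error(n1, n2, mask):
    """Mean absolute angle (degrees) between corresponding unit normals."""
    cos = np.clip((n1[mask] * n2[mask]).sum(axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()
```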

             without self-occlusion    with self-occlusion
depth (mm)          2.823                    2.542
angle (°)           9.621                    8.643

Table 1: Mean reconstruction errors. The values have been averaged over all relevant vertices of all 17 faces under all four illuminations.

We have found that for synthetic images, the surface orientation of the resulting shapes exhibits a clear improvement when self-occlusion is considered. The depth error is also reduced, albeit to a lesser degree. The results are displayed in fig. 2 and table 1.

[Figure 2 plots: histograms of the angular reconstruction improvement (degrees) and the depth reconstruction improvement (mm) over the number of heads.]

Figure 2: Histograms of reconstruction improvement when self-occlusion is considered.

We have also reconstructed a number of faces from real photographs from the AFLW database [10]. The results are shown in fig. 4.

6. Summary and Conclusions

We have demonstrated a novel method of extracting the shape of a face from a single photograph under unknown illumination and albedo that considers self-occlusion. Our method is able to predict the effects of self-occlusion using self-shadowing information encoded in the dimensions of a morphable model of faces. It consists of a sequence of linear least-squares problems, each of which can be solved within less than one minute on contemporary hardware, even though the effects of self-occlusion are considered. We have shown that our method yields a significant improvement over a reconstruction that does not consider self-occlusion.

Future work in this area will need to find a way of combining our method of considering self-occlusion with more expressive, non-Lambertian reflection models. This will probably also mean that the linearity of our approach will have to be sacrificed, since the high-frequency nature of specular reflections makes them difficult to represent as global functions of shape.

References

[1] S. Romdhani and T. Vetter, "Estimating 3D shape and texture using pixel intensity, edges, specular highlights, texture constraints and a prior", in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on. IEEE, 2005, vol. 2, pp. 986–993.

[2] R. Knothe, A global-to-local model for the representation of human faces, PhD thesis, University of Basel, 2009, http://edoc.unibas.ch/diss/DissB_8817.

[3] R. Ramamoorthi and P. Hanrahan, "An efficient representation for irradiance environment maps", in Proceedings of the 28th annual conference on Computer graphics and interactive techniques. ACM, 2001, pp. 497–500.

[4] P.P. Sloan, J. Kautz, and J. Snyder, "Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments", in ACM Transactions on Graphics (TOG). ACM, 2002, vol. 21, pp. 527–536.

[5] L. Zhang and D. Samaras, "Face recognition from a single training image under arbitrary unknown lighting using spherical harmonics", Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, no. 3, pp. 351–363, 2006.

[6] Ira Kemelmacher-Shlizerman and Ronen Basri, "3D face reconstruction from a single image using a single reference face shape", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, pp. 394–405, 2011.

[7] S. Elhabian, E. Mostafa, H. Rara, and A. Farag, "Non-Lambertian model-based facial shape recovery from single image under unknown general illumination", in Proceedings of the Ninth Conference on Computer and Robot Vision (CRV), 2012.

[8] O. Aldrian and W. Smith, "Inverse rendering of faces on a cloudy day", Computer Vision–ECCV 2012, pp. 201–214, 2012.

[9] J.T. Kajiya, "The rendering equation", ACM SIGGRAPH Computer Graphics, vol. 20, no. 4, pp. 143–150, 1986.

[10] M. Köstinger, P. Wohlhart, P.M. Roth, and H. Bischof, "Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization", in Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011, pp. 2144–2151.


Figure 3: Faces extracted from synthetic images, demonstrating the effect of considering self-occlusion. Input image (left), reconstructions without and with considering self-occlusion (middle) and the angular errors of the two reconstructions (right).


Figure 4: Faces extracted from real photographs from the AFLW database. Input image (left) and reconstruction (right).
