Capturing Global Semantic Relationships for Facial Action Unit Recognition

Ziheng Wang1  Yongqiang Li3  Shangfei Wang2  Qiang Ji1

Rensselaer Polytechnic Institute1  University of Science and Technology of China2  Harbin Institute of Technology3

1. Problem

Facial Action Unit Recognition

[Figure: example faces illustrating the action units "Pull Lip Corner", "Raise Cheek", and "Lower Eyebrow"]

Facial action units are NOT independent
o AUs do not occur alone; some combinations of action units are frequently observed
o Some AUs must, or must not, be present at the same time due to the constraints of facial anatomy

Relationships among AUs are influenced by facial expression
o "Stretch mouth" and "raise brow" are likely to be both absent during happiness, both present during surprise, and mutually exclusive during anger or fear

[Figure: example faces showing happiness, surprise, anger, and fear]

2. Main Idea

Propose to use a restricted Boltzmann machine (RBM) to capture global, complex relationships among AUs for AU recognition

Propose to use a 3-way RBM that incorporates facial expression to characterize AU relationships more accurately

[Figure: the hierarchical model. Latent nodes $h_1, \dots, h_m$ connect to AU nodes $a_1, \dots, a_n$, which connect to image features $x_1, \dots, x_d$.]
o Top down: global semantic relations among facial action units
o Bottom up: AU recognition from image measurements

3. Related Work

Existing approaches
o Treat AUs as uncorrelated entities
o Use a Bayesian network (BN) to model AU relationships

Limitations
o Models such as BNs rest on the first-order Markov assumption and can therefore capture only local, i.e., pairwise, relationships between action units
o Finding the optimal structure of a large AU network is difficult
o They model AU relationships without considering the influence of facial expression, which can lead to incorrect estimates of AU dependencies

Proposed Method
o Model global AU relationships and account for the influence of expression

4. Hierarchical Model for AU Recognition

Top layer $h_1$ to $h_m$: latent nodes modeling AU relationships
Middle layer $a_1$ to $a_n$: binary states of $AU_1$ to $AU_n$
Bottom layer $x_1$ to $x_d$: image features

Total energy:

$$E(\mathbf{x}, \mathbf{a}, \mathbf{h}; \theta) = -\sum_{i}\sum_{j} a_i W^1_{ij} h_j - \sum_{j} c_j h_j - \sum_{i} b_i a_i - \sum_{i}\sum_{t} W^2_{it} a_i x_t$$

o $a_i W^1_{ij} h_j$: compatibility between $AU_i$ and latent node $h_j$
o $c_j h_j$: bias for each latent node $h_j$
o $b_i a_i$: bias for each AU node $a_i$
o $W^2_{it} a_i x_t$: compatibility between $AU_i$ and feature $x_t$

Top down: capturing global relations among AUs
o Each latent node is connected to all AU nodes and therefore models their higher-order relationships
o The captured AU relationships can be inferred implicitly from the model parameters $W^1_{ij}$
o The vector $\{W^1_{im}\}_{i=1}^{n}$ captures a specific presence/absence pattern of all the action units
o $W^1_{im}$ large → $AU_i$ is more likely to occur in pattern $m$
o $W^1_{im}$ small → $AU_i$ is less likely to occur in pattern $m$

Bottom up: AU recognition from image features
o Each AU node is connected to all the image features with energy $E(a_i, \mathbf{x}) = -\sum_{t} W^2_{it} a_i x_t$
o Equivalent to a set of linear AU classification models (see the sketch below)
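To make the energy function concrete, here is a minimal NumPy sketch. It is an illustration under assumed array shapes, not the authors' code; the function name `total_energy` is ours.

```python
import numpy as np

def total_energy(x, a, h, W1, W2, b, c):
    """E(x, a, h; theta) from Section 4.

    Assumed shapes: a (n,) binary AU states, h (m,) binary latent states,
    x (d,) image features, W1 (n, m), W2 (n, d), b (n,), c (m,).
    """
    term_ah = a @ W1 @ h   # sum_ij a_i W1_ij h_j : AU-latent compatibility
    term_h = c @ h         # sum_j  c_j h_j       : latent biases
    term_a = b @ a         # sum_i  b_i a_i       : AU biases
    term_ax = a @ W2 @ x   # sum_it W2_it a_i x_t : AU-feature compatibility
    return -(term_ah + term_h + term_a + term_ax)
```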

5. Learning and Inference

Discriminative Learning
o Given training data $\{\mathbf{x}_i, \mathbf{a}_i\}_{i=1}^{N}$, maximize the conditional log-likelihood:

$$\theta^* = \arg\max_{\theta} \mathcal{L}(\theta) = \arg\max_{\theta} \frac{1}{N} \sum_{i=1}^{N} \log P(\mathbf{a}_i \mid \mathbf{x}_i; \theta)$$

o Calculating the gradient requires expectations under $P(\mathbf{h} \mid \mathbf{a}, \mathbf{x}; \theta)$ and $P(\mathbf{h}, \mathbf{a} \mid \mathbf{x}; \theta)$:

$$\frac{\partial \mathcal{L}(\theta)}{\partial \theta} = \left\langle \frac{\partial E}{\partial \theta} \right\rangle_{P(\mathbf{h} \mid \mathbf{a}, \mathbf{x}; \theta)} - \left\langle \frac{\partial E}{\partial \theta} \right\rangle_{P(\mathbf{h}, \mathbf{a} \mid \mathbf{x}; \theta)}$$

o $P(\mathbf{h} \mid \mathbf{a}, \mathbf{x}; \theta)$ can be computed analytically; the conditionals factorize as

$$P(h_j = 1 \mid \mathbf{a}, \mathbf{x}; \theta) = P(h_j = 1 \mid \mathbf{a}; \theta) = \sigma\Big(c_j + \sum_{i} W^1_{ij} a_i\Big) \qquad (1)$$

$$P(a_i = 1 \mid \mathbf{h}, \mathbf{x}; \theta) = \sigma\Big(b_i + \sum_{j} W^1_{ij} h_j + \sum_{t} W^2_{it} x_t\Big) \qquad (2)$$

o $P(\mathbf{h}, \mathbf{a} \mid \mathbf{x}; \theta)$ is intractable; we revise the contrastive divergence (CD) algorithm to compute it
o The basic idea is to approximate $P(\mathbf{h}, \mathbf{a} \mid \mathbf{x}; \theta)$ by sampling $\mathbf{h}$ with Equation (1) and then sampling $\mathbf{a}$ with Equation (2), as in the sketch below
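A minimal sketch of one such sampling step, assuming the array shapes from the energy sketch above and the form of Equations (1)-(2) implied by that energy; the function names are ours, not the paper's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_h_then_a(x, a, W1, W2, b, c, rng):
    """One CD-style step: sample h from Eq. (1), then a from Eq. (2)."""
    # Eq. (1): P(h_j = 1 | a) = sigma(c_j + sum_i W1_ij a_i)
    p_h = sigmoid(c + a @ W1)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Eq. (2): P(a_i = 1 | h, x) = sigma(b_i + sum_j W1_ij h_j + sum_t W2_it x_t)
    p_a = sigmoid(b + W1 @ h + W2 @ x)
    a_new = (rng.random(p_a.shape) < p_a).astype(float)
    return h, a_new
```

Starting this step from a training pair (x, a) yields the samples used to estimate the intractable expectation in the negative term of the gradient.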

Inference
o Given a query sample $\mathbf{x}$, each action unit is inferred by $a_i^* = \arg\max_{a_i} P(a_i \mid \mathbf{x})$
o Can be performed efficiently with Gibbs sampling by iteratively sampling $\mathbf{h}$ from $P(\mathbf{h} \mid \mathbf{x}, \mathbf{a})$ and sampling $\mathbf{a}$ from $P(\mathbf{a} \mid \mathbf{h}, \mathbf{x})$, as sketched below
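A sketch of this Gibbs inference, reusing `sample_h_then_a` from the previous sketch; the chain length, burn-in, and 0.5 decision threshold are our assumptions, not values from the poster:

```python
import numpy as np

def infer_aus(x, W1, W2, b, c, n_steps=500, burn_in=100, seed=0):
    """Estimate P(a_i = 1 | x) by Gibbs sampling and threshold it."""
    rng = np.random.default_rng(seed)
    a = (rng.random(W1.shape[0]) < 0.5).astype(float)  # random initial AU states
    samples = []
    for t in range(n_steps):
        _, a = sample_h_then_a(x, a, W1, W2, b, c, rng)
        if t >= burn_in:
            samples.append(a)
    p = np.mean(samples, axis=0)    # Monte Carlo estimate of P(a_i = 1 | x)
    return (p > 0.5).astype(int)    # a_i* = argmax_{a_i} P(a_i | x)
```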

6. Incorporating Expression to Model AU Relations

Relations among the AUs depend on facial expressions
o Expression is known during training but unknown during testing
o Basic idea: modulate the connection between each pair of action unit and latent unit $(a_i, h_j)$ with an expression variable $l$, replacing each pairwise term $-W_{ij} a_i h_j$ with the 3-way term $-W_{ijk} a_i h_j l_k$
o Model (a sketch of this energy follows this section):

$$E(\mathbf{x}, \mathbf{a}, \mathbf{l}, \mathbf{h}; \theta) = -\sum_{i}\sum_{j}\sum_{k} W^1_{ijk} a_i h_j l_k - \sum_{i}\sum_{t} W^2_{it} a_i x_t - \sum_{k}\sum_{t} W^3_{kt} l_k x_t - \sum_{j} c_j h_j - \sum_{i} b_i a_i - \sum_{k} d_k l_k$$

o Learning maximizes the joint conditional likelihood of AUs and expression labels:

$$\theta^* = \arg\max_{\theta} \sum_{i=1}^{N} \log P(\mathbf{a}_i, \mathbf{l}_i \mid \mathbf{x}_i; \theta)$$

o At test time the unknown expression is marginalized out:

$$a_i^* = \arg\max_{a_i} P(a_i \mid \mathbf{x}) = \arg\max_{a_i} \sum_{\mathbf{l}} P(a_i, \mathbf{l} \mid \mathbf{x})$$
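For concreteness, a sketch of the 3-way energy under assumed shapes; treating `l` as a one-hot expression vector of length K and `W1` as an (n, m, K) tensor are our assumptions:

```python
import numpy as np

def three_way_energy(x, a, l, h, W1, W2, W3, b, c, d):
    """E(x, a, l, h; theta) from Section 6.

    Assumed shapes: a (n,), h (m,), l (K,) one-hot expression,
    x (T,) features, W1 (n, m, K), W2 (n, T), W3 (K, T),
    b (n,), c (m,), d (K,).
    """
    term_ahl = np.einsum('i,ijk,j,k->', a, W1, h, l)  # sum_ijk W1_ijk a_i h_j l_k
    term_ax = a @ W2 @ x                              # sum_it W2_it a_i x_t
    term_lx = l @ W3 @ x                              # sum_kt W3_kt l_k x_t
    return -(term_ahl + term_ax + term_lx + c @ h + b @ a + d @ l)
```

Fixing `l` to the k-th expression slices the tensor to `W1[:, :, k]`, so each expression induces its own set of pairwise AU-latent weights.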

7. Experimental Results

Methods
o SVM, BN [Tong et al. 2007], HRBM, HRBM+

Posed Facial Action Unit Recognition
o CK+: 593 peak images, 17 action units

Non-posed Facial Action Unit Recognition
o SEMAINE: 180 image frames, 10 action units
o Train on CK+, test on SEMAINE

Semantic Relationship Analysis

8. Conclusions

o Proposed a hierarchical model for AU recognition
o Captures higher-order AU interactions
o Considers the influence of facial expression on AU relations
o Experimental results demonstrate the effectiveness of the proposed approach
