
Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines

Kihyuk Sohn, Guanyu Zhou, Chansoo Lee, Honglak Lee

Dept. of Electrical Engineering and Computer Science, University of Michigan

Outline

• Overview

• Preliminary

• Point-wise Gated Boltzmann Machines

• Experimental results

• Conclusion

Learning from scratch

• Unsupervised feature learning (Hinton et al., 2006; Bengio et al., 2007; Ranzato et al., 2007; Bengio, 2009)

– Powerful in discovering representations from unlabeled data.

– However, not all patterns (or data) are equally important.

• When the data contain many distracting factors, learning meaningful representations becomes challenging.

• Feature selection (Jain & Zongker, 1997; Yang & Pedersen, 1997; Weston et al., 2001; Guyon & Elisseeff, 2003)

– Powerful in selecting features from labeled data.

– However, it assumes that discriminative features already exist.

• Such features may not be available at hand.

Motivating Example

• Learning features from images for object recognition.

• We want to learn “person”-specific high-level features for good recognition performance.

• There are many irrelevant patterns in the background other than the person.

• Class labels may be helpful, though they don’t specify where to focus in the image.

Q. How can we learn task-relevant high-level features using (weak) supervision?

We develop a joint model for feature learning and feature selection.

Related Work

• Feature learning using class labels

– Convolutional deep neural networks (LeCun et al., Neural Computation 1989; Krizhevsky et al., NIPS 2012; Ciresan et al., Neural Computation 2011; etc.)

– Deep (belief) networks (Hinton and Salakhutdinov, Science 2006; Bengio et al., NIPS 2006; Hinton et al., Neural Computation 2006; etc.)

– Discriminative RBMs (Larochelle & Bengio, ICML 2008)

Related Work

• Foreground and background modeling with Boltzmann machines

– Robust Boltzmann machines (Tang et al., CVPR 2012)

– Masked RBMs (Le Roux et al., Neural Computation 2011; Heess et al., ICANN 2011)

– In contrast, our model makes use of class labels and performs generative feature selection jointly with feature learning.

Restricted Boltzmann Machines

• Representation

– Undirected bipartite graphical model.

– v : binary visible (observed) units.

– h : binary hidden units.

[Figure: bipartite graph connecting the visible units to the hidden units through weights w1, w2, w3.]
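For reference, a binary RBM defines P(v, h) proportional to exp(-E(v, h)) with the standard energy E(v, h) = - b^T v - c^T h - v^T W h, where b and c are the visible and hidden biases and W is the weight matrix; this is the textbook formulation rather than a detail specific to these slides.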

Inference and Learning in RBM

• Inference

– Efficient and exact due to conditional independence.

– The joint probability can be estimated using Gibbs sampling.

– The posterior over the hidden units can be used as a feature.

• Training: maximum likelihood.

– Stochastic gradient descent using a sampling-based approximation (e.g., contrastive divergence).
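To make the training step concrete, below is a minimal NumPy sketch of a CD-1 update for a binary RBM; the parameter names (W, b, c), the learning rate, and the use of a single Gibbs step are illustrative choices rather than the exact settings used in the paper.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(v0, W, b, c, lr=0.01, rng=np.random):
        """One CD-1 update for a binary RBM; v0 is a (batch, D) matrix of data."""
        # Positive phase: posterior over hidden units given the data.
        ph0 = sigmoid(v0 @ W + c)
        h0 = (rng.rand(*ph0.shape) < ph0).astype(float)
        # Negative phase: one step of Gibbs sampling (reconstruction).
        pv1 = sigmoid(h0 @ W.T + b)
        v1 = (rng.rand(*pv1.shape) < pv1).astype(float)
        ph1 = sigmoid(v1 @ W + c)
        # Approximate maximum-likelihood gradient.
        n = v0.shape[0]
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        b += lr * (v0 - v1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
        return W, b, c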

Feature Encoding in RBM

[Figure: RBM with a visible layer and a hidden layer, applied to samples from variations of MNIST with natural images in the background (Larochelle et al., ICML 2007).]

Issues with standard RBMs:

1. RBMs assume that all input features are useful (e.g., task-relevant), but this may not be true in many scenarios.

2. The set of useful input features may vary across examples.

Point-wise Gated Boltzmann Machines (PGBM)

• Point-wise (or input coordinate-wise) multiplicative interaction between the switch and visible variables.

• A per-visible-unit binary switch variable (z1, …, zD) gates the contribution of each visible variable only when it is useful.

[Figure: visible layer with binary switch variables (e.g., 1 0 0 1 1) gating the connections between the visible and hidden layers.]
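As a minimal illustration of the point-wise gating (assuming binary vectors of equal length, with names chosen just for this sketch): the task-relevant hidden group effectively sees z * v, while the complementary product (1 - z) * v is left to the other group.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 6
    v = rng.integers(0, 2, size=D)      # binary visible units
    z = rng.integers(0, 2, size=D)      # binary switch units, one per visible unit
    relevant_input = z * v              # routed to the task-relevant hidden group
    irrelevant_input = (1 - z) * v      # routed to the task-irrelevant hidden group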

Point-wise Gated Boltzmann Machines

[Figure: visible variables 𝐯 and the inferred switch variables 𝒛 (red: 1, blue: 0); gating the input with the switch variables yields a clean input.]

• The model focuses on a different set of input features dynamically, for each example.

Point-wise Gated Boltzmann Machines (PGBM)

• Plate notation: one group of hidden units focuses on modeling the useful (task-relevant) input features.

• How about the irrelevant patterns? PGBM models them with another group of hidden units, which focuses on modeling the rest of the input features.

• Modeling the irrelevant patterns helps distinguish relevant patterns from irrelevant ones.

• Modeling with more than two components is also possible.

Point-wise Gated Boltzmann Machine

• Representation

– v : binary visible units.

– h : binary hidden units.

– z : binary switch units.

Learning Discriminative Features in PGBM

• PGBM is an unsupervised learning algorithm: on its own, it can only separate semantically distinct features into different groups of hidden units.

• How can we make PGBM learn discriminative features using class labels?

→ Supervised PGBM

Supervised PGBM

• Labels are connected to one group of hidden units (the task-relevant group).

[Figure: label layer connected to the task-relevant hidden units; the task-irrelevant hidden units connect only to the visible and switch layer.]

[Related work: Rifai et al., ECCV 2012]

Supervised PGBM

• Representation

– v : binary visible units.

– h : binary hidden units.

– z : binary switch units.

– y : 1-of-L label units.

Inference and Learning in PGBM

• Inference

– Mean-field or alternating Gibbs sampling for approximate inference.

– Each type of variable is conditionally independent given the other types of variables.

Inference and Learning in PGBM

• Conditional probabilities

– P(h1 | −): h1 focuses on the task-relevant part of the input features, which is gated by the switch variables.

– P(h2 | −): h2 focuses on the task-irrelevant part of the input features, which is gated by the complement of the switch variables.

– P(z | −): each switch variable is determined through a competition between h1 and h2, based on how well the visible variable matches the contribution (reconstruction) from each group of hidden units.

– P(v | −): each visible variable is determined by both groups of hidden units.

– P(y | −): the label variable is inferred only from h1, not h2.
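Putting the conditionals together, here is a minimal NumPy sketch of one round of mean-field-style updates for a single example; the logistic competition used for the switch update and all parameter names (W1, W2, c1, c2, U, d) are assumptions made for illustration and may differ from the paper's exact equations.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def mean_field_step(v, z, W1, W2, c1, c2, U, d):
        """One illustrative inference round: v is (D,), z is (D,),
        W1 is (D, K1) for the task-relevant group, W2 is (D, K2) for the
        task-irrelevant group, U is (K1, L) label weights, d is (L,) label biases."""
        # h1 sees only the switched-on inputs; h2 sees the complement.
        h1 = sigmoid((z * v) @ W1 + c1)
        h2 = sigmoid(((1.0 - z) * v) @ W2 + c2)
        # Each switch unit compares the two groups' reconstructions of its input.
        recon1 = W1 @ h1                 # task-relevant contribution, length D
        recon2 = W2 @ h2                 # task-irrelevant contribution, length D
        z = sigmoid(v * (recon1 - recon2))
        # The label is predicted from the task-relevant group only.
        y = softmax(h1 @ U + d)
        return h1, h2, z, y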

Inference and Learning in PGBM

• Training

– Maximum likelihood for the joint distribution over the visible and label variables.

– Stochastic gradient descent using contrastive divergence.

Extensions – deeper architecture

• Propagate only the “task-relevant” information to higher layers.

• Stack multiple layers of neural networks on top of the task-relevant group of hidden units.

[Figure: 2nd and 3rd hidden layers stacked on the task-relevant hidden units, with the label layer on top.]
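A minimal sketch of the stacking idea, assuming arbitrary layer sizes and randomly initialized placeholder weights: only the task-relevant activations h1 are fed to the next layer.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)
    D, K1, K2 = 784, 500, 200                      # arbitrary layer sizes
    v = rng.integers(0, 2, size=D).astype(float)   # input example
    z = rng.integers(0, 2, size=D).astype(float)   # inferred switch units (placeholder)
    W1 = 0.01 * rng.standard_normal((D, K1))       # PGBM task-relevant weights
    c1 = np.zeros(K1)
    W_up = 0.01 * rng.standard_normal((K1, K2))    # weights of the stacked layer
    b_up = np.zeros(K2)

    h1 = sigmoid((z * v) @ W1 + c1)    # task-relevant activations
    h_2nd = sigmoid(h1 @ W_up + b_up)  # only h1 is propagated upward; h2 is discarded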

Extensions – convolutional PGBM

• Convolutional Point-wise Gated Deep Network (CPGDN)

– A convolutional architecture is good at dealing with spatially (or temporally) correlated data.

– Builds on the convolutional deep belief network (CDBN; Lee et al., ICML 2009; Desjardins and Bengio, 2008).

Extensions – convolutional PGBM

[Figure: CDBN (Lee et al., ICML 2009) applied to an input image, with first-layer filters W1 and second-layer filters W2.]

– Low-level features are generic patterns (e.g., edges).

– High-level features are semantically meaningful.

– However, there are some irrelevant patterns as well.

Extensions – convolutional PGBM

– With the point-wise gating idea, we can distinguish between task-relevant and task-irrelevant features during feature learning.

[Figure: CPGDN applied to an input image; the filters W1 and W2 are split into task-relevant and task-irrelevant groups.]

Experiments

• Task 1: handwritten digit recognition in the presence of background noise.

• Task 2: learning from large images with cluttered backgrounds, applied to weakly supervised object localization and object recognition.

Experiments – variations of MNIST with background noise

• Recognizing handwritten digits in the presence of background noise.

– Uniform random noise or natural images in the background.

– Rotation transformations are also applied (the rot-* variations).

– Due to the significant amount of distracting factors, learning good features becomes much more challenging, which results in poor recognition performance.

[Figure: example images from the original, back-rand, back-image, rot-back-rand, and rot-back-image datasets.]

Experiments – visualizations

• Learning from noisy handwritten digits (mnist-back-image) with PGBM.

[Figure: noisy digit images, the inferred switch variables, the learned task-relevant hidden unit weights (mostly pen strokes), and the learned task-irrelevant hidden unit weights (noisy background patterns).]

Experiments – digit recognition

• Handwritten digit recognition

– Evaluated on variations of MNIST.

– Compared with several variants of RBMs:

• Standard RBM.

• Implicit mixture of RBMs (imRBM; Nair and Hinton, NIPS 2008) – multiple groups of hidden units.

• Discriminative RBM (discRBM; Larochelle and Bengio, ICML 2008) – supervised or semi-supervised training.

• Standard RBM + feature selection (RBM-FS).

Experiments – digit recognition

• Handwritten digit recognition error rates

[Figure: bar chart of error rates (0–60%) on back-rand, back-image, rot-back-rand, and rot-back-image for RBM, imRBM, discRBM, RBM-FS, PGBM, and supervised PGBM.]

– Feature selection alone (RBM-FS) does not improve the performance.

– Joint feature learning and feature selection (PGBM, supervised PGBM) improves performance: a 2–8% decrease in absolute error rates.

Experiments – digit recognition

• Comparison to other deep learning methods

[Figure: bar chart of error rates (0–60%) on back-rand, back-image, and rot-back-image for RBM, DBN-3 (Vincent et al., 2008), CAE-2 (Rifai et al., 2011), PGBM, supervised PGBM, and PGBM + DN-1.]

– A 3–13% decrease in absolute error rates, achieving state-of-the-art results.

Experiments – learning from large images with cluttered backgrounds

• Given cluttered, high-resolution images, how can we find relevant foreground features?

– Weakly supervised setting (no bounding box is given).

• We use convolutional point-wise gating for generative feature selection while learning features from large images.

Experiments – CPGDN

• CPGDN on the Caltech 101 dataset.

[Figure: CPGDN applied to an input image, with the filters W1 and W2 split into task-relevant and task-irrelevant groups.]

Experiments – weakly supervised object segmentation

• Learned sets of filters (task-relevant / task-irrelevant).

• (Weakly supervised) object localization.

[Figure: Caltech 101 “Faces” and “car side” examples; 1st row: switch unit activation maps, 2nd row: predicted and ground-truth bounding boxes.]
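As an illustration of how a bounding box could be read off a switch-unit activation map, here is a minimal NumPy sketch that thresholds the map and takes the extent of the active region; the 0.5 threshold and the min/max-extent rule are assumptions for this sketch, not necessarily the procedure used in the paper.

    import numpy as np

    def bbox_from_switch_map(switch_map, threshold=0.5):
        """Return (row_min, row_max, col_min, col_max) of the thresholded map, or None."""
        mask = switch_map > threshold
        if not mask.any():
            return None                              # nothing predicted as foreground
        rows = np.where(mask.any(axis=1))[0]
        cols = np.where(mask.any(axis=0))[0]
        return rows[0], rows[-1], cols[0], cols[-1]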

Experiments – weakly supervised object segmentation

• Learned sets of bases from the 101 classes.

[Figure: Caltech 101 task-relevant bases and task-irrelevant bases.]

Experiments – weakly supervised object segmentation

[Figure: additional examples; 1st row: switch unit activation maps, 2nd row: predicted and ground-truth bounding boxes.]

Experiments – object recognition

• Object recognition with predicted bounding boxes:

1. Bounding box prediction using the CPGDN.

2. Dense SIFT feature extraction from the cropped image.

3. Feature encoding with a Gaussian RBM or CRBM (Sohn et al., ICCV 2011).

4. Spatial pyramid pooling, followed by a linear SVM.
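To make step 4 concrete, below is a minimal sketch of spatial pyramid max pooling over encoded local features followed by a linear SVM (scikit-learn's LinearSVC); the pyramid levels (1x1, 2x2, 4x4) and the random arrays standing in for encoded SIFT descriptors are illustrative assumptions, not the paper's exact configuration.

    import numpy as np
    from sklearn.svm import LinearSVC

    def spatial_pyramid_pool(codes, positions, image_size, levels=(1, 2, 4)):
        """Max-pool codes (N, K) over pyramid grids; positions is (N, 2) with (row, col)."""
        H, W = image_size
        pooled = []
        for L in levels:
            for gy in range(L):
                for gx in range(L):
                    in_cell = ((positions[:, 0] * L // H == gy) &
                               (positions[:, 1] * L // W == gx))
                    cell = codes[in_cell]
                    pooled.append(cell.max(axis=0) if len(cell) else np.zeros(codes.shape[1]))
        return np.concatenate(pooled)

    # Illustrative usage with random data in place of encoded dense SIFT features.
    rng = np.random.default_rng(0)
    X = np.stack([spatial_pyramid_pool(rng.random((200, 64)),
                                       rng.integers(0, 300, size=(200, 2)),
                                       (300, 300))
                  for _ in range(20)])
    y = rng.integers(0, 2, size=20)
    clf = LinearSVC().fit(X, y)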

Experiments – object recognition

• Classification accuracy on the Caltech 101 dataset

[Figure: bar chart of accuracy (50–80%) with 15 and 30 training images per class for RBM (Sohn et al., 2011), CPGDN + RBM, CRBM (Sohn et al., 2011), and CPGDN + CRBM.]

Experiments – object recognition

• Comparison to other results on the Caltech 101 dataset

– With a single type of features.

– MKL (Yang et al., 2009): 84.3%

– Multipath sparse coding (Bo et al., 2013): 82.5%

– …

Conclusion

• We propose the PGBM, which jointly performs feature learning and feature selection in a unified framework.

• The PGBM effectively learns useful representations from data containing significant irrelevant or distracting patterns.


Thank you

Demo code is available: http://umich.edu/~kihyuks/pubs/pgbm.zip