
Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations

Honglak Lee, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng

ICML 2009

Presented by: Mingyuan Zhou

Duke University, ECE

September 18, 2009

Outline

• Motivations

• Contributions

• Backgrounds

• Algorithms

• Experiment results

• Deep vs. Shallow

• Conclusions

Motivations

• To learn hierarchical models that simultaneously represent multiple levels of abstraction, e.g., pixel intensities, edges, object parts, and objects, with layers ranging from low-level to high-level.

• To combine top-down and bottom-up processing of an image.

• Limitations of deep belief networks (DBNs)

• Scaling DBNs to realistic-size images remains challenging: images are high-dimensional and objects can appear at arbitrary locations within them.

Contributions

• Convolutional RBM: feature detectors are shared among all locations in an image.

• Probabilistic max-pooling: allows higher-layer units to cover larger areas of the input in a probabilistically sound way.

• The first translation-invariant hierarchical generative model that supports both top-down and bottom-up probabilistic inference and scales to realistic image sizes.

Backgrounds: Restricted Boltzmann Machine (RBM)

• Energy function (binary v): E(v, h) = -\sum_{i,j} v_i W_{ij} h_j - \sum_j b_j h_j - \sum_i c_i v_i

• Energy function (real-valued v): E(v, h) = \frac{1}{2} \sum_i v_i^2 - \sum_{i,j} v_i W_{ij} h_j - \sum_j b_j h_j - \sum_i c_i v_i

• Given the visible layer, the hidden units are conditionally independent, and vice versa.

• Efficient block Gibbs sampling can be performed by alternately sampling each layer’s units.

• Computing the exact gradient of the log-likelihood is intractable, so the contrastive divergence approximation is commonly used.
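To make the sampling and learning steps concrete, here is a minimal NumPy sketch of one CD-1 update for a binary RBM, using the notation above (W weights, b hidden biases, c visible biases); it is an illustrative implementation, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.01):
    """One contrastive-divergence (CD-1) update for a binary RBM.
    v0: (batch, n_visible) binary data; W: (n_visible, n_hidden);
    b: hidden biases (n_hidden,); c: visible biases (n_visible,)."""
    # Positive phase: hidden probabilities and a sample, given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of block Gibbs sampling back to v, then h.
    pv1 = sigmoid(h0 @ W.T + c)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate log-likelihood gradient: data stats minus model stats.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b += lr * (ph0 - ph1).mean(axis=0)
    c += lr * (v0 - v1).mean(axis=0)
    return W, b, c
```

The negative phase is exactly the block Gibbs sampling described above: all hidden units are sampled in parallel given v, then all visible units in parallel given h.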

Backgrounds: Deep belief network (DBN)

• In a DBN, two adjacent layers have a full set of connections between them, but no two units in the same layer are connected.

• A DBN can be formed by stacking RBMs.

• An efficient algorithm for training DBNs (Hinton et al., 2006): greedily training each layer, from lowest to highest, as an RBM using the previous layer's activations as inputs.
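In outline, the greedy procedure can be sketched as below; train_rbm is an assumed single-layer trainer (e.g., the CD-1 step above run over many batches) and hidden_probabilities is an assumed accessor, both hypothetical names:

```python
def train_dbn(data, layer_sizes, train_rbm):
    """Greedy layer-wise DBN training (Hinton et al., 2006), in outline.
    `train_rbm(x, n_hidden)` is an assumed helper that fits one RBM and
    exposes hidden_probabilities(); each trained layer is frozen, and its
    activations become the training input for the next layer."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        rbm = train_rbm(x, n_hidden)        # train this layer as an RBM
        x = rbm.hidden_probabilities(x)     # freeze it; pass activations up
        layers.append(rbm)
    return layers
```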

Algorithms: Convolutional RBM (CRBM)

Algorithms: Probabilistic max-pooling

• Each unit in a pooling layer computes the maximum activation of the units in a small region of the detection layer.

• Shrinking the representation with max-pooling allows higher-layer representations to be invariant to small translations of the input and reduces the computational burden.

• Max-pooling has previously been used only in feed-forward architectures; a generative model of images that supports both top-down and bottom-up inference is of interest here.
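The distribution behind probabilistic max-pooling can be sketched as follows: within each block, at most one detection unit may be on, and the pooling unit is on exactly when one of them is, so the probabilities follow a softmax with an extra "all off" state. This follows the paper's construction; the function name and the numerically stabilized form are mine:

```python
import numpy as np

def pool_block(inputs):
    """Probabilistic max-pooling over one block of detection units.
    inputs: 1-D array of bottom-up signals I(h_i) for the block.
    Returns (P(h_i = 1) for each detection unit, P(pooling unit = 1)),
    where P(pooling unit = 1) = 1 - P(all detection units off)."""
    m = max(inputs.max(), 0.0)     # include the implicit 0 logit of 'all off'
    e = np.exp(inputs - m)         # shifted for numerical stability
    z = np.exp(-m) + e.sum()       # exp(-m) is the shifted 'all off' term
    return e / z, e.sum() / z
```

For example, pool_block(np.array([0.5, 1.2, -0.3, 0.1])) returns four detection probabilities that sum exactly to the returned pooling probability, which is strictly less than 1.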

Algorithms: Sparsity regularization

• Only a tiny fraction of the units should be active in response to a given stimulus.

• Regularizing the objective function to encourage each hidden unit to have a mean activation close to some small constant (see the sketch below).
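One common way to realize this penalty is a simple bias-nudge after each gradient step, sketched below; the paper's exact update rule may differ, and target and lam are assumed hyperparameters:

```python
def sparsity_update(hidden_probs, b, target=0.02, lam=0.1):
    """Sparsity regularization as a bias-nudge heuristic (assumed form,
    not necessarily the paper's exact rule). hidden_probs: (batch,
    n_hidden) array of P(h_j = 1 | v); b: hidden biases."""
    mean_activation = hidden_probs.mean(axis=0)
    b += lam * (target - mean_activation)  # push mean activation toward target
    return b
```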

Algorithms: Convolutional DBN (CDBN)

• A CDBN consists of several max-pooling CRBMs stacked on top of one another.

• Once a given layer is trained, its weights are frozen, and its activations are used as input to the next layer.
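Putting the pieces together, here is a feed-forward sketch of one CDBN layer: each shared filter (the CRBM) produces a detection map via 'valid' cross-correlation, and probabilistic max-pooling shrinks it by a factor of C per side. It assumes SciPy, equally sized filters, and the pool_block sketch from the max-pooling section; all names are illustrative:

```python
import numpy as np
from scipy.signal import correlate2d

def cdbn_layer_forward(image, filters, biases, C=2):
    """Feed-forward view of one CDBN layer. image: 2-D array; filters:
    list of equally sized 2-D filters shared across all image locations;
    biases: one scalar per filter. Returns the stacked pooling-layer
    maps that serve as input to the next layer."""
    pooled_maps = []
    for w, bk in zip(filters, biases):
        detect = correlate2d(image, w, mode='valid') + bk   # I(h^k)
        H, W = (s - s % C for s in detect.shape)            # trim to multiple of C
        pooled = np.empty((H // C, W // C))
        for i in range(0, H, C):
            for j in range(0, W, C):
                _, pooled[i // C, j // C] = pool_block(detect[i:i+C, j:j+C].ravel())
        pooled_maps.append(pooled)
    return np.stack(pooled_maps)
```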

Hierarchical probabilistic inference

Experimental Results: natural images

Experimental Results: image classification

Experimental Results: unsupervised learning of object parts

Experimental Results: Hierarchical probabilistic inference

Deep vs. Shallow

From Jason Weston’s slides: Deep Learning via Semi-Supervised Embedding, ICML 2009 Workshop on Learning Feature Hierarchies.


From Francis Bach’s slides: Convex Sparse Methods for Feature Hierarchies, ICML 2009 Workshop on Learning Feature Hierarchies.

Conclusions

Convolutional deep belief network:

• A scalable generative model for learning hierarchical representations from unlabeled images.

• Performs well on a variety of visual recognition tasks.
