The Role of Learning in Vision
3.30pm: Rob Fergus
3.40pm: Andrew Ng
3.50pm: Kai Yu
4.00pm: Yann LeCun
4.10pm: Alan Yuille
4.20pm: Deva Ramanan
4.30pm: Erik Learned-Miller
4.40pm: Erik Sudderth
4.50pm: Spotlights – Qiang Ji, M-H Yang
4.55pm: Discussion
5.30pm: End
Feature / Deep Learning
Compositional Models
Learning Representations
Overview
Low-level Representations
Learning on the fly
An Overview of Hierarchical Feature Learning and Relations to Other Models
Rob Fergus
Dept. of Computer Science, Courant Institute,
New York University
Motivation
• Multitude of hand-designed features currently in use
– SIFT, HOG, LBP, MSER, Color-SIFT, …
• Maybe some way of learning the features?
• Also, these features just capture low-level edge gradients
Felzenszwalb, Girshick, McAllester and Ramanan [PAMI 2007]

Yan & Huang (winner of the PASCAL 2010 classification competition)
• Mid-level cues
Beyond Edges?
“Tokens” from Vision by D. Marr:
Continuation, Parallelism, Junctions, Corners
• High-level object parts:
• Difficult to hand-engineer; what about learning them?
• Build hierarchy of feature extractors (≥ 1 layers)
– All the way from pixels → classifier
– Homogeneous structure per layer
– Unsupervised training
Deep/Feature Learning Goal
Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier
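As a hedged sketch of the stack above (random weights and toy dimensions; real systems learn each layer's weights, typically unsupervised):

```python
import numpy as np

def layer(x, W):
    """One generic feature-extraction layer: linear filtering with W,
    a pointwise non-linearity, then max-pooling over adjacent unit pairs."""
    h = np.maximum(0.0, W @ x)           # filter + rectification
    return h.reshape(-1, 2).max(axis=1)  # pool adjacent responses

rng = np.random.default_rng(0)
x = rng.standard_normal(64)              # stand-in for image/video pixels
W1, W2, W3 = (rng.standard_normal((d, d)) for d in (64, 32, 16))
f1 = layer(x, W1)    # layer 1 output: 32-d
f2 = layer(f1, W2)   # layer 2 output: 16-d
f3 = layer(f2, W3)   # layer 3 output: 8-d, fed to a simple classifier
```

Each layer halves the representation size here purely for illustration; the point is the homogeneous filter/non-linearity/pool structure repeated at every level.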
• Numerous approaches:– Restricted Boltzmann Machines (Hinton, Ng, Bengio,…)– Sparse coding (Yu,
Fergus, LeCun)– Auto-encoders (LeCun,
Bengio)– ICA variants (Ng, Cottrell)
& many more….
Single Layer Architecture
Filter → Normalize → Pool
Input: Image Pixels / Features
Output: Features / Classifier
Details in the boxes matter (especially in a hierarchy)
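A minimal NumPy sketch of one Filter → Normalize → Pool layer, assuming a random patch dictionary `D` (in practice learned), non-overlapping patches, and a simple divisive normalization across filter responses:

```python
import numpy as np

def single_layer(img, D, pool=2, eps=1e-6):
    """Filter -> Normalize -> Pool over non-overlapping patches.
    D: (n_filters, patch_h * patch_w) dictionary."""
    p = int(np.sqrt(D.shape[1]))
    H, W = img.shape[0] // p, img.shape[1] // p
    # Filter: rectified dictionary responses per patch
    patches = img[:H*p, :W*p].reshape(H, p, W, p).swapaxes(1, 2).reshape(H*W, -1)
    resp = np.maximum(0.0, patches @ D.T)                 # (H*W, n_filters)
    # Normalize: divisive normalization across feature responses
    resp /= np.linalg.norm(resp, axis=1, keepdims=True) + eps
    resp = resp.reshape(H, W, -1)
    # Pool: spatial max over pool x pool blocks
    Hp, Wp = H // pool, W // pool
    return resp[:Hp*pool, :Wp*pool].reshape(Hp, pool, Wp, pool, -1).max(axis=(1, 3))
```

Swapping the boxes (e.g. sum pooling, contrast normalization, a learned sparse-coding dictionary) changes those "details that matter" without changing the overall shape of the layer.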
Links to neuroscience
Example Feature Learning Architectures
Pixels / Features
Filter with Dictionary (patch/tiled/convolutional)
Spatial/Feature Pool (Sum or Max)
Normalization between feature responses
Features
+ Non-linearity
Local Contrast Normalization (Subtractive / Divisive)
(Group) Sparsity
Max / Softmax
SIFT Descriptor
Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector
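A simplified sketch of that pipeline, with oriented-gradient binning standing in for the Gabor-like filtering (real SIFT adds Gaussian weighting, soft binning, and clipping before renormalization):

```python
import numpy as np

def sift_like(patch, n_orient=8, grid=4):
    """Oriented-gradient responses -> spatial sum pool over a
    grid x grid layout -> normalize to unit length."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    bins = (ang / (2 * np.pi) * n_orient).astype(int) % n_orient
    s = patch.shape[0] // grid
    desc = np.zeros((grid, grid, n_orient))
    for i in range(grid):
        for j in range(grid):
            m = mag[i*s:(i+1)*s, j*s:(j+1)*s]
            b = bins[i*s:(i+1)*s, j*s:(j+1)*s]
            for o in range(n_orient):
                desc[i, j, o] = m[b == o].sum()   # sum-pool per orientation
    v = desc.ravel()
    return v / (np.linalg.norm(v) + 1e-12)        # normalize to unit length
```

With `grid=4` and `n_orient=8` this yields the familiar 128-d descriptor layout.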
SIFT Features → Filter with Visual Words → Multi-scale spatial pool (Sum) → Max → Classifier
Spatial Pyramid Matching
Lazebnik, Schmid, Ponce [CVPR 2006]
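A hedged sketch of the pooling stage in the Lazebnik et al. pipeline, assuming local features have already been quantised into visual words (the names `words`, `pos`, and `vocab` are illustrative):

```python
import numpy as np

def spatial_pyramid(words, pos, vocab, levels=(1, 2, 4)):
    """Spatial pyramid of visual-word histograms: at each level the
    image is split into an L x L grid, word counts are sum-pooled per
    cell, and all cell histograms are concatenated into one vector.
    words: word index per local feature; pos: (x, y) in [0, 1)^2."""
    out = []
    for L in levels:
        cell = (pos * L).astype(int).clip(0, L - 1)   # grid cell per feature
        for i in range(L):
            for j in range(L):
                sel = (cell[:, 0] == i) & (cell[:, 1] == j)
                out.append(np.bincount(words[sel], minlength=vocab))
    return np.concatenate(out)
```

The level-0 histogram is the plain bag of words; the finer grids add the coarse spatial layout that the bag of words discards.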
Role of Normalization
• Lots of different mechanisms (max, sparsity, LCN etc.)
• All induce local competition between features to explain the input
– “Explaining away”
– Just like top-down models, but a more local mechanism
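A toy illustration of this competition, using a simple divisive normalization (the constant `sigma` is an arbitrary choice):

```python
import numpy as np

def divisive_norm(r, sigma=0.1):
    """Each response is divided by the pooled activity of all features,
    so a strong feature suppresses the others ("explaining away")."""
    return r / (sigma + np.abs(r).sum())

weak_alone = divisive_norm(np.array([0.0, 1.0, 0.5]))[1]
weak_vs_strong = divisive_norm(np.array([5.0, 1.0, 0.5]))[1]
# the same raw response of 1.0 is suppressed when a strong
# competitor (5.0) is present: weak_vs_strong < weak_alone
```

Max, sparsity, and LCN implement the same idea with different mechanics: once one feature explains the input well, the others are pushed down.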
Example: Convolutional Sparse Coding
Filters (convolution), with an ℓ1 sparsity penalty |z_k|_1 on each feature map z_k:

min_z ‖x − Σ_k f_k ∗ z_k‖² + λ Σ_k |z_k|_1
Zeiler et al. [CVPR’10/ICCV’11], Kavukcuoglu et al. [NIPS’10], Yang et al. [CVPR’10]
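A sketch of that objective in 1-D with a single filter, minimised by iterative shrinkage-thresholding (ISTA); the step size and iteration count are arbitrary choices:

```python
import numpy as np

def ista_csc(x, filters, lam=0.1, lr=0.05, steps=200):
    """Convolutional sparse coding sketch: minimise
    0.5 * ||x - sum_k f_k * z_k||^2 + lam * sum_k |z_k|_1
    over the feature maps z_k by ISTA."""
    z = np.zeros((len(filters), len(x)))
    for _ in range(steps):
        recon = sum(np.convolve(z[k], f, mode='same')
                    for k, f in enumerate(filters))
        err = recon - x
        for k, f in enumerate(filters):
            g = np.correlate(err, f, mode='same')   # gradient w.r.t. z_k
            u = z[k] - lr * g
            z[k] = np.sign(u) * np.maximum(np.abs(u) - lr * lam, 0.0)
    return z

# a signal that is exactly one filter placed in the middle
f = np.array([1.0, 2.0, 1.0])
x = np.zeros(32); x[10:13] = f
z = ista_csc(x, [f])   # z[0] should be sparse, peaked near position 11
```

The ℓ1 term is what forces the competition: overlapping placements of the filter must "fight" to explain the same pixels, and most map entries go exactly to zero.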
Role of Pooling
• Spatial pooling
– Invariance to small transformations
Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]
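The invariance can be seen in a tiny sketch with a 1-D response map and non-overlapping max pooling:

```python
import numpy as np

def max_pool(v, size=4):
    """Non-overlapping spatial max pooling of a 1-D response map."""
    return v.reshape(-1, size).max(axis=1)

r = np.zeros(16); r[5] = 1.0          # a feature firing at position 5
r_shift = np.roll(r, 1)               # the same feature shifted by one pixel
# max_pool(r) and max_pool(r_shift) are identical: the pooled
# representation is unchanged by the small translation
```

Shifts within a pooling cell leave the output untouched; only shifts across cell boundaries register.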
• Pooling across feature groups
– Gives AND/OR type behavior
– Compositional models of Zhu, Yuille
– Larger receptive fields
Zeiler, Taylor, Fergus [ICCV 2011]
• Pooling with latent variables (& springs)
– Pictorial structures models

Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
HOG Pyramid → Apply object part filters → Pool part responses (latent variables & springs) → Non-max Suppression (Spatial) → Score
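A 1-D toy of the "latent variables & springs" pooling step, sketching the DPM-style trade-off between part-filter response and quadratic deformation cost (the spring constant `a` is arbitrary):

```python
import numpy as np

def spring_pool(resp, anchor, a=0.1):
    """Latent-variable pooling with a spring: the part may drift from
    its anchor position, paying a quadratic deformation cost."""
    pos = np.arange(len(resp))
    return np.max(resp - a * (pos - anchor) ** 2)

resp = np.zeros(11); resp[6] = 2.0    # part filter fires near, not at, anchor 5
score = spring_pool(resp, anchor=5)   # 2.0 - 0.1 * 1^2 = 1.9
```

This is max pooling over a latent displacement, penalised by the spring: a strong response slightly off-anchor still wins, which is what makes the parts deformable.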
Object Detection with Discriminatively Trained Part-Based Models
Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]