
Page 1: Fcv learn fergus

The Role of Learning in Vision

3.30pm: Rob Fergus
3.40pm: Andrew Ng
3.50pm: Kai Yu
4.00pm: Yann LeCun
4.10pm: Alan Yuille
4.20pm: Deva Ramanan
4.30pm: Erik Learned-Miller
4.40pm: Erik Sudderth
4.50pm: Spotlights - Qiang Ji, M-H Yang
4.55pm: Discussion
5.30pm: End

Feature / Deep Learning

Compositional Models

Learning Representations

Overview

Low-level Representations

Learning on the fly

Page 2: Fcv learn fergus

An Overview of Hierarchical Feature Learning and Relations to Other Models

Rob Fergus

Dept. of Computer Science, Courant Institute,

New York University

Page 3: Fcv learn fergus

Motivation

• Multitude of hand-designed features currently in use
  – SIFT, HOG, LBP, MSER, Color-SIFT, …

• Maybe some way of learning the features?

• Also, hand-designed features just capture low-level edge gradients

Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2007

Yan & Huang (Winner of PASCAL 2010 classification competition)

Page 4: Fcv learn fergus

Beyond Edges?

• Mid-level cues

“Tokens” from Vision by D. Marr:

Continuation, Parallelism, Junctions, Corners

• High-level object parts:

• Difficult to hand-engineer. What about learning them?

Page 5: Fcv learn fergus

Deep/Feature Learning Goal

• Build hierarchy of feature extractors (≥ 1 layers)
  – All the way from pixels → classifier
  – Homogeneous structure per layer
  – Unsupervised training

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

• Numerous approaches:
  – Restricted Boltzmann Machines (Hinton, Ng, Bengio, …)
  – Sparse coding (Yu, Fergus, LeCun)
  – Auto-encoders (LeCun, Bengio)
  – ICA variants (Ng, Cottrell)
  & many more…

Page 6: Fcv learn fergus

Single Layer Architecture

Filter

Normalize

Pool

Input: Image Pixels / Features

Output: Features / Classifier

Details in the boxes matter

(especially in a hierarchy)

Links to neuroscience
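The Filter → Normalize → Pool stage above can be sketched in a few lines of NumPy. This is only an illustration under assumed choices that the slide does not specify: hand-written edge kernels instead of a learned dictionary, an absolute-value non-linearity, L2 normalization across feature maps, and max pooling over non-overlapping blocks. The names `filter_valid` and `single_layer` are hypothetical.

```python
import numpy as np

def filter_valid(image, kernel):
    """Valid-mode 2-D cross-correlation of one image with one kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def single_layer(image, kernels, pool=2, eps=1e-8):
    # Filter: one response map per kernel, then a rectifying non-linearity.
    maps = np.stack([np.abs(filter_valid(image, k)) for k in kernels])
    # Normalize (assumed form): divide by the L2 norm across feature maps
    # at each location, so features compete to explain each neighbourhood.
    maps = maps / (np.sqrt(np.sum(maps ** 2, axis=0, keepdims=True)) + eps)
    # Pool: max over non-overlapping pool x pool spatial blocks.
    k, h, w = maps.shape
    h, w = h - h % pool, w - w % pool
    blocks = maps[:, :h, :w].reshape(k, h // pool, pool, w // pool, pool)
    return blocks.max(axis=(2, 4))

image = np.random.default_rng(0).random((9, 9))
kernels = [np.array([[-1.0, 0.0, 1.0]] * 3),    # responds to vertical edges
           np.array([[-1.0, 0.0, 1.0]] * 3).T]  # responds to horizontal edges
features = single_layer(image, kernels)
print(features.shape)  # (2, 3, 3)
```

Stacking such stages, with the output of one used as the input of the next, gives the hierarchy described earlier; swapping the normalization or pooling rule changes the behaviour noticeably, which is the point of "details in the boxes matter".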

Page 7: Fcv learn fergus

Example Feature Learning Architectures

Pixels / Features

Filter with Dictionary (patch/tiled/convolutional) + Non-linearity

Normalization between feature responses
  – Local Contrast Normalization (Subtractive / Divisive)
  – (Group) Sparsity
  – Max / Softmax

Spatial/Feature Pool (Sum or Max)

Features
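As one concrete instance of the normalization options listed here, local contrast normalization can be sketched as a subtractive step followed by a divisive step. The box-shaped window, the `radius` and `eps` parameters, and the function names are assumptions for illustration; published variants typically use a Gaussian-weighted window and normalize across feature maps as well.

```python
import numpy as np

def local_mean(x, radius):
    """Mean over a (2*radius+1)^2 box window, with edge padding."""
    size = 2 * radius + 1
    padded = np.pad(x, radius, mode="edge")
    out = np.zeros_like(x, dtype=float)
    for di in range(size):
        for dj in range(size):
            out += padded[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out / (size * size)

def local_contrast_normalize(x, radius=1, eps=1e-4):
    x = np.asarray(x, dtype=float)
    # Subtractive step: remove the local mean.
    centered = x - local_mean(x, radius)
    # Divisive step: divide by the local standard deviation, floored at eps
    # so that flat regions are not amplified.
    local_std = np.sqrt(local_mean(centered ** 2, radius))
    return centered / np.maximum(local_std, eps)
```

Away from the `eps` floor the output is unchanged when the input is rescaled or offset by a constant, which is the kind of contrast invariance this step is meant to provide.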

Page 8: Fcv learn fergus

SIFT Descriptor

Image Pixels

Apply Gabor filters

Spatial pool (Sum)

Normalize to unit length

Feature Vector
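The same three steps can be written as a short SIFT-like sketch. It is a simplification, not the actual SIFT implementation: finite-difference gradients binned by orientation stand in for the oriented filter bank, there is no Gaussian weighting or interpolation between bins, and the 4 x 4 cells with 8 orientation bins are assumed defaults. The function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_bins=8, eps=1e-12):
    # Filter: image gradients, binned by orientation (stand-in for a filter bank).
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)
    bins = np.minimum((orientation / (2 * np.pi) * n_bins).astype(int), n_bins - 1)

    # Spatial pool (sum): accumulate magnitude per spatial cell and orientation bin.
    h, w = patch.shape
    rows = np.arange(h) * n_cells // h   # cell row of each pixel row
    cols = np.arange(w) * n_cells // w   # cell column of each pixel column
    hist = np.zeros((n_cells, n_cells, n_bins))
    np.add.at(hist, (rows[:, None], cols[None, :], bins), magnitude)

    # Normalize to unit length.
    vec = hist.ravel()
    return vec / (np.linalg.norm(vec) + eps)

patch = np.random.default_rng(0).random((16, 16))
descriptor = sift_like_descriptor(patch)
print(descriptor.shape)  # (128,)
```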

Page 9: Fcv learn fergus

Spatial Pyramid Matching

SIFT Features

Filter with Visual Words

Max

Multi-scale spatial pool (Sum)

Classifier

Lazebnik, Schmid, Ponce [CVPR 2006]
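The multi-scale sum pooling step can be sketched as follows, assuming each location has already been assigned to its best-matching visual word (the "Max" step). The array `word_map`, the function name, and the three pyramid levels are assumptions for illustration; the per-level weights used in the pyramid match kernel are omitted.

```python
import numpy as np

def spatial_pyramid(word_map, n_words, levels=3):
    """Concatenate visual-word histograms over 1x1, 2x2, 4x4, ... grids."""
    h, w = word_map.shape
    pooled = []
    for level in range(levels):
        cells = 2 ** level
        rows = np.arange(h) * cells // h   # grid row of each pixel row
        cols = np.arange(w) * cells // w   # grid column of each pixel column
        hist = np.zeros((cells, cells, n_words))
        # Sum pooling: count occurrences of each word inside each grid cell.
        np.add.at(hist, (rows[:, None], cols[None, :], word_map), 1.0)
        pooled.append(hist.ravel())
    return np.concatenate(pooled)

word_map = np.random.default_rng(0).integers(0, 5, size=(8, 8))
pyramid = spatial_pyramid(word_map, n_words=5)
print(pyramid.shape)  # (105,)
```

The coarsest level is a plain bag-of-words histogram; finer levels add back coarse spatial layout, and the concatenated vector is what would be passed to the classifier.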

Page 10: Fcv learn fergus

Role of Normalization

• Lots of different mechanisms (max, sparsity, LCN etc.)

• All induce local competition between features to explain input
  – “Explaining away”
  – Just like top-down models
  – But more local mechanism

Example: Convolutional Sparse Coding

[Figure: Filters, Convolution, and an |.|1 (L1 norm) penalty on each of four feature maps]

Zeiler et al. [CVPR’10/ICCV’11], Kavukcuoglu et al. [NIPS’10], Yang et al. [CVPR’10]
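A common way to write a convolutional sparse coding objective is sketched below. The symbols are assumptions for illustration and do not come from the slide: $y$ is the input image, $d_k$ the $k$-th filter, $z_k$ its feature map, $*$ denotes convolution, and $\lambda$ trades reconstruction error against sparsity. The cited papers differ in details such as where $\lambda$ is placed.

```latex
\min_{d,\,z} \; \frac{\lambda}{2} \Big\| y - \sum_{k=1}^{K} d_k * z_k \Big\|_2^2 \; + \; \sum_{k=1}^{K} \| z_k \|_1
```

The $\ell_1$ term is what creates the competition: once one feature map accounts for part of the input, the reconstruction term no longer rewards other maps for responding there, so their activations are driven to zero.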

Page 11: Fcv learn fergus

Role of Pooling

• Spatial pooling
  – Invariance to small transformations

Chen, Zhu, Lin, Yuille, Zhang [NIPS 2007]

• Pooling across feature groups
  – Gives AND/OR type behavior
  – Compositional models of Zhu, Yuille

– Larger receptive fields

Zeiler, Taylor, Fergus [ICCV 2011]

• Pooling with latent variables (& springs)
  – Pictorial structures models

Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]
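The invariance claim for spatial pooling can be checked with a small example. This is a minimal sketch with an assumed 4 x 4 non-overlapping max pool and a hypothetical `max_pool` helper; the invariance only holds while the shifted response stays inside the same pooling block.

```python
import numpy as np

def max_pool(x, size):
    """Max over non-overlapping size x size blocks (trailing rows/cols dropped)."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

response = np.zeros((8, 8))
response[2, 2] = 1.0                       # a single strong feature response
shifted = np.roll(response, 1, axis=1)     # same response moved one pixel right

print(np.array_equal(max_pool(response, 4), max_pool(shifted, 4)))  # True
```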

Page 12: Fcv learn fergus
Page 13: Fcv learn fergus

Object Detection with Discriminatively Trained Part-Based Models

HOG Pyramid

Apply object part filters

Pool part responses (latent variables & springs)

Non-max Suppression (Spatial)

Score

Felzenszwalb, Girshick, McAllester, Ramanan [PAMI 2009]