Yann LeCun
5 years from now, everyone will learn their features (you might as well start now)
Courant Institute of Mathematical Sciences and Center for Neural Science, New York University
I Have a Terrible Confession to Make
I'm interested in vision, but no more in vision than in audition or in other perceptual modalities.
I'm interested in perception (and in control).
I'd like to find a learning algorithm and architecture that could work (with minor changes) for many modalities
Nature seems to have found one.
Almost all natural perceptual signals have a local structure (in space and time) similar to images and videos
Heavy correlation between neighboring variables. Local patches of variables have structure, and are representable by feature vectors.
I like vision because it's challenging, it's useful, it's fun, we have data, and the image recognition community is not yet stuck in a deep local minimum like the speech recognition community.
The Unity of Recognition Architectures
Most Recognition Systems Are Built on the Same Architecture
First stage: dense SIFT, HOG, GIST, sparse coding, RBM, auto-encoders, ...
Second stage: K-means, sparse coding, LCC, ...
Pooling: average, L2, max, max with bias (elastic templates), ...
Convolutional Nets: same architecture, but everything is trained.
[Diagram: single-stage model: Filter Bank → Non-Linearity → feature Pooling → Classifier; two-stage model: Filter Bank → Non-Lin → Norm → Pool → Filter Bank → Non-Lin → Norm → Pool → Classifier, with Normalization]
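As a concrete toy, one stage of the generic architecture (filter bank → non-linearity → pooling) can be sketched in plain NumPy; the abs non-linearity and average pooling are illustrative choices here, not the exact modules of any cited system:

```python
import numpy as np

def stage(x, filters, pool=2):
    """One generic recognition stage: filter bank -> non-linearity -> pooling.

    x: 2D input array; filters: list of 2D kernels (illustrative sketch).
    """
    maps = []
    for w in filters:
        kh, kw = w.shape
        h, wd = x.shape[0] - kh + 1, x.shape[1] - kw + 1
        # filter bank: valid cross-correlation (naive loop for clarity)
        m = np.empty((h, wd))
        for i in range(h):
            for j in range(wd):
                m[i, j] = np.sum(x[i:i+kh, j:j+kw] * w)
        m = np.abs(m)                       # non-linearity (rectification)
        # pooling: non-overlapping average pooling over pool x pool blocks
        ph, pw = h // pool, wd // pool
        m = m[:ph*pool, :pw*pool].reshape(ph, pool, pw, pool).mean(axis=(1, 3))
        maps.append(m)
    return np.stack(maps)
```

Swapping the abs for a threshold, or the average pooling for max pooling, gives the variants listed on the previous slide.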
Filter Bank + Non-Linearity + Pooling + Normalization
This model of a feature extraction stage is biologically inspired ... whether you like it or not (just ask David Lowe).
Inspired by [Hubel and Wiesel 1962].
The use of this module goes back to Fukushima's Neocognitron (and even earlier models in the 60's).
[Diagram: Filter Bank → Non-Linearity → Spatial Pooling]
How well does this work?
Some results on C101 (I know, I know....)
SIFT → K-means → Pyramid pooling → SVM intersection kernel: >65% [Lazebnik et al. CVPR 2006]
SIFT → Sparse coding on blocks → Pyramid pooling → SVM: >75% [Boureau et al. CVPR 2010] [Yang et al. 2008]
SIFT → Local sparse coding on blocks → Pyramid pooling → SVM: >77% [Boureau et al. ICCV 2011]
(Small) supervised ConvNet with sparsity penalty: >71% [rejected from CVPR, ICCV, etc.] REAL TIME
[Diagram: SIFT pipelines as instances of the generic architecture. Stage 1: oriented edges (filter bank) → winner-takes-all (non-linearity) → histogram/sum (pooling) = SIFT. Stage 2: K-means or sparse coding (filter bank) → non-linearity → pyramid histogram pooling (elastic parts models, ...) → SVM or another simple classifier]
Convolutional Networks (ConvNets) fit that model
Why do two stages work better than one stage?
The second stage extracts mid-level features
Having multiple stages helps the selectivity-invariance dilemma
[Diagram: Filter Bank → Non-Lin → Norm → Pool → Filter Bank → Non-Lin → Norm → Pool → Classifier]
Learning Hierarchical Representations
I agree with David Lowe: we should learn the features
It worked for speech, handwriting, NLP.....
In a way, the vision community has been running a ridiculously inefficient evolutionary learning algorithm to learn features:
Mutation: tweak existing features in many different ways
Selection: publish the best ones at CVPR
Reproduction: combine several features from the last CVPR
Iterate. Problem: Moore's law works against you
[Diagram: Trainable Feature Transform → Trainable Feature Transform → Trainable Classifier; learned internal representation]
Sometimes, Biology gives you good hints. Example: contrast normalization
Harsh Non-Linearity + Contrast Normalization + Sparsity
THIS IS ONE STAGE OF THE CONVNET:
C: convolutions (filter bank)
Soft thresholding + abs
N: subtractive and divisive local normalization
P: pooling/downsampling layer: average or max?
[Diagram: convolutions → thresholding → rectification → subtractive + divisive contrast normalization → pooling, subsampling]
Soft Thresholding Non-Linearity
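The soft-thresholding (shrinkage) function can be written in one line; this is the standard definition sh(x) = sign(x) · max(|x| − λ, 0), shown here for concreteness (the name `soft_threshold` is illustrative):

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft thresholding (shrinkage): sh(x) = sign(x) * max(|x| - lam, 0).

    Kills small responses, shrinks large ones toward zero by lam.
    """
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```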
Local Contrast Normalization
Performed on the state of every layer, including the input
Subtractive Local Contrast Normalization: subtracts from every value in a feature map a Gaussian-weighted average of its neighbors (high-pass filter)
Divisive Local Contrast Normalization: divides every value in a layer by the standard deviation of its neighbors over space and over all feature maps
Subtractive + Divisive LCN performs a kind of approximate whitening.
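A minimal single-map NumPy sketch of the two steps above; the full version pools the divisive statistics over all feature maps, and the kernel size and sigma here are illustrative assumptions:

```python
import numpy as np

def gaussian_kernel(size=9, sigma=2.0):
    """2D Gaussian weighting kernel, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None]**2 + ax[None, :]**2) / (2 * sigma**2))
    return g / g.sum()

def local_contrast_norm(x, size=9, sigma=2.0, eps=1e-4):
    """Subtractive then divisive local contrast normalization on one 2D map."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    xp = np.pad(x, pad, mode="edge")
    # subtractive step: remove the Gaussian-weighted local mean (high-pass)
    mean = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            mean[i, j] = np.sum(xp[i:i+size, j:j+size] * k)
    v = x - mean
    # divisive step: divide by the Gaussian-weighted local standard deviation
    vp = np.pad(v, pad, mode="edge")
    std = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            std[i, j] = np.sqrt(np.sum(vp[i:i+size, j:j+size]**2 * k))
    return v / np.maximum(std, eps)
```

A flat input comes out as all zeros, which is the whitening intuition: only local structure survives.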
C101 Performance (I know, I know)
Small network: 64 features at stage-1, 256 features at stage-2:
Tanh non-linearity, no rectification, no normalization: 29%
Tanh non-linearity, rectification, normalization: 65%
Shrink non-linearity, rectification, normalization, sparsity penalty: 71%
Results on Caltech101 with sigmoid non-linearity (like the HMAX model)
Feature Learning Works Really Well on everything but C101
C101 is very unfavorable to learning-based systems
Because it's so small. We are switching to ImageNet
Some results on NORB
[Plot: curves for random filters, unsupervised filters, supervised filters, and unsup+sup filters, with and without normalization]
Sparse Auto-Encoders
Inference by gradient descent starting from the encoder output:

Z^i = argmin_Z E(Y^i, Z; W)

E(Y^i, Z) = ‖Y^i − W_d Z‖² + ‖Z − g_e(W_e, Y^i)‖² + λ Σ_j |z_j|

[Diagram: INPUT Y → encoder g_e(W_e, Y^i) → code Z (FEATURES); decoder reconstructs Y as W_d Z]
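A minimal NumPy sketch of this inference, assuming a plain linear encoder for g_e (the actual PSD encoder is non-linear) and handling the L1 term with an ISTA-style shrinkage step; names and step sizes are illustrative:

```python
import numpy as np

def psd_infer(y, Wd, We, lam=0.1, lr=0.05, steps=200):
    """Minimize E(Y,Z) = ||Y - Wd Z||^2 + ||Z - ge(We,Y)||^2 + lam*sum|z_j|
    by gradient descent, starting from the encoder output.

    Sketch: ge is taken to be a plain linear encoder (an assumption)."""
    z_enc = We @ y                 # encoder prediction ge(We, Y)
    z = z_enc.copy()               # inference starts at the encoder output
    for _ in range(steps):
        # gradient of the two smooth quadratic terms
        grad = 2 * Wd.T @ (Wd @ z - y) + 2 * (z - z_enc)
        z = z - lr * grad
        # proximal (shrinkage) step for the L1 sparsity term
        z = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)
    return z
```

The point of PSD is that after training, the encoder alone approximates this iterative inference in one cheap feed-forward pass.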
Using PSD to Train a Hierarchy of Features
Phase 1: train first layer using PSD
Phase 2: use encoder + absolute value as feature extractor
Phase 3: train the second layer using PSD
Phase 4: use encoder + absolute value as 2nd feature extractor
Phase 5: train a supervised classifier on top
Phase 6 (optional): train the entire system with supervised back-propagation
[Diagram: stacked PSD modules (encoder g_e(W_e, Y^i), code Z as FEATURES, decoder W_d Z), with a classifier on top]
Learned Features on natural patches: V1-like receptive fields
Using PSD Features for Object Recognition
64 filters on 9x9 patches trained with PSD with Linear-Sigmoid-Diagonal Encoder
Convolutional Sparse Coding
[Kavukcuoglu et al. NIPS 2010]: convolutional PSD
[Zeiler, Krishnan, Taylor, Fergus, CVPR 2010]: Deconvolutional Network
[Lee, Gross, Ranganath, Ng, ICML 2009]: Convolutional Boltzmann Machine
[Norouzi, Ranjbar, Mori, CVPR 2009]: Convolutional Boltzmann Machine
[Chen, Sapiro, Dunson, Carin, Preprint 2010]: Deconvolutional Network with automatic adjustment of code dimension
Convolutional Training
Problem: with patch-level training, the learning algorithm must reconstruct the entire patch with a single feature vector. But when the filters are used convolutionally, neighboring feature vectors will be highly redundant.
Patch-level training produces lots of filters that are shifted versions of each other.
Convolutional Sparse Coding
Replace the dot products with the dictionary elements by convolutions.
Input Y is a full image
Each code component Z_k is a feature map (an image)
Each dictionary element is a convolution kernel
Regular sparse coding: Y ≈ W Z
Convolutional S.C.: Y ≈ Σ_k W_k * Z_k
("deconvolutional networks" [Zeiler, Taylor, Fergus CVPR 2010])
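The convolutional reconstruction Y ≈ Σ_k W_k * Z_k can be sketched directly; this is a toy decoder-side illustration (function name and the 'same' border handling are assumptions):

```python
import numpy as np
from scipy.signal import convolve2d

def reconstruct(kernels, codes):
    """Convolutional sparse coding reconstruction: Y ~= sum_k Wk * Zk.

    Each Zk is a (sparse) feature map the size of the image;
    each Wk is a small convolution kernel (dictionary element).
    """
    return sum(convolve2d(z, w, mode="same") for w, z in zip(kernels, codes))
```

Learning then minimizes the reconstruction error plus a sparsity penalty on the Z_k, exactly as in patch-based sparse coding but over whole images.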
Convolutional PSD: Encoder with a soft sh() Function
Convolutional formulation: extend sparse coding from PATCH to IMAGE
PATCH-based learning vs. CONVOLUTIONAL learning
Cifar-10 Dataset
Dataset of tiny images: 32x32 color images
10 object categories, with 50000 training and 10000 testing samples
Example Images
Comparative Results on Cifar-10 Dataset
* Krizhevsky. Learning multiple layers of features from tiny images. Master's thesis, Dept. of CS, U. of Toronto.
** Ranzato and Hinton. Modeling pixel means and covariances using a factorized third-order Boltzmann machine. CVPR 2010.
Road Sign Recognition Competition
GTSRB Road Sign Recognition Competition (phase 1): 32x32 images
13 of the top 14 entries are ConvNets: 6 from NYU, 7 from IDSIA
No. 6 is humans!
Pedestrian Detection (INRIA Dataset)
[Sermanet et al., rejected from ICCV 2011]
Pedestrian Detection: Examples
[Kavukcuoglu et al. NIPS 2010]
Learning Invariant Features
Why just pool over space? Why not over orientation?
Using an idea from Hyvärinen: topographic square pooling (subspace ICA)
1. Apply filters on a patch (with suitable non-linearity)
2. Arrange filter outputs on a 2D plane
3. Square filter outputs
4. Minimize sqrt of sum of blocks of squared filter outputs
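The pooled penalty from the steps above can be sketched in a few lines, assuming non-overlapping 2x2 blocks on the 2D grid (topographic versions typically use overlapping neighborhoods, so this is a simplification):

```python
import numpy as np

def topographic_penalty(z, grid=(4, 4), block=2):
    """Topographic square pooling penalty (subspace-ICA style).

    Arrange the filter outputs z on a 2D grid, square them, sum each
    block of neighbors, and return the sum of the square roots.
    """
    g = z.reshape(grid)**2                      # squared outputs on a 2D plane
    gh, gw = grid[0] // block, grid[1] // block
    pooled = g.reshape(gh, block, gw, block).sum(axis=(1, 3))
    return np.sqrt(pooled).sum()                # minimize this over the code
```

Minimizing this (instead of a plain L1 on z) makes filters inside one block share a pool, which is what drives similar filters to cluster together.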
Why just pool over space? Why not over orientation?
The filters arrange themselves spontaneously so that similar filters enter the same pool.
The pooling units can be seen as complex cells
They are invariant to local transformations of the input. For some it's translations, for others rotations, or other transformations.
Pinwheels?
Does that look pinwheely to you?
Sparsity through Lateral Inhibition
Invariant Features: Lateral Inhibition
Replace the L1 sparsity term by a lateral inhibition matrix
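One way to read this: the L1 term Σ|z_j| is replaced by an interaction term |z|ᵀ S |z|, where a non-zero S[j,k] makes units j and k inhibit each other and a zero lets them be active together. A toy sketch (the exact quadratic form is an assumption for illustration):

```python
import numpy as np

def lateral_inhibition_penalty(z, S):
    """Lateral-inhibition sparsity term: |z|^T S |z|.

    S[j, k] > 0: units j and k inhibit each other (co-activation is costly).
    S[j, k] = 0: units j and k may be active together for free.
    """
    a = np.abs(z)
    return a @ S @ a
```

The structure of the zeros in S (a tree, a ring in a 2D topology, ...) then determines which groups of features are allowed to fire jointly.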
Invariant Features: Lateral Inhibition
Zeros in the S matrix have a tree structure
Invariant Features: Lateral Inhibition
Non-zero values in S form a ring in a 2D topology
Input patches are high-pass filtered
Invariant Features: Lateral Inhibition
Non-zero values in S form a ring in a 2D topology
Left: no high-pass filtering of input; Right: patch-level mean removal
Invariant Features: Short-Range Lateral Excitation + L1
Disentangling the Explanatory Factors of Images
Separating
I used to think that recognition was all about eliminating irrelevant information while keeping the useful one
Building invariant representations
Eliminating irrelevant variabilities
I now think that recognition is all about disentangling independent factors of variations:
Separating what and where
Separating content from instantiation parameters
Hinton's capsules; Karol Gregor's what-where auto-encoders
Invariant Features through Temporal Constancy
An object is the cross-product of object type and instantiation parameters [Hinton 1981]
[Figure: object type × object size (small, medium, large); Karol Gregor et al.]
Invariant Features through Temporal Constancy
[Diagram: inputs S^t, S^{t+1}, S^{t+2} pass through encoders f_{W1} to give inferred codes C1^t, C1^{t+1}, C1^{t+2}; these feed through f_{W2} into a shared code C2^t (predicted code); the decoder (W1, W2) produces the predicted input]
Invariant Features through Temporal Constancy
[Figure: learned feature maps C1 (where) and C2 (what)]
Generating from the Network
[Figure: input]
What is the right criterion to train hierarchical feature extraction architectures?
Flattening the Data Manifold?
The manifold of all natural images is low-dimensional and highly curvy
Feature extractors should flatten the manifold
Flattening the Data Manifold?
The Ultimate Recognition System
Bottom-up and top-down information
Top-down: complex inference and disambiguation
Bottom-up: learns to quickly predict the result of the top-down inference
Integrated supervised and unsupervised learning
Capture the dependencies between all observed variables
Compositionality
Each stage has latent instantiation variables
[Diagram: Trainable Feature Transform → Trainable Feature Transform → Trainable Classifier; learned internal representation]