
Learning Features and Parts for Fine-Grained Recognition

Authors: Jonathan Krause, Timnit Gebru, Jia Deng, Li-Jia Li, Li Fei-Fei

ICPR, 2014

Presented by: Paritosh

Problem addressed

The authors address the problem of fine-grained recognition

Fine-grained recognition – distinguishing classes at the subordinate level, i.e. within a single broader category

Examples: bird species, car models, cat breeds, etc. Which species does that bird belong to? Is that car a McLaren P1 or a McLaren MP4-12C?

(All the images are taken from the paper being presented)

How do we do Fine-Grained (FG) recognition?

By noticing distinguishing parts – parts which are different

In the previous example, grilles and lights are some of the parts that help distinguish the two cars visually

Why not use popular descriptors?

Popular descriptors: SIFT, HOG

They are not expressive enough

They do not reliably capture intricacies: different classes can produce almost identical HOG patterns

Go for a very expressive framework

Convolutional neural networks (ConvNets, CNNs) are the state of the art

Very rich descriptions

Highly scalable

Consistently the best performance on ImageNet

Deep Learning

Deep learning is a kind of hierarchical learning paradigm

For example, a neural network with 5 hidden layers is deep; shallow (typical) neural nets are the ones with 1 or 2 hidden layers

Layers and their learned representations:

Input layer – raw data

Hidden layer 1 – learns very low-level features from the raw data; edges are a good example

Hidden layer 2 – learns features at a higher level than hidden layer 1; for example, it may learn to identify squares

The level of the features keeps increasing in the same way through the subsequent hidden layers

Deep Learning – CNNs

CNNs are deep learners

A CNN is a combination of:

Input layer

Convolutional layers

Activation layers (ReLU layers)

Sub-sampling/pooling layers

Fully connected layers

A network contains several of these layers
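
To make the layer types concrete, here is a minimal NumPy sketch of one convolution, one ReLU and one max-pooling step applied to a toy image. The 6x6 input and the hand-picked edge kernel are assumptions for illustration only; a real CNN learns its kernels from data and stacks many such layers.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D image with a 2-D kernel ('valid' mode, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Activation layer: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h = (x.shape[0] // size) * size
    w = (x.shape[1] // size) * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Toy input (assumption): a 6x6 image with a vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (assumption); a trained CNN would learn this.
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

feature_map = max_pool(relu(conv2d_valid(image, kernel)))
print(feature_map)
```

The pooled map responds only in the half of the image that contains the edge, which is the kind of low-level feature the first hidden layer is described as learning.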

CNNs

Back to the paper: the big picture

Find discriminative parts

Describe these parts using rich descriptions, like the ones obtained from CNNs

Rich descriptions should be able to capture the subtle differences

Train a multiclass classifier

Dataset, ‘Cars’ - some images

Dataset (annotations: bounding boxes (BBs), labels)

The pipeline for FG recognition

Training:

Step 1: Detect parts using trained part detectors

Step 2: Determine the appearance of these parts using CNN features and concatenate them. The concatenated features form what is known as ELLF (Ensemble of Localized Learned Features)

Step 3: Train linear SVMs for classification

Step 1: Part discovery & detection

For the purpose of scalability, go for unsupervised learning

Remove the human intervention (in the form of annotations) from the loop

Observation made by the authors: objects with the same pose can be discovered using local low-level cues

Assumption: images within a set are well aligned

This implies that the same parts will have similar locations. For example, given a small set of car images, the probability of finding wheels, grilles, etc. in similar positions is very high

Part discovery contd…

Discovering aligned images

Step 1: Randomly pick a seed image

Step 2: Perform GrabCut, which removes the background, so we are left with just the car

Step 3: Compute HOG features for the car images

Step 4: Find the seed's nearest neighbors, based on the HOG features

Repeat the above process multiple times

This results in sets of images with more or less the same pose
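
The seed-and-nearest-neighbor loop above can be sketched as follows. This is only an illustration under stated assumptions: the GrabCut and HOG steps are skipped, random vectors stand in for HOG descriptors, Euclidean distance is assumed as the similarity measure, and the function names are hypothetical.

```python
import numpy as np

def nearest_neighbors(features, seed_index, k):
    """Return the indices of the k images closest to the seed (Euclidean distance)."""
    dists = np.linalg.norm(features - features[seed_index], axis=1)
    dists[seed_index] = np.inf  # never return the seed as its own neighbor
    return np.argsort(dists)[:k]

def discover_aligned_sets(features, num_sets, k, rng):
    """Repeatedly pick a random seed image and group it with its k nearest neighbors."""
    sets = []
    for _ in range(num_sets):
        seed = int(rng.integers(len(features)))
        neighbors = nearest_neighbors(features, seed, k)
        sets.append([seed] + neighbors.tolist())
    return sets

rng = np.random.default_rng(0)

# Placeholder descriptors (assumption): two synthetic "poses", 10 images each,
# standing in for HOG features of background-removed car images.
pose_a = rng.normal(0.0, 0.1, size=(10, 8))
pose_b = rng.normal(5.0, 0.1, size=(10, 8))
features = np.vstack([pose_a, pose_b])

aligned_sets = discover_aligned_sets(features, num_sets=3, k=4, rng=rng)
print(aligned_sets)
```

Each returned set contains images from a single synthetic pose, mirroring the claim that low-level cues are enough to group similarly posed objects.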

Part selection

Step 1: Randomly sample a large number of regions of various sizes

Step 2: Select parts which have high variance in their HOG features

Step 3: Keep the parts which have high variance in HOG features (these are the ones useful for discrimination)

Step 4: Get rid of redundant parts

In the end, they are left with 10 parts, having started with 5000
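
A rough sketch of this selection step is given below. Two things are assumptions, not details from the slides: "variance" is read as the variance of a region's descriptor across the images of one aligned set, and "redundant" is read as overlapping an already kept region by more than a threshold (intersection over union). All names and the toy numbers are hypothetical.

```python
import numpy as np

def variance_score(descriptors):
    """Total variance of one region's descriptor across images (rows = images)."""
    return float(np.var(descriptors, axis=0).sum())

def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def select_parts(boxes, scores, num_parts, max_overlap=0.5):
    """Greedily keep the highest-scoring boxes that do not overlap a kept box too much."""
    kept = []
    for idx in np.argsort(scores)[::-1]:
        if all(iou(boxes[idx], boxes[j]) <= max_overlap for j in kept):
            kept.append(int(idx))
        if len(kept) == num_parts:
            break
    return kept

# Four candidate regions (toy data); each descriptor array has one row per image.
boxes = [(0, 0, 4, 4), (0, 1, 4, 5), (10, 10, 14, 14), (20, 20, 24, 24)]
region_descriptors = [
    np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]),  # highest variance
    np.array([[0.0, 0.0], [2.0, 2.0], [3.0, 3.0]]),  # high variance, overlaps region 0
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),  # moderate variance
    np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]),  # no variance
]
scores = [variance_score(d) for d in region_descriptors]

kept = select_parts(boxes, scores, num_parts=2)
print(kept)
```

Region 1 scores well but is dropped because it mostly covers the same area as region 0, which is the redundancy-removal idea in Step 4.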

Detector learning

Find the template that minimizes the hinge loss, a loss function that penalizes disagreement between the true label and the output of the classifier

Essentially, this tells the detector what is positive and what is negative, based on HOG features

We start this learning under the assumption that the images are well aligned

Once trained, we retrain, only this time relaxing the assumption that the images are well aligned

We do this by allowing the true location of the positive patch to vary, because, remember, we assumed that the parts occupy similar positions, not identical ones

Locations that are too far from the original one are penalized

In effect, this is a Gaussian distribution centered on the original location (the one under the well-alignment assumption)

This helps in rejecting patches elsewhere in the image whose appearance is similar to the positive patches
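
The two ideas on this slide, the hinge loss and the Gaussian-style location penalty, can be illustrated with a short NumPy sketch. It is not the paper's exact objective: the quadratic penalty (the log of a Gaussian up to a constant), the value of `sigma`, the function names and all numbers are assumptions made for illustration.

```python
import numpy as np

def hinge_loss(weights, features, labels):
    """Mean hinge loss max(0, 1 - y * w.x) for labels in {-1, +1}."""
    margins = labels * (features @ weights)
    return float(np.mean(np.maximum(0.0, 1.0 - margins)))

def best_patch_location(appearance_scores, locations, expected_location, sigma):
    """Pick the candidate maximizing appearance score minus a displacement penalty.

    The penalty is the squared distance to the expected location divided by
    2 * sigma^2, i.e. the negative log of an isotropic Gaussian up to a constant.
    """
    locations = np.asarray(locations, dtype=float)
    expected = np.asarray(expected_location, dtype=float)
    sq_dist = np.sum((locations - expected) ** 2, axis=1)
    total = np.asarray(appearance_scores, dtype=float) - sq_dist / (2.0 * sigma ** 2)
    return int(np.argmax(total))

# Hinge loss on three toy patches (2-D features instead of real HOG vectors).
weights = np.array([1.0, -1.0])
features = np.array([[2.0, 0.0], [0.0, 2.0], [0.5, 0.0]])
labels = np.array([1, -1, 1])
loss = hinge_loss(weights, features, labels)

# Three candidate locations for one part; the second looks best but is far away.
appearance_scores = [1.0, 1.2, 1.1]
locations = [(10, 10), (30, 30), (11, 10)]
chosen = best_patch_location(appearance_scores, locations,
                             expected_location=(10, 10), sigma=5.0)
print(loss, chosen)
```

The distant look-alike at (30, 30) has the highest raw appearance score, yet the slightly shifted patch at (11, 10) wins once displacement is penalized.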

Ensemble of parts

Remember that part discovery yields multiple parts

Learn a detector for every part

Consequently, we have a collection of part detectors, each detecting a specific part: in cars, wheels, grilles, etc.

The authors call this an ensemble of parts.

Feature Learning

Take AlexNet, the CNN implementation of Alex Krizhevsky et al.

Tweak it (for details, refer to the paper) to account for the smaller dataset

Train the modified CNN; the inputs are whole images from the dataset

Retain only the first two convolutional layers

Unlike regular CNNs, which pool at fixed positions, we max-pool the descriptors within the regions marked by bounding boxes (BBs), which are obtained from the part detectors

This yields the Ensemble of Localized Learned Features (ELLF)

Feed the ELLFs to a linear SVM, the final classifier

Overfitting is taken care of
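
Pooling inside part bounding boxes and concatenating the results can be sketched as below. The feature map is random placeholder data rather than the output of a trained CNN, the boxes are assumed to be given in feature-map coordinates, and the function names are hypothetical. The final linear SVM is left out.

```python
import numpy as np

def pool_region(feature_map, box):
    """Max-pool a (channels, height, width) feature map inside box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return feature_map[:, y0:y1, x0:x1].max(axis=(1, 2))

def ellf_descriptor(feature_map, part_boxes):
    """Concatenate the per-part pooled features into one descriptor."""
    return np.concatenate([pool_region(feature_map, box) for box in part_boxes])

rng = np.random.default_rng(0)

# Placeholder convolutional feature map (assumption): 4 channels on an 8x8 grid.
feature_map = rng.random((4, 8, 8))

# Two hypothetical part boxes, as if returned by two part detectors.
part_boxes = [(0, 0, 3, 3), (4, 2, 8, 6)]

descriptor = ellf_descriptor(feature_map, part_boxes)
print(descriptor.shape)
```

The descriptor length is the number of parts times the number of channels, so adding parts lengthens the feature vector that the linear SVM sees.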

Experiments

Some of the intuitive details were included in the previous slides

For minute details, such as CNN parameters and their tuning, please refer to the paper

One important observation concerns the number of parts used in ELLF features (different from part discovery)

100 parts is where ELLF gets significantly ahead of the CNN

1000 parts is where ELLF almost reaches saturation

Performance is hurt when GrabCut performs poorly
