Fine tuning a convolutional network for cultural event recognition

ADVISORS:

Andrea Calafell

Xavier Giró-i-Nieto, Amaia Salvador

20/07/2015

AUTHOR:

Matthias Zeppelzauer

OUTLINE

1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

MOTIVATION: Cultural Heritage

Chinese New Year

MOTIVATION: Cultural Heritage

Carnival Rio

Classic onsite explorers

Onsite social media is big data...

...and online explorers need our help

CHALEARN: Looking at People

TRAINING SET

VALIDATION SET

TEST SET

50 EVENTS

MOTIVATION: Goals

● Improve the results obtained in the ChaLearn Challenge.

● Exploit the noisy data collected from Flickr.

STATE OF THE ART: CaffeNet

Content: Visual

Context: Time stamp, Geolocation

Zaharieva’15 X X X

Mattivi’11 X X

Bossard’13 X X

Cao’08 X X X

Sutanto’13 X

Schinas’12 X X

Brenner’13 X X

Nguyen’13 X X

MediaEval Social Event Detection

STATE OF THE ART: CaffeNet

CaffeNet

ARCHITECTURE [Krizhevsky’12]

SOFTWARE [Jia’14]

DATA [Deng’09]

STATE OF THE ART: CNN ARCHITECTURE

Convolutional Neural Network architecture

Babenko et al, Neural codes for image retrieval. In Computer Vision-ECCV, 2014

STATE OF THE ART: Object+Scene CNNs

Object-Scene Convolutional Neural Network for event recognition

Wang et al, Object-scene convolutional neural networks for event recognition in images. In CVPRW, 2015

BASELINE: Fine-tuning a ConvNet

BASELINE: ChaLearn @ CVPRW 2015

Awarded the 2nd prize in the Cultural Event Recognition Challenge at the ChaLearn Workshop, CVPR 2015

Salvador, A., Giró-i-Nieto, X., Calafell, A., et al, Cultural Event Recognition with Visual ConvNets and Temporal Models. In CVPRW, 2015

ConvNets need to be trained with...

a large amount of labeled images

but clean data is expensive...

and downloading noisy data in an unsupervised fashion is easier and cheaper.

NOISY DATA: Flickr Dataset

FLICKR DATASET

4,068

50 EVENTS

DATASET BIAS

Dataset bias when fine-tuning with ChaLearn or Flickr dataset:

DENOISING THE FLICKR DATASET

Mosaic of Queens Day from ChaLearn

Mosaic of Queens Day from Flickr

Example event: Annual Buffalo Roundup

Fine-tuned model with ChaLearn

New subset from
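One way to read the denoising step above is that the model fine-tuned with ChaLearn scores every noisy image, and only the images whose score for their own event label is high enough go into the new subset. The sketch below assumes that reading. The function name `denoise_subset`, the threshold value, and the score matrix are hypothetical and are not taken from the slides.

```python
import numpy as np

def denoise_subset(scores, noisy_labels, threshold=0.5):
    """Return indices of noisy images that the clean-data model agrees with.

    scores: (n_images, n_events) class probabilities from a model
            fine-tuned on the clean set (hypothetical input).
    noisy_labels: (n_images,) integer event labels from the noisy source.
    threshold: assumed cut-off; the slides do not state one.
    """
    scores = np.asarray(scores, dtype=float)
    noisy_labels = np.asarray(noisy_labels)
    # Probability the model assigns to each image's own noisy label.
    own_score = scores[np.arange(len(noisy_labels)), noisy_labels]
    return np.flatnonzero(own_score >= threshold)

# Toy example: three images, all tagged as event 0 by the noisy source.
toy_scores = np.array([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.7, 0.3]])
toy_labels = np.array([0, 0, 0])
kept = denoise_subset(toy_scores, toy_labels, threshold=0.5)
print(kept.tolist())
```

In this toy case the second image is dropped, because the model gives its tagged event a probability of only 0.2.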

BASELINE: Dataset ordering during fine-tuning

CaffeNet

FINE-TUNING JOINT:

Joint fine-tuning of the clean and noisy datasets:

0.6136

CaffeNet

FINE-TUNING: FINE-TUNING:

Sequential fine-tuning of the clean and noisy datasets:

0.6136

CaffeNet

FINE-TUNING: FINE-TUNING:

Sequential fine-tuning of the noisy and clean datasets:

0.6136
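The three orderings compared above (joint, clean then noisy, noisy then clean) can be written down as a list of fine-tuning stages. This is a minimal sketch, assuming that each stage starts from the weights produced by the previous one. The names `fine_tuning_schedule` and `run_schedule`, and the stand-in `fake_fine_tune` function, are hypothetical; a real run would call the training framework at that point.

```python
def fine_tuning_schedule(mode, clean="ChaLearn", noisy="Flickr"):
    """Return a list of stages; each stage lists the datasets trained together."""
    if mode == "joint":
        return [[clean, noisy]]
    if mode == "clean_then_noisy":
        return [[clean], [noisy]]
    if mode == "noisy_then_clean":
        return [[noisy], [clean]]
    raise ValueError(f"unknown mode: {mode}")

def run_schedule(weights, schedule, fine_tune):
    """Apply each stage in order, feeding the result into the next stage."""
    for stage in schedule:
        weights = fine_tune(weights, stage)
    return weights

def fake_fine_tune(weights, stage):
    """Stand-in for real training: it only records which data was used."""
    return weights + ["+".join(stage)]

history = run_schedule(["CaffeNet"],
                       fine_tuning_schedule("noisy_then_clean"),
                       fake_fine_tune)
print(history)
```

The printed history lists the starting weights followed by the datasets in the order they were used, which makes it easy to check that a given experiment ran the intended ordering.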

FRACKING MINING +/- SAMPLES

FRACKING THE TRAINING DATASET

Example event: Pingxi Lantern Festival

Fine-tuned model with ChaLearn

New subset from

hard negatives

hard positive

CaffeNet

FINE-TUNING: Fine-tuning with fracking subset from:

FRACKING THE TRAINING DATASET

Results of fine-tuning using fracking in images from ChaLearn:

baseline: 0.61365
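The mining of hard positive and hard negative samples described above can be sketched as follows. The selection rule is an assumption, since the slides do not spell it out: for a given event, hard positives are taken to be images of that event with the lowest score for it, and hard negatives are images of other events with the highest score for it. The function name `frack` and the parameter `k` are hypothetical.

```python
import numpy as np

def frack(scores, labels, event, k=1):
    """Pick the k hardest positives and k hardest negatives for one event.

    scores: (n_images, n_events) class probabilities from the current model.
    labels: (n_images,) ground-truth event indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    event_score = scores[:, event]
    pos = np.flatnonzero(labels == event)
    neg = np.flatnonzero(labels != event)
    # Hard positives: true members of the event that the model scores lowest.
    hard_pos = pos[np.argsort(event_score[pos], kind="stable")[:k]]
    # Hard negatives: non-members that the model scores highest.
    hard_neg = neg[np.argsort(-event_score[neg], kind="stable")[:k]]
    return hard_pos, hard_neg

toy_scores = np.array([[0.9, 0.1],
                       [0.3, 0.7],
                       [0.6, 0.4],
                       [0.8, 0.2],
                       [0.1, 0.9]])
toy_labels = np.array([0, 0, 0, 1, 1])
hard_pos, hard_neg = frack(toy_scores, toy_labels, event=0, k=1)
print(hard_pos.tolist(), hard_neg.tolist())
```

The selected indices would then form the fracking subset used for another round of fine-tuning.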

FINE-TUNING DEEPER LAYERS ONLY

Layer 2 responds to corners and other edge/color conjunctions.

Layer 3 has more complex invariances, capturing similar textures.

Zeiler et al, Visualizing and Understanding Convolutional Networks. In Computer Vision-ECCV, 2014

Andrej Karpathy. Convolutional neural networks for visual recognition. In Stanford CS class CS231n.

FC6 FC7

Results of only fine-tuning the deeper layers:

0.61365

Results of only fine-tuning the deeper layers:

0.6136
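Fine-tuning only the deeper layers amounts to freezing the earlier ones. In Caffe this is usually done by setting `lr_mult: 0` in the `param` blocks of the layers to be frozen. The sketch below only computes such per-layer multipliers for the CaffeNet layer names; it does not write a prototxt file. The cut point (`fc6` in the example) is an assumption based on the FC6/FC7 labels on the slide, and the helper name `lr_multipliers` is hypothetical.

```python
# Learnable layers of the reference CaffeNet, from input to output.
CAFFENET_LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5",
                   "fc6", "fc7", "fc8"]

def lr_multipliers(first_trainable, layers=CAFFENET_LAYERS):
    """Map each layer to a learning-rate multiplier.

    Layers before `first_trainable` get 0.0 (frozen, pretrained weights kept);
    `first_trainable` and every later layer get 1.0 (fine-tuned).
    """
    start = layers.index(first_trainable)
    return {name: (1.0 if i >= start else 0.0)
            for i, name in enumerate(layers)}

multipliers = lr_multipliers("fc6")
for name, mult in multipliers.items():
    print(name, mult)
```

Moving the cut point earlier or later gives a simple way to compare how many layers are worth fine-tuning.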

ENSEMBLE OF EVENT DETECTORS

SINGLE CONVNET FOR THE 50 EVENTS:

ONE CONVNET FOR EACH EVENT:

Results of the ensemble of binary detectors:

0.6136
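With one ConvNet per event, each network acts as a binary detector and the per-event scores have to be combined. The slides do not say how the outputs were merged, so the sketch below assumes the simplest option: stack each detector's positive-class score into one column per event and take the highest-scoring event. The names `ensemble_scores` and `predict_event`, and the toy detectors, are hypothetical.

```python
import numpy as np

def ensemble_scores(detectors, images):
    """Build an (n_images, n_events) matrix from per-event binary detectors.

    Each detector is a callable that returns one positive-class score per image.
    """
    columns = [np.asarray(det(images), dtype=float) for det in detectors]
    return np.column_stack(columns)

def predict_event(score_matrix):
    """Assign each image to the event whose detector scores it highest."""
    return np.argmax(score_matrix, axis=1)

# Toy detectors returning fixed scores for two images.
toy_images = ["img_a", "img_b"]
toy_detectors = [lambda imgs: [0.9, 0.2],   # detector for event 0
                 lambda imgs: [0.4, 0.7]]   # detector for event 1
score_matrix = ensemble_scores(toy_detectors, toy_images)
predictions = predict_event(score_matrix)
print(predictions.tolist())
```

The same score matrix can also be used column by column when the evaluation ranks images per event rather than assigning a single label.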

CONCLUSIONS

● The Flickr dataset helped us to improve the score by swapping the order in which we were using the clean and noisy datasets

CaffeNet

FINE-TUNING: FINE-TUNING: +1.3%

CONCLUSIONS

● The network actually succeeds in improving its performance by learning from its own mistakes when applying fracking.

CaffeNet

FINE-TUNING: Fine-tuning with fracking subset from:

CONCLUSIONS

● The results are better if we keep the weights learned in the earlier layers from a very large dataset.

CONCLUSIONS

● Fine-tuning one convnet for each class increases the score.

FUTURE WORK

● Combine our solutions with a network fine-tuned on PLACES, and with other local solutions.

SCENE CNN (PLACES)

● Compete in (and try to win) ChaLearn @ ICCV 2015!
