FINE-TUNING A CONVOLUTIONAL NETWORK FOR CULTURAL EVENT RECOGNITION
AUTHOR: Andrea Calafell
ADVISORS: Xavier Giró-i-Nieto, Amaia Salvador, Matthias Zeppelzauer
20/07/2015
OUTLINE
1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work
MOTIVATION: Cultural Heritage
Chinese New Year
MOTIVATION: Cultural Heritage
Carnival Rio
Classic onsite explorers
Onsite social media is big data...
...and online explorers need our help.
CHALEARN: Looking at People
Training set: 5,875 images
Validation set: 2,332 images
Test set: 3,569 images
50 EVENTS
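The split sizes can be tallied as a quick sanity check. This sketch only restates the slide's numbers, treating each count as a number of images; the per-event figure is a plain average and assumes nothing about how balanced the 50 classes really are.

```python
# Image counts per split of the ChaLearn cultural event dataset, as given on the slide.
splits = {"train": 5875, "validation": 2332, "test": 3569}
num_events = 50

total = sum(splits.values())
for name, count in splits.items():
    # Share of all images that falls in this split.
    print(f"{name}: {count} images ({100 * count / total:.1f}%)")

# Average number of training images per event (real classes may be unbalanced).
train_per_event = splits["train"] / num_events
print(f"total: {total} images; {train_per_event:.1f} training images per event on average")
```

With fewer than 120 training images per event on average, training a ConvNet from scratch is impractical, which is why the work starts from a pre-trained network.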
MOTIVATION: Goals
● Improve on the results obtained in the ChaLearn challenge.
● Exploit the noisy data collected from Flickr.
STATE OF THE ART: CaffeNet
MediaEval Social Event Detection
Features used: Content (Visual) | Context (Time stamp, Geolocation, Text)
Zaharieva’15 X X X
Mattivi’11 X X
Bossard’13 X X
Cao’08 X X X
Sutanto’13 X
Schinas’12 X X
Brenner’13 X X
Nguyen’13 X X
STATE OF THE ART: CaffeNet
CaffeNet
ARCHITECTURE [Krizhevsky’12]
SOFTWARE [Jia’14]
DATA [Deng’09]
STATE OF THE ART: CNN ARCHITECTURE
Convolutional Neural Network architecture
Babenko et al., Neural codes for image retrieval. In ECCV, 2014.
STATE OF THE ART: Object+Scene CNNs
Object-Scene Convolutional Neural Network for event recognition
Wang et al., Object-scene convolutional neural networks for event recognition in images. In CVPRW, 2015.
BASELINE: Fine-tuning a ConvNet
[Figure: ConvNet fine-tuned with 50 outputs, one per event]
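Fine-tuning here means reusing a network pre-trained on ImageNet and replacing its classifier with a 50-way event layer. The sketch below models only that bookkeeping, assuming the standard CaffeNet layer sizes; the `Layer` record, the head name `fc8_events` and the head learning-rate multiplier of 10 are hypothetical choices borrowed from common Caffe practice, not values taken from these slides.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Layer:
    name: str
    num_output: int       # number of filters (conv) or output units (fc)
    lr_mult: float = 1.0  # per-layer learning-rate multiplier, as in Caffe

# CaffeNet's learnable layers; fc8 is the 1000-way ImageNet classifier.
CAFFENET: List[Layer] = [
    Layer("conv1", 96), Layer("conv2", 256), Layer("conv3", 384),
    Layer("conv4", 384), Layer("conv5", 256),
    Layer("fc6", 4096), Layer("fc7", 4096), Layer("fc8", 1000),
]

def adapt_for_events(layers: List[Layer], num_events: int = 50,
                     head_lr_mult: float = 10.0) -> List[Layer]:
    """Replace the ImageNet classifier with a new `num_events`-way layer.

    The other layers are kept, so they would start from the pre-trained weights.
    The new head is renamed (Caffe copies weights by layer name, so a renamed
    layer is freshly initialised) and gets a larger learning-rate multiplier.
    """
    body = layers[:-1]
    return body + [Layer("fc8_events", num_events, lr_mult=head_lr_mult)]

net = adapt_for_events(CAFFENET)
for layer in net:
    print(f"{layer.name:10s} outputs={layer.num_output:5d} lr_mult={layer.lr_mult}")
```

The higher multiplier lets the randomly initialised head catch up quickly while the copied layers change only gently.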
BASELINE: ChaLearn @ CVPRW 2015
Awarded the 2nd prize of the Cultural Event Recognition Challenge at the ChaLearn Workshop at CVPR 2015.
Salvador, A., Giró-i-Nieto, X., Calafell, A., et al., Cultural Event Recognition with Visual ConvNets and Temporal Models. In CVPRW, 2015.
ConvNets need to be trained with...
a large amount of labeled images,
but clean data is expensive...
...and downloading noisy data in an unsupervised fashion is easier and cheaper.
NOISY DATA: Flickr Dataset
FLICKR DATASET
4,068 images
50 EVENTS
DATASET BIAS
Dataset bias when fine-tuning with the ChaLearn or the Flickr dataset:
DENOISING THE FLICKR DATASET
[Figure: mosaic of Queens Day images from ChaLearn vs. mosaic of Queens Day images from Flickr]
DENOISING THE FLICKR DATASET
Example event: Annual Buffalo Roundup
Fine-tuned model with ChaLearn
New subset from Flickr
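The slide shows the model fine-tuned on ChaLearn being used to pick a new subset of the Flickr images, but not the exact selection rule. The sketch below assumes one plausible rule: keep an image only if the model's top-scoring event matches its Flickr label (assumed to come from the search used to download it) and that score reaches a hypothetical `min_score` threshold. The function name `denoise` and the toy scores are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, int]  # (image_id, event_label assumed from the Flickr query)

def denoise(noisy: Sequence[Sample],
            predict: Callable[[str], Sequence[float]],
            min_score: float = 0.5) -> List[Sample]:
    """Keep a noisy sample only if the model's top-scoring event equals its label
    and that score reaches `min_score` (a hypothetical threshold)."""
    kept = []
    for image_id, label in noisy:
        scores = predict(image_id)
        top = max(range(len(scores)), key=lambda e: scores[e])
        if top == label and scores[label] >= min_score:
            kept.append((image_id, label))
    return kept

# Toy per-event scores standing in for the ChaLearn-fine-tuned model (2 events only).
toy_scores = {
    "flickr_001": [0.90, 0.10],  # predicted event 0, labelled 0 -> kept
    "flickr_002": [0.60, 0.40],  # predicted event 0, labelled 1 -> dropped
    "flickr_003": [0.35, 0.65],  # predicted event 1, labelled 1, but below 0.7 -> dropped
}
noisy = [("flickr_001", 0), ("flickr_002", 1), ("flickr_003", 1)]
clean_subset = denoise(noisy, toy_scores.__getitem__, min_score=0.7)
print(clean_subset)
```

Raising `min_score` trades subset size for label purity; the right balance would have to be chosen on the validation set.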
BASELINE: Dataset ordering during fine-tuning
CaffeNet
FINE-TUNING JOINT: ChaLearn + Flickr
DENOISING THE FLICKR DATASET
Joint fine-tuning of the clean and noisy datasets:
baseline: 0.6136
BASELINE: Dataset ordering during fine-tuning
CaffeNet
FINE-TUNING: ChaLearn, then FINE-TUNING: Flickr
DENOISING THE FLICKR DATASET
Sequential fine-tuning of the clean and noisy datasets:
baseline: 0.6136
BASELINE: Dataset ordering during fine-tuning
CaffeNet
FINE-TUNING: Flickr, then FINE-TUNING: ChaLearn
DENOISING THE FLICKR DATASET
Sequential fine-tuning of the noisy and clean datasets:
baseline: 0.6136
+1.3%
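The three orderings differ only in how the training data is scheduled. This sketch, with hypothetical function names and toy samples, builds the schedules without training anything: joint fine-tuning is a single stage over the shuffled union, while sequential fine-tuning is two stages, where a model would be fine-tuned on the first and then continue from those weights on the second.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, int]  # (image_id, event_label)

def joint_schedule(clean: Sequence[Sample], noisy: Sequence[Sample],
                   seed: int = 0) -> List[List[Sample]]:
    """Joint: a single fine-tuning stage over the shuffled union of both datasets."""
    stage = list(clean) + list(noisy)
    random.Random(seed).shuffle(stage)
    return [stage]

def sequential_schedule(first: Sequence[Sample], second: Sequence[Sample],
                        seed: int = 0) -> List[List[Sample]]:
    """Sequential: fine-tune on `first`, then continue from those weights on `second`."""
    rng = random.Random(seed)
    stages = []
    for dataset in (first, second):
        stage = list(dataset)
        rng.shuffle(stage)  # shuffle within a stage only; stages stay in order
        stages.append(stage)
    return stages

clean = [("chalearn_1", 3), ("chalearn_2", 7)]               # toy ChaLearn samples
noisy = [("flickr_1", 3), ("flickr_2", 7), ("flickr_3", 12)]  # toy Flickr samples

# Noisy Flickr data first, clean ChaLearn data last (the +1.3% ordering above).
stages = sequential_schedule(noisy, clean)
for i, stage in enumerate(stages, 1):
    print(f"stage {i}:", [image_id for image_id, _ in stage])
```

Ending on the clean data means the last weight updates come from correctly labelled images, which is one plausible reason this ordering works best.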
FRACKING: MINING +/- SAMPLES
FRACKING THE TRAINING DATASET
Example event: Pingxi Lantern Festival
Fine-tuned model with ChaLearn
New subset from ChaLearn: hard negatives and hard positives
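Fracking mines the training images the fine-tuned model gets most wrong. The slides name hard positives and hard negatives but not the selection rule, so the ranking below is an assumption: for one event, take the `k` images of that event with the lowest score for it (hard positives) and the `k` images of other events with the highest score for it (hard negatives). The function `frack`, the parameter `k` and the toy scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def frack(samples: Sequence[Tuple[str, int]],
          scores: Dict[str, Sequence[float]],
          event: int,
          k: int) -> Tuple[List[str], List[str]]:
    """Mine the k hardest positives and k hardest negatives for one event.

    Hard positives: images of `event` that the model scores lowest for `event`.
    Hard negatives: images of other events that the model scores highest for `event`.
    """
    positives = [img for img, label in samples if label == event]
    negatives = [img for img, label in samples if label != event]
    hard_pos = sorted(positives, key=lambda img: scores[img][event])[:k]
    hard_neg = sorted(negatives, key=lambda img: scores[img][event], reverse=True)[:k]
    return hard_pos, hard_neg

# Toy training images: event 0 could be e.g. Pingxi Lantern Festival.
samples = [("a", 0), ("b", 0), ("c", 0), ("d", 1), ("e", 1), ("f", 1)]
scores = {"a": [0.9, 0.1], "b": [0.3, 0.7], "c": [0.6, 0.4],
          "d": [0.8, 0.2], "e": [0.1, 0.9], "f": [0.5, 0.5]}

hard_pos, hard_neg = frack(samples, scores, event=0, k=2)
print("hard positives:", hard_pos)
print("hard negatives:", hard_neg)
```

Repeating this per event and fine-tuning again on the mined subset is what lets the network learn from its own mistakes.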
BASELINE: Dataset ordering during fine-tuning
CaffeNet
FINE-TUNING: fine-tuning with the fracking subset from ChaLearn
FRACKING THE TRAINING DATASET
Results of fine-tuning with fracking on images from ChaLearn:
baseline: 0.61365
+0.9%
FINE-TUNING DEEPER LAYERS ONLY
Layer 2 responds to corners and other edge/color conjunctions.
FINE-TUNING DEEPER LAYERS ONLY
Layer 3 has more complex invariances, capturing similar textures.
Zeiler et al., Visualizing and Understanding Convolutional Networks. In ECCV, 2014.
FINE-TUNING DEEPER LAYERS ONLY
[Figure: fully connected layers FC6, FC7 and FC8, with 50 outputs]
Karpathy, A., Convolutional neural networks for visual recognition. Stanford CS class CS231n.
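Keeping the early layers' ImageNet weights amounts to giving those layers a learning rate of zero. In Caffe this is done by setting a layer's `lr_mult` to 0; the sketch below only computes such multipliers for CaffeNet's layer names, using a single value per layer for simplicity (Caffe actually sets weights and biases separately). Which layers were trained for the +3% and +4% results is not recoverable here, so starting at FC6 or FC7 is an assumption, and `lr_multipliers` is a hypothetical helper.

```python
from typing import Dict, List

CAFFENET_LAYERS: List[str] = ["conv1", "conv2", "conv3", "conv4", "conv5",
                              "fc6", "fc7", "fc8"]

def lr_multipliers(layers: List[str], first_trainable: str) -> Dict[str, float]:
    """Per-layer learning-rate multipliers: 0.0 (frozen) before `first_trainable`,
    1.0 from it onwards. A multiplier of 0 leaves the ImageNet weights untouched."""
    start = layers.index(first_trainable)
    return {name: (0.0 if i < start else 1.0) for i, name in enumerate(layers)}

# Two hypothetical configurations: fine-tune FC6-FC8, or only FC7-FC8.
for first in ("fc6", "fc7"):
    mults = lr_multipliers(CAFFENET_LAYERS, first)
    trainable = [name for name, m in mults.items() if m > 0]
    print(f"from {first}: train {trainable}, freeze the rest")
```

Freezing the generic edge and texture detectors of the early layers also reduces the number of parameters that the small event dataset has to fit.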
FINE-TUNING DEEPER LAYERS ONLY
Results of fine-tuning only the deeper layers:
+3%
baseline: 0.61365
FINE-TUNING DEEPER LAYERS ONLY
Results of fine-tuning only the deeper layers:
+4%
baseline: 0.6136
ENSEMBLE OF EVENT DETECTORS
SINGLE CONVNET FOR THE 50 EVENTS:
ENSEMBLE OF EVENT DETECTORS
ONE CONVNET FOR EACH EVENT:
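With one binary ConvNet per event, each image gets one confidence score per detector, and the ensemble has to turn these into a single decision. The slides do not state the fusion rule; this sketch assumes each detector outputs the probability that the image shows its event and that the ensemble picks the highest-scoring event. `ensemble_predict` and the toy outputs are hypothetical.

```python
from typing import Callable, Dict

Detector = Callable[[str], float]  # image_id -> probability that the image shows the event

def ensemble_predict(detectors: Dict[str, Detector], image_id: str) -> str:
    """Run every binary event detector on the image and return the most confident event."""
    return max(detectors, key=lambda event: detectors[event](image_id))

# Toy detector outputs standing in for one fine-tuned binary ConvNet per event.
toy_outputs = {
    "Chinese New Year": {"img1": 0.20, "img2": 0.85},
    "Carnival Rio":     {"img1": 0.75, "img2": 0.30},
    "Queens Day":       {"img1": 0.40, "img2": 0.10},
}
detectors = {event: table.__getitem__ for event, table in toy_outputs.items()}

for image_id in ("img1", "img2"):
    print(image_id, "->", ensemble_predict(detectors, image_id))
```

Because the detectors are trained independently, their scores are not guaranteed to be on the same scale; taking the maximum works best when the outputs are reasonably calibrated.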
ENSEMBLE OF EVENT DETECTORS
Results of the ensemble of binary detectors:
+6.6%
baseline: 0.6136
CONCLUSIONS
● The Flickr dataset helped us improve the score once we swapped the order in which the clean and noisy datasets were used.
CaffeNet
FINE-TUNING: Flickr, then FINE-TUNING: ChaLearn
+1.3%
CONCLUSIONS
● With fracking, the network improves its performance by learning from its own mistakes.
+0.9%
CaffeNet
FINE-TUNING: fine-tuning with the fracking subset from ChaLearn
CONCLUSIONS
● The results are better if we keep the weights that the earlier layers learned from a very large dataset and fine-tune only the deeper layers.
+4%
CONCLUSIONS
● Fine-tuning one ConvNet for each class (event) increases the score.
+6.6%
FUTURE WORK
● Combine our solutions with a fine-tuned scene network (PLACES) and with other local solutions.
[Diagram: SCENE CNN (PLACES) | LOCAL | NOW]
● Compete in (and try to win) ChaLearn @ ICCV 2015!