Fine tuning a convolutional network for cultural event recognition

FINE-TUNING A CONVOLUTIONAL NETWORK FOR CULTURAL EVENT RECOGNITION

ADVISORS:

Andrea Calafell

Xavier Giró-i-Nieto Amaia Salvador

20/07/2015

AUTHOR:

Matthias Zeppelzauer

OUTLINE

1. Motivation and State of the art
2. Baseline
3. Study of the dataset bias
4. Denoising
5. Fracking
6. Fine-tuning deeper layers only
7. Ensemble of event detectors
8. Conclusions and future work

MOTIVATION: Cultural Heritage

Chinese New Year

MOTIVATION: Cultural Heritage

Carnival Rio

Classic onsite explorers

Onsite social media is big data...

...and online explorers need our help

CHALEARN: Looking at People

TRAINING SET: 5,875 images

VALIDATION SET: 2,332 images

TEST SET: 3,569 images

50 EVENTS

MOTIVATION: Goals

● Improve the results obtained in the ChaLearn Challenge.

● Exploit the noisy data collected from Flickr.

STATE OF THE ART: CaffeNet

CaffeNet

ARCHITECTURE [Krizhevsky’12]

SOFTWARE[Jia’14]

DATA[Deng’09]

STATE OF THE ART: CNN ARCHITECTURE

Convolutional Neural Network architecture

Babenko et al., Neural codes for image retrieval. In Computer Vision-ECCV, 2014

STATE OF THE ART: Object+Scene CNNs

Object-Scene Convolutional Neural Network for event recognition

Wang et al., Object-scene convolutional neural networks for event recognition in images. In CVPRW, 2015

BASELINE: Fine-tuning a ConvNet

50

BASELINE: ChaLearn @ CVPRW 2015

Awarded with the 2nd prize of the Cultural Event Recognition Challenge in the ChaLearn Workshop at CVPR 2015

Salvador, A., Giró-i-Nieto, X., Calafell, A., et al., Cultural Event Recognition with Visual ConvNets and Temporal Models. In CVPRW, 2015

Convnets need to be trained with a large amount of labeled images, but clean data is expensive...

...and downloading noisy data in an unsupervised fashion is easier and cheaper.

NOISY DATA: Flickr Dataset

FLICKR DATASET

4,068

50 EVENTS

DATASET BIAS

Dataset bias when fine-tuning with ChaLearn or Flickr dataset:

DENOISING THE FLICKR DATASET

Mosaic of Queens Day from ChaLearn

Mosaic of Queens Day from Flickr

DENOISING THE FLICKR DATASET

Example event: Annual Buffalo Roundup

Fine-tuned model with ChaLearn

New subset from
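
The denoising step above can be sketched as a filter over the noisy Flickr images. This is only a minimal sketch under stated assumptions: the slides show that a model fine-tuned with ChaLearn is used to produce a new subset, but the agreement rule, the threshold and the names denoise_subset and min_confidence are hypothetical and not taken from the slides.

```python
from typing import Dict, List


def denoise_subset(
    scores: Dict[str, List[float]],
    noisy_labels: Dict[str, int],
    min_confidence: float = 0.5,
) -> List[str]:
    """Keep the images whose noisy tag agrees with the model prediction.

    scores maps an image id to the per-event scores given by the model
    fine-tuned on the clean dataset. noisy_labels maps an image id to the
    event index implied by its Flickr tag. Both the agreement rule and the
    threshold are assumptions made for this illustration.
    """
    kept = []
    for image_id, event_scores in scores.items():
        # Index of the event with the highest score for this image.
        predicted = max(range(len(event_scores)), key=event_scores.__getitem__)
        agrees = predicted == noisy_labels[image_id]
        if agrees and event_scores[predicted] >= min_confidence:
            kept.append(image_id)
    return kept


if __name__ == "__main__":
    toy_scores = {"img_a": [0.9, 0.1], "img_b": [0.2, 0.8]}
    toy_labels = {"img_a": 0, "img_b": 0}
    print(denoise_subset(toy_scores, toy_labels))
```

Raising min_confidence keeps fewer but cleaner images; lowering it keeps more of the noisy set.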

BASELINE: Dataset ordering during fine-tuning

CaffeNet

FINE-TUNING JOINT:

DENOISING THE FLICKR DATASET

Joint fine-tuning of the clean and noisy datasets:

0.6136

BASELINE: Dataset ordering during fine-tuning

CaffeNet

FINE-TUNING: FINE-TUNING:

DENOISING THE FLICKR DATASET

Sequential fine-tuning of the clean and noisy datasets:

0.6136

BASELINE: Dataset ordering during fine-tuning

CaffeNet

FINE-TUNING:FINE-TUNING:

DENOISING THE FLICKR DATASET

Sequential fine-tuning of the noisy and clean datasets:

0.6136

+1.3%

FRACKING MINING +/- SAMPLES

FRACKING THE TRAINING DATASET

Example event: Pingxi Lantern Festival

Fine-tuned model with ChaLearn

New subset from

hard negatives

hard positive
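
The mining of hard samples can be sketched as follows. This is an illustrative sketch, not the exact procedure used in the slides: it assumes that a hard positive is an image of the target event that the fine-tuned model scores low, and that a hard negative is an image of another event that the model scores high. The function name frack and the parameter n_hard are hypothetical.

```python
from typing import List, Tuple

# (image id, true event name, model score for the target event)
Sample = Tuple[str, str, float]


def frack(
    samples: List[Sample], target_event: str, n_hard: int = 2
) -> Tuple[List[str], List[str]]:
    """Return the ids of the hardest positives and negatives for one event."""
    positives = [s for s in samples if s[1] == target_event]
    negatives = [s for s in samples if s[1] != target_event]
    # Hard positives: true members of the event with the lowest scores.
    hard_pos = sorted(positives, key=lambda s: s[2])[:n_hard]
    # Hard negatives: images of other events with the highest scores.
    hard_neg = sorted(negatives, key=lambda s: s[2], reverse=True)[:n_hard]
    return [s[0] for s in hard_pos], [s[0] for s in hard_neg]


if __name__ == "__main__":
    toy = [
        ("p1", "lantern", 0.9),
        ("p2", "lantern", 0.2),
        ("n1", "carnival", 0.7),
        ("n2", "carnival", 0.1),
    ]
    print(frack(toy, "lantern", n_hard=1))
```

The returned ids would form the fracking subset used for a further round of fine-tuning.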

BASELINE: Dataset ordering during fine-tuning

CaffeNet

FINE-TUNING: Fine-tuning with fracking subset from:

FRACKING THE TRAINING DATASET

Results of fine-tuning using fracking in images from ChaLearn:

baseline: 0.61365

+0.9%

FINE-TUNING DEEPER LAYERS ONLY

Layer 2 responds to corners and other edge/color conjunctions.

FINE-TUNING DEEPER LAYERS ONLY

Layer 3 has more complex invariances, capturing similar textures.

Zeiler et al., Visualizing and Understanding Convolutional Networks. In Computer Vision-ECCV, 2014

FINE-TUNING DEEPER LAYERS ONLY

50

Andrej Karpathy. Convolutional neural networks for visual recognition. In Stanford CS class CS231n.

FC6 FC7

FC8
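
Fine-tuning only the deeper layers amounts to freezing the earlier ones, which can be expressed as a per-layer learning rate multiplier of zero. The sketch below illustrates the idea in plain Python with one scalar weight per layer. The layer names follow CaffeNet (conv1 to conv5, fc6 to fc8); which layer the fine-tuning starts from is an assumption here, and the helper names lr_multipliers and sgd_step are hypothetical.

```python
from typing import Dict

# CaffeNet layers with learnable weights, from input to output.
LAYERS = ["conv1", "conv2", "conv3", "conv4", "conv5", "fc6", "fc7", "fc8"]


def lr_multipliers(first_trainable: str) -> Dict[str, float]:
    """Freeze every layer before first_trainable, train the rest."""
    start = LAYERS.index(first_trainable)
    return {name: (0.0 if i < start else 1.0) for i, name in enumerate(LAYERS)}


def sgd_step(
    weights: Dict[str, float],
    grads: Dict[str, float],
    base_lr: float,
    multipliers: Dict[str, float],
) -> Dict[str, float]:
    """One plain SGD update; frozen layers keep their pretrained weights."""
    return {
        name: w - base_lr * multipliers[name] * grads[name]
        for name, w in weights.items()
    }


if __name__ == "__main__":
    mults = lr_multipliers("fc6")  # assumption: train FC6, FC7 and FC8 only
    weights = {name: 1.0 for name in LAYERS}
    grads = {name: 0.5 for name in LAYERS}
    print(sgd_step(weights, grads, base_lr=0.1, multipliers=mults))
```

With a multiplier of zero the update term vanishes, so the weights learned on the large pretraining dataset are kept unchanged in the frozen layers.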

FINE-TUNING DEEPER LAYERS ONLY

Results of only fine-tuning the deeper layers:

+3%

0.61365

FINE-TUNING DEEPER LAYERS ONLY

Results of only fine-tuning the deeper layers:

+4%

0.6136

ENSEMBLE OF EVENT DETECTORS

SINGLE CONVNET FOR THE 50 EVENTS:

ENSEMBLE OF EVENT DETECTORS

ONE CONVNET FOR EACH EVENT:
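
A minimal sketch of how an ensemble of per-event detectors could be combined is shown below. It assumes that each detector is a callable returning one score for its own event and that the scores are simply stacked into one vector, with the highest score giving the predicted event. The slides do not state how the detector outputs are merged, so this combination rule and the names ensemble_scores and predict_event are assumptions.

```python
from typing import Callable, Dict, List, Sequence

# A detector maps an image (here a plain dict standing in for pixel data)
# to a single score for the event it was fine-tuned on.
Detector = Callable[[Dict[str, float]], float]


def ensemble_scores(detectors: Sequence[Detector], image: Dict[str, float]) -> List[float]:
    """Run every binary detector on the image, one score per event."""
    return [detector(image) for detector in detectors]


def predict_event(detectors: Sequence[Detector], image: Dict[str, float]) -> int:
    """Return the index of the event whose detector gives the highest score."""
    scores = ensemble_scores(detectors, image)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy detectors that just read a precomputed score from the image dict.
    toy_detectors = [
        lambda img: img["event_0"],
        lambda img: img["event_1"],
    ]
    toy_image = {"event_0": 0.3, "event_1": 0.6}
    print(predict_event(toy_detectors, toy_image))
```

In the setting of the slides there would be 50 such detectors, one per cultural event.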

ENSEMBLE OF EVENT DETECTORS

Results of the ensemble of binary event detectors:

+6.6%

0.6136

CONCLUSIONS

● The Flickr dataset helped us improve the score once we swapped the order in which we used the clean and noisy datasets, fine-tuning first with the noisy data and then with the clean data.

CaffeNet

FINE-TUNING: FINE-TUNING: +1.3%

CONCLUSIONS

● The network actually succeeds in improving its performance by learning from its own mistakes when applying fracking.

+0.9%

CaffeNet

FINE-TUNING: Fine-tuning with fracking subset from:

CONCLUSIONS

● The results are better if we keep the weights learned in the earlier layers from a very large dataset.

50

+4%

CONCLUSIONS

● Fine-tuning one convnet for each class increases the score.

+6.6%

FUTURE WORK

● Mix our solutions with a network fine-tuned with PLACES, and with other local solutions.

SCENE CNN (PLACES)

LOCAL

NOW

● Compete in (and try to win) ChaLearn @ ICCV 2015!
