Day 4 Lecture 4 Video Analytics Xavier Giró-i-Nieto

Video Analytics, Day 4 Lecture 4 (GitHub Pages): imatge-upc.github.io/telecombcn-2016-dlcv/slides/D4L4-video.pdf


Day 4 Lecture 4

Video Analytics

Xavier Giró-i-Nieto


Motivation


Outline

1. Scene Classification
2. Object Detection & Tracking


Scene Classification

(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Previous lectures

Scene Classification


Scene Classification: DeepVideo: Architectures

(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

Unsupervised learning [Le et al. ’11] Supervised learning [Karpathy et al. ’14]

Scene Classification: DeepVideo: Features

(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014


Scene Classification: DeepVideo: Multires

(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014


Scene Classification: DeepVideo: Results

(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification


Scene Classification: C3D

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition” ICLR 2015.

Scene Classification: C3D: Spatial Dimensions


3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets

[Figure labels: Temporal depth; 2D ConvNets]

Scene Classification: C3D: Temporal dimension

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets

Scene Classification: C3D: Temporal dimension

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
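To make the 3 × 3 × 3 kernel concrete, here is a minimal pure-Python sketch of a single-channel 3D convolution (computed as cross-correlation, without padding) over a clip indexed as [time][row][col]. The function name `conv3d_valid`, the toy clip, and the averaging kernel are illustrative assumptions; this is not the C3D code, where kernels are learned and applied over many channels.

```python
def conv3d_valid(clip, kernel):
    """Single-channel 3D cross-correlation without padding.

    clip and kernel are nested lists indexed as [time][row][col].
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the temporal and both spatial kernel dimensions.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip (assumption): 4 frames of 5x5 pixels, each pixel equal to its frame index.
clip = [[[float(t)] * 5 for _ in range(5)] for t in range(4)]
# 3x3x3 averaging kernel; a real network would learn these weights.
kernel = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]
features = conv3d_valid(clip, kernel)
```

The output of this toy example keeps a temporal axis (2 time steps of 3 × 3 values), which is what lets later layers keep modelling motion.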

No gain when varying the temporal depth across layers.

Scene Classification: C3D: Temporal dimension

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Feature vector

Scene Classification: C3D: Network Architecture

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Video sequence split into 16-frame clips with an 8-frame overlap

Scene Classification: C3D: Feature Vector

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
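The clip extraction with 16-frame clips and an 8-frame overlap can be sketched as a sliding window over frame indices. The helper name `split_into_clips` is hypothetical, and dropping trailing frames that do not fill a whole clip is an assumption of this sketch, not something the slides specify.

```python
def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame index pairs, with end exclusive."""
    stride = clip_len - overlap  # consecutive clips share `overlap` frames
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, stride)]


# A 40-frame video yields four clips, each starting 8 frames after the previous one.
clips = split_into_clips(40)
```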

16-frame clip, 16-frame clip, 16-frame clip, 16-frame clip, ... → Average → 4096-dim video descriptor → L2 norm → 4096-dim video descriptor

Scene Classification: C3D: Feature Vector

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
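The averaging and L2 normalization that turn per-clip features into one video descriptor can be sketched as follows. The function name `video_descriptor` is hypothetical, and the 2-dim toy vectors stand in for the 4096-dim clip features.

```python
import math


def video_descriptor(clip_features):
    """Average per-clip feature vectors, then L2-normalize the result."""
    n = len(clip_features)
    dim = len(clip_features[0])
    mean = [sum(f[i] for f in clip_features) / n for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    # Guard against an all-zero mean vector (assumption of this sketch).
    return [v / norm for v in mean] if norm > 0 else mean


# Toy 2-dim features standing in for the 4096-dim clip features.
descriptor = video_descriptor([[3.0, 0.0], [3.0, 8.0]])
```

Here the mean is [3.0, 4.0], whose L2 norm is 5, so the descriptor has unit length regardless of how many clips the video contained.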

Based on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details.

Scene Classification: C3D: Visualization

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

C3D features with a simple linear classifier outperformed state-of-the-art methods on 4 different benchmarks and were comparable with state-of-the-art methods on 2 other benchmarks.

Scene Classification: C3D: Visualization

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Implementation by Michael Gygli (GitHub)

Scene Classification: C3D: Software

Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015

Classification: Image & Optical Flow CNN + LSTM


(Scene Classification: Image &) Optical Flow

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015

(Scene Classification: Image &) Optical Flow

Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic dataset is generated and augmented (translation, rotation, and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma, and color).

Data augmentation

Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015
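A rough sketch of the photometric part of such an augmentation (brightness, contrast, gamma, additive Gaussian noise) on pixel intensities in [0, 1] might look like the following. The function name `augment_pixels` and all parameter ranges are illustrative assumptions, not the values used by FlowNet. Geometric transformations (translation, rotation, scaling) are left out because they would also require transforming the ground-truth flow consistently.

```python
import random


def augment_pixels(pixels, rng):
    """Apply random brightness, contrast, gamma and Gaussian noise.

    pixels: flat list of intensities in [0, 1].
    Parameter ranges are illustrative assumptions.
    """
    brightness = rng.uniform(-0.1, 0.1)
    contrast = rng.uniform(0.8, 1.2)
    gamma = rng.uniform(0.7, 1.5)
    out = []
    for p in pixels:
        v = (p - 0.5) * contrast + 0.5 + brightness  # contrast, then brightness
        v = min(max(v, 0.0), 1.0)                     # clamp before the power
        v = v ** gamma                                # gamma change
        v += rng.gauss(0.0, 0.02)                     # additive Gaussian noise
        out.append(min(max(v, 0.0), 1.0))             # keep a valid intensity
    return out


rng = random.Random(0)  # seeded so the sketch is repeatable
augmented = augment_pixels([0.0, 0.25, 0.5, 0.75, 1.0], rng)
```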

Scene Classification & Detection

CNN + RNN → “Biking”

Slide credit: Alberto Montes


Classification & Detection: Proposals + C3D

(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]


Classification & Detection: Proposals + C3D

(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

(1) Binary classification: Action or No Action


Classification & Detection: Proposals + C3D

(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

(2) One-vs-all Action classification


Classification & Detection: Proposals + C3D

(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

(3) Refinement with temporal-aware loss function


Classification & Detection: Proposals + C3D

(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

Post-processing
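The slide does not detail the post-processing step. One common choice for overlapping temporal segment predictions is greedy non-maximum suppression (NMS) based on temporal intersection over union; the sketch below assumes that choice. The function names, the 0.5 threshold, and the toy detections are illustrative assumptions.

```python
def temporal_iou(a, b):
    """Intersection over union of two segments given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments, iou_threshold=0.5):
    """Greedy NMS over (start, end, score) tuples, highest score first."""
    kept = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        # Keep a segment only if it does not overlap too much with a kept one.
        if all(temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


# Toy detections in seconds: the second overlaps heavily with the first.
detections = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
kept = temporal_nms(detections)
```

In this toy case the first two segments have an IoU of 9/11, so only the higher-scoring one survives alongside the non-overlapping third segment.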


Classification & Detection: Image + RNN + Reinforce

Yeung, Serena, Olga Russakovsky, Greg Mori, and Li Fei-Fei. "End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR 2016

Outline

1. Scene Classification
2. Object Detection & Tracking

[ILSVRC 2015 Slides and videos]

Objects: ImageNet Video



(Slides by Andrea Ferri): Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, and Wanli Ouyang, “Object Detection From Video Tubelets With Convolutional Neural Networks”, CVPR 2016 [code]

Objects: ImageNet Video: T-CNN

Object Detection

Object Tracking


Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.

Objects: Tracking: MDNet

Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
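The idea of shared layers with one domain-specific branch per training sequence, replaced by a single branch at test time, can be sketched with a toy model. Everything here is a simplified assumption for illustration: the class name `MultiDomainNet`, the element-wise "shared layers", and the linear branches are not the MDNet implementation, and how the new test-time branch is trained is left out.

```python
class MultiDomainNet:
    """Shared layers followed by one scoring branch per domain (sequence)."""

    def __init__(self, shared_weights, domains, dim):
        self.shared_weights = shared_weights                # reused by all sequences
        self.branches = {d: [0.0] * dim for d in domains}   # domain-specific layers

    def shared(self, x):
        # Stand-in for the shared layers: element-wise scaling followed by ReLU.
        return [max(0.0, w * v) for w, v in zip(self.shared_weights, x)]

    def score(self, x, domain):
        # Target-vs-background score computed by the branch of the given domain.
        feats = self.shared(x)
        return sum(w * f for w, f in zip(self.branches[domain], feats))

    def reset_for_test(self, dim):
        # Drop all training branches and create a single new one for the test sequence.
        self.branches = {"test": [0.0] * dim}


net = MultiDomainNet([1.0, 0.5], ["seq_a", "seq_b"], dim=2)
net.branches["seq_a"] = [1.0, 2.0]           # pretend this branch was trained
train_score = net.score([2.0, 4.0], "seq_a")
net.reset_for_test(dim=2)                    # shared weights are kept
```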


Objects: Tracking: FCNT

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." ICCV 2015 [code]

Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification.

conv4-3 conv5-3

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

Despite being trained for image classification, feature maps in conv5-3 enable object localization, but they are not discriminative enough to distinguish different instances of the same class.

Objects: Tracking: FCNT: Localization

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…

Objects: Tracking: FCNT: Localization

conv4-3 conv5-3

Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]

SNet=Specific Network (online update)

GNet=General Network (fixed)

Objects: Tracking: FCNT: Localization

Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene cnns." ICLR 2015.

Other works have also highlighted how feature maps in convolutional layers allow object localization.

Objects: Tracking: FCNT: Localization


Objects: Tracking: DeepTracking

P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” AAAI 2016. [code]


Summary

● Works on video are normally extensions of principles previously tested on still images.

● RNNs can naturally handle the diversity in video lengths and capture their temporal dependencies.

● Trick: initialize your networks by pre-training them to predict the next frame.
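As a toy illustration of how a recurrent network copes with videos of different lengths, the sketch below runs a one-unit tanh RNN over sequences of scalar per-frame features and returns the final hidden state. The function name `rnn_encode`, the weights, and the feature values are illustrative assumptions; a real model would use vector-valued CNN features and learned weights.

```python
import math


def rnn_encode(frames, w_in, w_rec):
    """Run a one-unit tanh RNN over a sequence of scalar frame features."""
    h = 0.0
    for x in frames:
        # The same weights are applied at every time step, whatever the length.
        h = math.tanh(w_in * x + w_rec * h)
    return h


short_video = [0.1, 0.4]
long_video = [0.1, 0.4, 0.3, 0.9, 0.2, 0.5]
codes = [rnn_encode(v, w_in=1.0, w_rec=0.5) for v in (short_video, long_video)]
```

Both videos end up as a state of the same size, so a single classifier can be applied on top regardless of the number of frames.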