
Convolutional Neural Networks (Part II)

08, 10 & 17 Nov, 2016

J. Ezequiel Soto S.
Image Processing 2016

Prof. Luiz Velho


Summary & References

● 08/11: ImageNet Classification with Deep Convolutional Neural Networks (2012, Krizhevsky et al.) [source]
● 10/11: Going Deeper with Convolutions (2015, Szegedy et al.) [source]
● 17/11: Painting Style Transfer for Head Portraits using Convolutional Neural Networks (2016, Selim & Elgharib) [source]
+ An Analysis of Deep Neural Network Models for Practical Applications (2016, Canziani & Culurciello) [source]
+ Provable Bounds for Learning Some Deep Representations (2013, Arora et al.) [source]


Going Deeper with Convolutions

Szegedy et al., 2015


Outline
● Introduction
● Related Work
● Motivation
● Architecture Detail
● GoogLeNet
● Training
● ILSVRC 2014
● Conclusions


Introduction
● GoogLeNet → the submission to ILSVRC 2014

● Accuracy + low computational cost (1.5 billion multiply-adds at inference) → real-world applicability

● Efficient CNN architecture: Inception

● Depth in two senses: more network layers + the Inception module

● Results!!! → new state of the art


Related Work
● Standard CNN layer: convolution + normalization + max pooling
● Good results on MNIST, CIFAR and ImageNet (with dropout against overfitting)
● Concerns that max-pooling loses spatial information
● Neuroscience model of primate vision: a stack of filters → inspiration for the Inception module
● Network in Network (NiN) model
● 1×1 convolutions (sketch below):
– increase depth
– dimension reduction (reduce computational cost)
● Regions with Convolutional Neural Networks: R-CNN
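To make the 1×1 bullet concrete: a 1×1 convolution is just a linear map applied across channels at every spatial position, so it can cheaply shrink the channel dimension before a more expensive filter. A minimal PyTorch sketch; the sizes (256 → 64 channels, 28×28 maps) are illustrative choices, not numbers from the paper:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 256, 28, 28)              # one 28x28 feature map with 256 channels
    reduce = nn.Conv2d(256, 64, kernel_size=1)   # 1x1 convolution: 256 -> 64 channels

    # A 1x1 convolution is a per-pixel linear map across channels:
    w = reduce.weight[:, :, 0, 0]                # shape (64, 256)
    b = reduce.bias                              # shape (64,)
    y_conv = reduce(x)
    y_lin = torch.einsum('oi,bihw->bohw', w, x) + b[None, :, None, None]
    print(torch.allclose(y_conv, y_lin, atol=1e-5))   # True: same operation

    # The reduction pays off before a larger filter (multiply-add counts):
    hw = 28 * 28
    direct  = hw * 3 * 3 * 256 * 256              # 3x3 conv applied directly, 256 -> 256
    reduced = hw * (256 * 64 + 3 * 3 * 64 * 256)  # 1x1 reduce to 64, then 3x3 conv to 256
    print(direct, reduced)                        # the reduced path needs far fewer operations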


Motivation
● Improve CNNs by growing them deeper and wider…
– Too many parameters → overfitting
– Computational cost: in two chained layers, 2× the filters → 2² × the computation (see the sketch below)
– Zero entries? → sparsity control*
– Lack of structure, large numbers of filters and large batches → efficient use of dense computation

* Theoretical results: 2013, Arora et al., “Provable Bounds for Learning Some Deep Representations”, 54 p.
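A back-of-the-envelope check of the cost bullet above, in plain Python (map size and filter counts are illustrative): if two chained convolutional layers both double their filter banks, the second layer sees twice the input channels and produces twice the output channels, so its multiply-add count grows by 2² = 4.

    def conv_madds(h, w, k, c_in, c_out):
        """Multiply-adds of a k x k convolution over an h x w feature map."""
        return h * w * k * k * c_in * c_out

    # Second of two chained 3x3 layers, before and after doubling the filters:
    before = conv_madds(28, 28, 3, 128, 128)
    after  = conv_madds(28, 28, 3, 256, 256)
    print(after / before)   # 4.0 -> roughly quadratic growth in computation from 2x filters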


Motivation
“This raises the question whether there is any hope for a next, intermediate step: an architecture that makes use of the extra sparsity, even at filter level, as suggested by the theory, but exploits our current hardware by utilizing computations on dense matrices.”

● Inception idea…
– a case study trying to approximate Arora’s sparse structure with dense, readily available components (convolutions)
– highly speculative / immediate good results

CAUTION: “although the proposed architecture has become a success for computer vision, it is still questionable whether its quality can be attributed to the guiding principles that have lead to its construction”


“Given samples from a sparsely connected neural network whose each layer is a denoising autoencoder, can the net (and hence its reverse) be learnt in polynomial time with low sample complexity?”

Video 1 / Video 2


Architecture Detail
“finding out how an optimal local sparse structure in a convolutional vision network can be approximated and covered by readily available dense components”

● Translation invariance → convolutional building blocks
● A local construction that repeats
● Theory points at analyzing the correlations of the last layer and clustering by them
● Lower layers: correlation → spatial localization
● Avoid “aligned” correlations… by using filters of different sizes


Architecture Detail
● Higher levels → higher abstraction
● Spatial correlation decreases → increased use of bigger filters (3×3, 5×5)
● Stacking large filters blows up the number of outputs! → reduce dimension

● Avoid too much compression of the information and maintain sparsity → 1×1 convolutions before the larger ones! (see the sketch below)
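A minimal PyTorch sketch of an Inception module along these lines (the class and parameter names are mine; they simply mirror the four parallel branches and the 1×1 reductions before the 3×3 and 5×5 convolutions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class InceptionModule(nn.Module):
        """Four parallel branches whose outputs are concatenated along the channel axis."""
        def __init__(self, c_in, n1x1, n3x3red, n3x3, n5x5red, n5x5, pool_proj):
            super().__init__()
            self.b1 = nn.Conv2d(c_in, n1x1, kernel_size=1)                # 1x1 branch
            self.b2_red = nn.Conv2d(c_in, n3x3red, kernel_size=1)         # 1x1 reduce ...
            self.b2 = nn.Conv2d(n3x3red, n3x3, kernel_size=3, padding=1)  # ... before 3x3
            self.b3_red = nn.Conv2d(c_in, n5x5red, kernel_size=1)         # 1x1 reduce ...
            self.b3 = nn.Conv2d(n5x5red, n5x5, kernel_size=5, padding=2)  # ... before 5x5
            self.b4 = nn.Conv2d(c_in, pool_proj, kernel_size=1)           # 1x1 after max-pooling

        def forward(self, x):
            y1 = F.relu(self.b1(x))
            y2 = F.relu(self.b2(F.relu(self.b2_red(x))))
            y3 = F.relu(self.b3(F.relu(self.b3_red(x))))
            y4 = F.relu(self.b4(F.max_pool2d(x, kernel_size=3, stride=1, padding=1)))
            return torch.cat([y1, y2, y3, y4], dim=1)  # same spatial size, channels added up

Because every branch preserves the spatial size, the concatenated output can feed the next module directly, which is what allows the modules to be stacked.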


Inception module

Video


Architecture Detail
● Lower levels: classic convolutions
● Higher levels: Inception modules

* The author thinks this split isn’t strictly necessary; it just compensates for some inefficiency of the current structure design…

● Intuition → scale invariance of visual information before abstraction

● Increased computational efficiency achieved by the reductions, allowing the network to grow in depth and breadth

● Efficiency: 3–10× faster than similarly performing networks without Inception modules, but the design has to be careful.


GoogLeNet
● The specific design with Inception modules used in the ILSVRC 2014 competition

● Same design for 6 of the 7 ensemble models

● 22 layers deep

● Details:
– All convolutions include ReLU
– Input: 224×224 RGB images with the mean subtracted (zero mean)
– #3×3 reduce = number of 1×1 filters before the 3×3 convolutions
– #5×5 reduce = number of 1×1 filters before the 5×5 convolutions
– pool proj = number of 1×1 filters after the built-in max-pooling
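As a usage example of this notation, reusing the InceptionModule sketch from the Architecture Detail section: if I recall the paper’s table correctly, inception (3a) receives a 28×28×192 input with #1×1 = 64, #3×3 reduce = 96, #3×3 = 128, #5×5 reduce = 16, #5×5 = 32 and pool proj = 32, giving a 28×28×256 output (treat the numbers as illustrative).

    import torch

    # Parameters of inception (3a) as recalled from the paper's table (illustrative):
    m = InceptionModule(c_in=192, n1x1=64, n3x3red=96, n3x3=128,
                        n5x5red=16, n5x5=32, pool_proj=32)
    x = torch.randn(1, 192, 28, 28)
    print(m(x).shape)   # torch.Size([1, 256, 28, 28]); 64 + 128 + 32 + 32 = 256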


GoogLeNet
● 22 layers (27 counting max-pooling)
● About 100 independent building blocks
● Pooling before classifying (as in NiN) + a linear layer: convenience / easy to adapt to other label sets
● Average pooling instead of fully connected layers gives +0.6% top-1 accuracy
● Dropout remained essential

● Propagate the gradient in an effective manner → make the middle layers discriminate correctly

● Inclusion of intermediate classifiers: convolutional networks on top of the Inception modules (4a) and (4d) → their losses are added to the total with weight 0.3

● Auxiliary classifiers are discarded at inference / marginal effect


GoogLeNet
● Auxiliary network:
– Average pooling: 5×5 filter, stride 3 → 4×4×512 for (4a), 4×4×528 for (4d)
– 1×1 convolution with 128 filters + ReLU
– FC layer with 1024 units + ReLU
– Dropout layer (70%)
– Linear layer + softmax over the 1000 classes

(removed at inference)
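The auxiliary branch is simple enough to write down directly from the bullet list above; a PyTorch sketch (the class name is mine, the 512/528 input depths come from the slide, and the 14×14 spatial size at (4a)/(4d) is what makes the 5×5/stride-3 pooling produce 4×4 maps):

    import torch.nn as nn

    class AuxClassifier(nn.Module):
        """Auxiliary head attached to (4a) or (4d); discarded at inference time."""
        def __init__(self, c_in, n_classes=1000):
            super().__init__()
            self.pool = nn.AvgPool2d(kernel_size=5, stride=3)   # 14x14 -> 4x4
            self.conv = nn.Conv2d(c_in, 128, kernel_size=1)     # 1x1 conv, 128 filters + ReLU
            self.fc1 = nn.Linear(128 * 4 * 4, 1024)             # FC layer, 1024 units + ReLU
            self.drop = nn.Dropout(p=0.7)                       # 70% dropout
            self.fc2 = nn.Linear(1024, n_classes)               # linear layer over 1000 classes

        def forward(self, x):
            x = nn.functional.relu(self.conv(self.pool(x)))
            x = x.flatten(start_dim=1)
            x = self.drop(nn.functional.relu(self.fc1(x)))
            return self.fc2(x)                                  # logits; softmax applied in the loss

    # During training the auxiliary losses are added with weight 0.3:
    #   total_loss = main_loss + 0.3 * (aux_loss_4a + aux_loss_4d)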


Training Methodology
● DistBelief (Google’s infrastructure): modest model & data parallelism, CPU-only implementation → estimated at about one week on a few high-end GPUs (memory being the main limitation)

● Stochastic Gradient Descent:
– 0.9 momentum
– Fixed learning rate schedule: decrease by 4% every 8 epochs
– Polyak(-Ruppert) averaging of the SGD iterates for the final model

● Many different methods for sampling and training over the images… (see the sketch below)
– Crops of different sizes
– Patches covering 8%–100% of the image area
– Aspect ratio in [3/4, 4/3]
– Photometric distortions
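A plain-Python sketch of the last two bullet groups; the paper does not spell out the exact sampling code, so this only illustrates “patches covering 8%–100% of the area with aspect ratio between 3/4 and 4/3”, plus the 4%-every-8-epochs learning-rate schedule (the base rate 0.01 is a placeholder of mine):

    import random

    def lr_at_epoch(epoch, base_lr=0.01):
        """Fixed schedule: decrease the learning rate by 4% every 8 epochs."""
        return base_lr * (0.96 ** (epoch // 8))

    def sample_patch(img_w, img_h):
        """Sample a crop covering 8%-100% of the area with aspect ratio in [3/4, 4/3]."""
        for _ in range(10):                                  # retry a few times, then give up
            area = random.uniform(0.08, 1.0) * img_w * img_h
            aspect = random.uniform(3 / 4, 4 / 3)            # width / height
            w = int(round((area * aspect) ** 0.5))
            h = int(round((area / aspect) ** 0.5))
            if 0 < w <= img_w and 0 < h <= img_h:
                x = random.randint(0, img_w - w)
                y = random.randint(0, img_h - h)
                return x, y, w, h                            # crop box; resize to 224x224 afterwards
        return 0, 0, img_w, img_h                            # fallback: the whole image

    print(lr_at_epoch(24))          # 0.01 * 0.96 ** 3
    print(sample_patch(640, 480))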


ILSVRC 2014: Classification
● No external data for training
● 7 versions of the GoogLeNet model (1 wider)
– Same initialization (same weights, due to an oversight)
– Same learning rate policies
– Different sampling
→ Ensemble prediction

● Testing (more aggressive cropping than AlexNet):
– 4 scales (shorter side at 256, 288, 320, 352)
– Left, center and right squares (top, center, bottom for portrait images)
– Each square: the full square + 4 corners + center, all at 224×224
– Plus the mirrored image
→ 4 × 3 × 6 × 2 = 144 crops per image
(Not strictly necessary in practice / decreasing marginal benefit)

● Softmax probabilities averaged over all crops and all models: 144 crops × 7 models = 1008 predictions per image
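The 144-crop / 1008-prediction arithmetic can be checked by enumerating crop descriptors (plain Python; the labels only describe the crops, they do not implement the cropping itself):

    from itertools import product

    scales = [256, 288, 320, 352]                  # shorter side resized to these values
    squares = ["left", "center", "right"]          # top / center / bottom for portrait images
    views = ["full"] + ["corner_%d" % i for i in range(4)] + ["center"]  # six 224x224 views
    mirrored = [False, True]

    crops = list(product(scales, squares, views, mirrored))
    print(len(crops))        # 4 * 3 * 6 * 2 = 144 crops per image
    print(len(crops) * 7)    # 1008 softmax vectors averaged over the 7-model ensemble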


Source: 2016, Canziani & Culurciello


ILSVRC 2014: Detection
● Task: produce bounding boxes around objects from 200 classes
– A detection is correct if the bounding box overlaps the ground truth by at least 50%
– Extraneous detections (false positives) are penalized

● Submission:
– R-CNN approach with the Inception model as region classifier
– Region proposals: selective search (superpixel size increased 2×) + MultiBox
– Region classification: ensemble of 6 GoogLeNet models
– No bounding-box regression (unlike R-CNN)
– Results reported as mean average precision (mAP)


Source: 2016, Canziani & Culurciello


Conclusions
“...approximating the expected optimal sparse structure by readily available dense building blocks is a viable method for improving neural networks for computer vision.”

● Large gain for a small increase in computation
● Detection is very competitive despite using neither context nor bounding-box regression

● Moving to sparser architectures: feasible & useful
● Importance of the theoretical analysis!!! (2013, Arora et al.)

● DeepDream (a side result): the examples are creepy… but they show the network being run in reverse!
– Input image → force it to get closer to the animal categories
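The “network in reverse” behind DeepDream can be sketched as gradient ascent on the input image: freeze the weights and update the pixels so that the score of a chosen (e.g. animal) class increases. A hedged PyTorch sketch; model stands for any pretrained classifier, and the step size and iteration count are arbitrary choices of mine:

    import torch

    def dream_towards_class(model, image, class_idx, steps=20, step_size=0.05):
        """Nudge the input image so the network's score for class_idx increases."""
        model.eval()
        x = image.clone().requires_grad_(True)      # shape (1, 3, H, W), normalized like the training data
        for _ in range(steps):
            model.zero_grad()
            score = model(x)[0, class_idx]          # logit of the target (e.g. an animal) class
            score.backward()                        # gradients w.r.t. the pixels, not the weights
            with torch.no_grad():
                x += step_size * x.grad / (x.grad.abs().mean() + 1e-8)  # normalized ascent step
                x.grad.zero_()
        return x.detach()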


Will continue, again...
