Visual perception through Deep Learning
Dario Garcia-Gasulla
Barcelona Supercomputing Center (BSC)
June 1, 2016



Deep Learning basics Convolutional Neural Networks CNN Applications Mining CNNs Summary

The basics


The Artificial Neuron and the Artificial Neural Network

Definition (Maureen Caudill): ...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs.

McCulloch & Pitts, 1943. Rosenblatt, 1958.
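As a minimal sketch of one such processing element, the snippet below computes a weighted sum of its inputs plus a bias and passes it through a step activation, in the spirit of the perceptron. The function name `neuron`, the weights and the logical-AND example are illustrative assumptions, not taken from the slides.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a step activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hypothetical weights that make the neuron behave like a logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

# Evaluate the neuron on the four possible binary inputs.
and_outputs = [neuron([a, b], AND_WEIGHTS, AND_BIAS) for a in (0, 1) for b in (0, 1)]
print(and_outputs)
```

A network is obtained by feeding the outputs of many such units into further units.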


Training Neural Networks

Backpropagation algorithm
- Measure the error on the output (loss function)
- Optimize weights to reduce the loss (gradient descent)
- Backpropagate the loss, layer by layer, until all neuron weights have been improved
- Repeat!
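The loop above can be sketched in plain Python for a tiny network with one hidden layer. Everything concrete here is an assumption made for illustration: the toy OR dataset, the sigmoid activation, the squared-error loss, the starting weights, the learning rate and the number of iterations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data (logical OR): inputs and target outputs.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def forward(params, x):
    """Return the hidden activations and the network output for one input."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def mean_loss(params):
    """Measure the error on the output (squared-error loss, averaged over the data)."""
    return sum(0.5 * (forward(params, x)[1] - t) ** 2 for x, t in DATA) / len(DATA)

def train_step(params, lr):
    """Backpropagate the loss and apply one gradient-descent update to every weight."""
    w_hidden, b_hidden, w_out, b_out = params
    g_wh = [[0.0] * len(ws) for ws in w_hidden]
    g_bh = [0.0] * len(b_hidden)
    g_wo = [0.0] * len(w_out)
    g_bo = 0.0
    for x, t in DATA:
        hidden, y = forward(params, x)
        delta_out = (y - t) * y * (1.0 - y)  # error signal at the output neuron
        g_bo += delta_out
        for j, h in enumerate(hidden):
            # Propagate the output error back to hidden neuron j.
            delta_h = delta_out * w_out[j] * h * (1.0 - h)
            g_wo[j] += delta_out * h
            g_bh[j] += delta_h
            for i, xi in enumerate(x):
                g_wh[j][i] += delta_h * xi
    n = len(DATA)
    new_wh = [[w - lr * g / n for w, g in zip(ws, gs)] for ws, gs in zip(w_hidden, g_wh)]
    new_bh = [b - lr * g / n for b, g in zip(b_hidden, g_bh)]
    new_wo = [w - lr * g / n for w, g in zip(w_out, g_wo)]
    new_bo = b_out - lr * g_bo / n
    return new_wh, new_bh, new_wo, new_bo

# Arbitrary, fixed starting weights so the run is reproducible.
params = ([[0.5, -0.4], [-0.3, 0.6]], [0.1, -0.1], [0.7, -0.6], 0.05)
loss_before = mean_loss(params)
for _ in range(2000):  # Repeat!
    params = train_step(params, lr=0.5)
loss_after = mean_loss(params)
print(loss_before, loss_after)
```

Real frameworks compute these gradients automatically; the hand-written version is only meant to show where each step of the list appears.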


(old) Neural Networks

- Traditionally used as classifiers for simple problems
- Capable of finding non-linearities in the data

Limitations
- Large networks are increasingly expensive to train (millions of weights)
- Need tons of data to find complex non-linearities
- Training easily stalls in suboptimal local minima


Deep Neural Network (aka Deep Learning)

More layers! Made possible by:
- Hardware advances (GPUs)
- More efficient types of neurons
- Training optimizations


Deep Learning families

Based on neuron, layer and training particularities:
- Convolutional Neural Networks (CNNs): Capture 2D features. Appropriate for visual data.
- Recurrent Neural Networks (RNNs): Capture streams of data. May include memory components (LSTM). Appropriate for text, sound, etc.
- Deep Belief Networks: Probabilistic model.
- ... and many others


Convolutional Neural Networks


The explosion of Deep Learning
The ImageNet Challenge: Visual recognition competition. Recognize 1,000 different objects.

In 2012, Alex Krizhevsky et al. trained a CNN with 5 convolutional layers and improved the best result by about 11 percentage points (top-5 error).
In 2014 all candidates were based on CNNs.
In 2015, human-level performance was achieved.


CNNs: The Origin

Design
- Fukushima, 1980 (neocognitron). LeCun, 1998, 2003.
- Based on the visual cortex of animals: each neuron perceives a small portion of the input, and exploits the spatial correlation.
- Reuse neuron weights to reduce complexity.

What was missing:
- Feasible implementation.
- GPUs


CNNs Layers: Convolution

Convolutional Layers
Each neuron takes as input a small patch of data (called its receptive field, e.g., 3x3). A neuron's parameters are convolved over the whole input. This provides translation invariance.
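A minimal sketch of this weight sharing: the same 3x3 filter is slid over every position of a small image, so a pattern produces the same response wherever it appears, only at a shifted position in the output. The filter values, the image size and the helper names are assumptions chosen for the example; like most CNN libraries, the sketch applies the filter without flipping it (cross-correlation).

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical 3x3 filter that responds strongly to a vertical line.
VERTICAL_LINE = [[-1, 2, -1],
                 [-1, 2, -1],
                 [-1, 2, -1]]

def image_with_line(column, size=6):
    """A size x size image of zeros with a vertical line of ones at `column`."""
    return [[1 if c == column else 0 for c in range(size)] for _ in range(size)]

response_a = conv2d(image_with_line(2), VERTICAL_LINE)
response_b = conv2d(image_with_line(3), VERTICAL_LINE)
print(response_a[0])  # [-3, 6, -3, 0]
print(response_b[0])  # [0, -3, 6, -3]
```

Moving the line one pixel to the right moves the peak response one position to the right, with no change to the filter weights.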


CNNs Layers: Pooling

Pooling Layers
- Down-sampling technique to reduce complexity at the price of precision.
- Reduce the values within the pooling filter (e.g., 2x2) to their maximum or average (max pooling, average pooling).
- The exact location is not as important as the relative location.
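As a small illustration, the sketch below applies non-overlapping 2x2 pooling to a 4x4 feature map, once with the maximum and once with the average. The feature map values and the function names are made up for the example.

```python
def pool2d(feature_map, size=2, reducer=max):
    """Non-overlapping pooling: reduce each size x size window to a single value."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [[reducer([feature_map[r * size + i][c * size + j]
                      for i in range(size) for j in range(size)])
             for c in range(cols)]
            for r in range(rows)]

def average(values):
    return sum(values) / len(values)

FEATURE_MAP = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]

max_pooled = pool2d(FEATURE_MAP, size=2, reducer=max)
avg_pooled = pool2d(FEATURE_MAP, size=2, reducer=average)
print(max_pooled)  # [[4, 2], [2, 8]]
print(avg_pooled)  # [[2.5, 1.0], [0.75, 6.5]]
```

The 16 input values become 4, which is the reduction in complexity; which of the four positions held the maximum is the precision that is given up.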


CNNs Layers: Fully-connected

Fully-connected Layers
- Standard NN layer. Each neuron takes as input all neurons from the previous layer.
- Spatial information is no longer taken into account.
- The output will be an estimate of the prediction (class probability).
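A minimal sketch of such a layer followed by a softmax, which turns the raw scores into class probabilities. The flattened feature values, the weight matrix and the three-class setup are hypothetical.

```python
import math

def fully_connected(inputs, weights, biases):
    """Each output neuron takes a weighted sum over all inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Flattened feature map: the spatial layout is discarded at this point.
features = [0.5, 1.0, -0.5, 2.0]
# Hypothetical weights for three classes (one row per output neuron).
WEIGHTS = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]
BIASES = [0.0, 0.0, 0.0]

scores = fully_connected(features, WEIGHTS, BIASES)
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))
print(scores, predicted_class)
```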


CNNs Architecture

Standard Architecture
- Stack convolution and pooling layers.
- To estimate probabilities, use fully connected layers at the end. The output feeds a classifier (softmax, SVM).
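To see how such a stack shrinks the spatial size before the fully connected layers, the sketch below traces tensor shapes through alternating convolution and pooling layers. The layer sizes are hypothetical (loosely in the style of early LeNet-like networks) and are not taken from the slides; the output-size formula assumes square inputs and filters.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution with the given kernel, stride and padding."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output(size, window):
    """Spatial size after non-overlapping pooling."""
    return size // window

# (kind, kernel or window size, number of output channels for conv layers)
LAYERS = [("conv", 5, 6), ("pool", 2, None), ("conv", 5, 16), ("pool", 2, None)]

def trace_shapes(size, channels, layers):
    """Return the (height, width, channels) shape after the input and after each layer."""
    shapes = [(size, size, channels)]
    for kind, extent, out_channels in layers:
        if kind == "conv":
            size = conv_output(size, extent)
            channels = out_channels
        else:
            size = pool_output(size, extent)
        shapes.append((size, size, channels))
    return shapes

shapes = trace_shapes(32, 1, LAYERS)
# Number of values handed to the first fully connected layer.
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes, flattened)
```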


CNNs in Action

During Training

A CNN trained to recognize objects learns different representations at each depth.

1. Lines, angles
2. Composed shapes
3. Parts of entities
4. Full entities

During Deployment

The CNN looks for increasingly complex patterns in the image. Finally, by considering the most complex patterns (top layer), a class prediction is made.


CNNs Practical Notes

Requirements
- Large set of labeled data for training
- Computational power for training (GPUs)
- Deployment is cheap

Where to start?
- Almost out-of-the-box CNNs: Caffe, Torch, Theano, TensorFlow
- Pre-trained models are available for download


Applications of Convolutional Neural Networks


Object Recognition (A. Krizhevsky et al., 2012)


Image Segmentation (L.-C. Chen et al., 2014)


Style Transfer (Gatys et al., 2015)


Colorization (Zhang et al., 2016)


Colorization II (Iizuka et al., 2016)


Other applications

Mobile Apps
- Aipoly, AI Scry, BlindTool: Textual description of images
- Artify: Artistic style
- Nippler, AwesomeCNN, WhatPlant: Object detection

AI Challenges
- Playing video games, Go, ...
- Self-driving cars
- Image retrieval (Google, Facebook, Instagram, etc.)


Mining CNN learnt representations: A case of research


Beyond CNNs

CNNs...
- Learn lots of relevant representations (millions) from a training set
- Characterize input data based on the learnt representations

Our hypothesis
- A CNN is a kind of feature extractor
- What mining/learning can be performed with those features?


Step 1: Vector Embeddings

From Images to Vectors
- For a given image, record which neurons activate for it, and their activation strength
- Use a subset of those neurons to define a fixed vector length
- Produce a vector for each image, assuming each variable is independent

The vector represents everything the CNN perceives in the image


Step 2: Abstract Representations

Image Class Vectors
- Images are imperfect representations of entities (changes in perspective, illumination, specimen, etc.)
- To build stable class representations we need to aggregate the evidence provided by many images of the same entity

Result: One vector with millions of numerical values for each abstract class
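The slides do not state how the evidence is aggregated. As one plausible sketch, the snippet below averages the per-image vectors of each class, position by position; the use of the mean, the toy vectors and the labels are all assumptions.

```python
from collections import defaultdict

def class_vectors(image_vectors, labels):
    """Average the vectors of all images that share a label (assumed aggregation)."""
    grouped = defaultdict(list)
    for vector, label in zip(image_vectors, labels):
        grouped[label].append(vector)
    return {label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in grouped.items()}

# Toy per-image vectors and their class labels.
IMAGE_VECTORS = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0], [0.0, 4.0, 1.0]]
LABELS = ["dog", "dog", "car"]

vectors_by_class = class_vectors(IMAGE_VECTORS, LABELS)
print(vectors_by_class)
```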


Step 3: Exploit vectors

Vector operations
- Compute distances to perform clustering (unsupervised learning)
- Visualize class vectors (see what the CNN sees)
- Vector arithmetics (visual reasoning)
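As a sketch of the first operation, the snippet below computes cosine distances between class vectors and finds the nearest neighbor of a class, which is the basic building block of most clustering methods. The choice of cosine distance, the three-value toy vectors and the class names are assumptions made for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for vectors pointing the same way, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Toy class vectors; real ones would hold one value per selected neuron.
CLASS_VECTORS = {
    "beagle": [0.9, 0.1, 0.0],
    "terrier": [0.8, 0.2, 0.1],
    "minivan": [0.0, 0.9, 0.4],
}

def nearest(name, vectors):
    """Name of the class whose vector is closest to the vector of `name`."""
    return min((other for other in vectors if other != name),
               key=lambda other: cosine_distance(vectors[name], vectors[other]))

print(nearest("beagle", CLASS_VECTORS))
```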


Mining process: Step by Step

1. Build a million-pattern description for a set of images (image to vector)
2. Aggregate images by class (image class to vector)
3. Compute distances, clusters, arithmetics (image class clustering and equations)


Actual Data

The Model
- GoogLeNet architecture pretrained to recognize 1,000 classes using 1.5M images. 80MB.
- Extract 1.2M features from the CNN. One vector < 3MB

The Data
- Process 50,000 images (ImageNet test set)
- Aggregate 50 images per class: 1,000 class vectors


Clustering (I)

[Figure: clustering of class vectors. 114 dog classes (black), 44 wheeled-vehicle classes (grey)]

- Similar things are close
- Implicit high level knowledge


Clustering (II)

Which semantics does the vector space actually capture?
- Find n clusters
- For each cluster, find its most representative WordNet label
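The slides do not say how the most representative label is chosen. One possible sketch, under that caveat, is to pick the most specific hypernym shared by the largest number of cluster members. The `HYPERNYMS` table below is a hand-written stand-in for WordNet, and the selection rule is an assumption.

```python
from collections import Counter

# Hypothetical hypernym chains (most specific first); real ones would come from WordNet.
HYPERNYMS = {
    "beagle": ["hound", "dog", "animal"],
    "terrier": ["dog", "animal"],
    "tabby": ["cat", "animal"],
    "minivan": ["car", "wheeled vehicle"],
    "bicycle": ["wheeled vehicle"],
}

def representative_label(cluster):
    """Hypernym covering the most members; ties go to the more specific label."""
    counts = Counter(h for member in cluster for h in HYPERNYMS[member])
    best = max(counts.values())
    candidates = [h for h, c in counts.items() if c == best]

    def specificity(label):
        # Smallest position of the label in any member's chain (0 = most specific).
        return min(HYPERNYMS[m].index(label) for m in cluster if label in HYPERNYMS[m])

    return min(candidates, key=specificity)

print(representative_label(["beagle", "terrier"]))
print(representative_label(["minivan", "bicycle"]))
print(representative_label(["beagle", "terrier", "tabby"]))
```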


Class Visualization (I)

Vector to image
- Generate images from class vectors
- See a concept as the CNN perceives it

Based on Gatys et al., 2015


Class Visualization (II)


Vector Arithmetics (I)

Church - Mosque = Bellcote

Horse cart - Horse = Rickshaw
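The mechanics behind such equations can be sketched as: subtract one class vector from another, then look for the remaining class whose vector is closest to the difference. The three-value vectors below are invented purely to show the mechanics (their dimensions can be read loosely as "building", "bell" and "dome"); the results on the slide come from the real class vectors, and the use of Euclidean distance is an assumption.

```python
import math

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def closest_class(query, vectors, exclude=()):
    """Name of the class vector nearest to `query`, ignoring the excluded names."""
    candidates = {name: vec for name, vec in vectors.items() if name not in exclude}
    return min(candidates, key=lambda name: euclidean(query, candidates[name]))

# Made-up toy vectors, not real CNN features.
TOY_VECTORS = {
    "church": [1.0, 1.0, 0.0],
    "mosque": [1.0, 0.0, 1.0],
    "bellcote": [0.1, 1.0, 0.0],
    "planetarium": [0.2, 0.0, 1.0],
}

difference = subtract(TOY_VECTORS["church"], TOY_VECTORS["mosque"])
answer = closest_class(difference, TOY_VECTORS, exclude=("church", "mosque"))
print(difference, answer)
```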


Vector Arithmetics (III)

Panda bear - Brown bear = Skunk, Football, Indri, Angora rabbit

- What do these four image classes have in common?


Deep Learning and CNN

Technically
- Not so new
- Made possible by increased computational power and a few optimizations
- Currently, a trial-and-error research approach

Impact
- Anything related to visual data has changed
- The same will happen with text, sound and others
- Just the tip of the iceberg!


Deep Learning and CNNs online materials
http://cs231n.github.io/convolutional-networks/
http://ufldl.stanford.edu/tutorial/

Almost out-of-the-box CNNs
Caffe, Torch, Theano, TensorFlow
