Computer Vision From traditional approaches to deep neural networks Stanislav Frolov München, 27.02.2018


Computer Vision

From traditional approaches to deep neural networks

Stanislav Frolov München, 27.02.2018


Outline of this talk
What we are going to talk about
● Computer vision
● Human vision
● Traditional approaches and methods
● Artificial neural networks
● Summary


Stanislav Frolov
Big Data Engineer @inovex
● Trained deep neural networks for object detection during master's thesis
● Still fascinated and interested


What is computer vision
General
● Teach computers how to see
● Automatic extraction, analysis and understanding of images
● Infer useful information, interpret and make decisions
● Automate tasks that the human visual system can do
● One of the most exciting fields in AI and ML



What is computer vision
Motivation
● Era of pixels
● Internet consists mostly of images
● Explosion of visual data
● Cannot be labeled by humans



What is computer vision
Drivers
● Two drivers for computer vision explosion
  ○ Compute (faster and cheaper)
  ○ Data (more data > algorithms)



What is computer vision
Interdisciplinary field
Diagram labels (fields that overlap with computer vision):
● Computer Science
● Information Retrieval
● Machine Learning
● Graphs
● Systems Architecture
● Speech, NLP
● Image Processing
● Optics
● Solid-State Physics
● Cognitive Sciences
● Biological vision


What is computer vision
Related fields - image processing
● Imaging for statistical pattern recognition
● Image transformations such as pixel-by-pixel operations
  ○ Contrast enhancement
  ○ Edge extraction
  ○ Noise reduction
  ○ Geometrical and spatial operations (e.g. rotations)


What is computer vision
Related fields - computer graphics
● Creates new images from scene descriptions
● Produces image data from 3D models
● “Inverse” of computer vision
● AR as a combination of both


What is computer vision
Related fields - machine vision
● Mainly manufacturing applications
● Image-based automatic inspection, process control, robot guidance
● Usually employs strong assumptions (colour, shape, light, structure, orientation, ...) -> works very well
● Output often pass/fail or good/bad
● Additionally numerical/measurement data, counts


What is computer vision
Related fields - AI
● Create “intelligent” systems
● Studying computational aspects of intelligence
● Make computers do things at which, at the moment, people are better
● Many techniques play an important role (ML, ANNs)
● Currently does a few things better/faster at scale than humans can
● Whether it can do anything “human” remains unanswered


What is computer vision
Related fields - summary
● Related fields have a large intersection
● Basic techniques used, developed and studied are very similar


Short trip to human vision



What is human vision
General
● Two-stage process
  ○ Eyes take in light reflected off objects, and the retina converts 3D objects into 2D images
  ○ Brain’s visual system interprets 2D images and “rebuilds” a 3D model


What is human vision
Stereoscopic vision
● A pair of 2D images with slightly different views allows depth to be inferred
● The position of nearby objects varies more across the two images than the position of more distant objects
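
The relation between the shift across the two images (disparity) and distance can be made concrete with the standard pinhole stereo model, depth = focal length x baseline / disparity. This is a minimal sketch for two parallel, rectified cameras; the focal length, baseline and disparity values are illustrative assumptions, not numbers from the talk.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by two parallel, rectified cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Assumed values: 700 px focal length, 6.5 cm between the two viewpoints.
near = depth_from_disparity(700.0, 0.065, disparity_px=91.0)  # large shift
far = depth_from_disparity(700.0, 0.065, disparity_px=9.1)    # small shift
print(near, far)
```

A large disparity maps to a small depth and a small disparity to a large depth, which is the effect described in the second bullet.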


What is human vision
Prior knowledge
● Prior knowledge of relative sizes and depths is often key for understanding and interpretation


What is human vision
Texture pattern
● Texture and changes in texture help with depth perception



What is human vision
Biases and illusions in human perception
● Shadows make all the difference in interpretation
● Gradual changes in light are ignored so as not to be misled by shadows
Page 20: Computer Vision...Computer Vision From traditional approaches to deep neural networks Stanislav Frolov München, 27.02.2018


What is human vision
A few more illusions

● Two arrows with different orientations have the same length


What is human vision
Biases and illusions in human perception
● Assumptions and familiarity (distorted room)
● Face recognition bias
● Up-down orientation bias



What is human vision
Summary
● Illusions are fun, but the puzzle of understanding human vision is far from complete


Back to computer vision



What is computer vision
Typical tasks
● Recognition
● Localization
● Detection
● Segmentation


What is computer vision
Typical tasks
● Part-based detection
  ○ Deformable parts model
  ○ Pose estimation and poselets


What is computer vision
Typical tasks
● Image captioning (actions, attributes)


What is computer vision
Typical tasks
● Motion analysis
  ○ Egomotion (camera)
  ○ Optical flow (pixels)


What is computer vision
Typical tasks
● Scene understanding and reconstruction


What is computer vision
Typical tasks
● Image restoration
● Colouring black & white photos


Solving this is useful for many applications




What is computer vision
Typical applications
● Assistance systems for cars and people
● Surveillance
● Navigation (obstacle avoidance, road following, path planning)
● Photo interpretation
● Military (“smart” weapons)
● Manufacturing (inspection, identification)
● Robotics
● Autonomous vehicles (dangerous zones)



What is computer vision
Typical applications
● Recognition and tracking
● Event detection
● Interaction (man-machine interfaces)
● Modeling (medical, manufacturing, training, education)
● Organizing (database index, sorting/clustering)
● Fingerprint and biometrics
● …


Why so difficult?




What is computer vision
Why it is difficult
● Occlusion
● Deformation
● Scale
● Clutter
● Illumination
● Viewpoint
● Object pose
● Tons of classes and variants
● Often n:1 mapping
● Computationally expensive
● Full understanding of biological vision is missing


System overview



What is computer vision
System overview
● Input: image(s) + labels
● Output: semantic data, labels
● Digital image pixels usually have three channels [R,G,B], each in [0...255], plus a location [x,y]
● Digital images are just vectors
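
To illustrate "digital images are just vectors", here is a small sketch that flattens a 2x2 RGB image into one list of numbers and looks a channel value up again by its location. The row-major, channel-last layout and the helper names `flatten` and `channel_value` are assumptions chosen for illustration; other layouts are equally common.

```python
# A 2x2 RGB image: rows of pixels, each pixel is [R, G, B] with values 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

height, width, channels = len(image), len(image[0]), len(image[0][0])


def flatten(img):
    """Concatenate all channel values, row by row, into a single vector."""
    return [value for row in img for pixel in row for value in pixel]


def channel_value(vec, x, y, c):
    """Channel c of the pixel at column x, row y, read from the flat vector."""
    return vec[(y * width + x) * channels + c]


vector = flatten(image)
print(len(vector))                      # height * width * channels values
print(channel_value(vector, 1, 0, 1))   # green channel of the top-right pixel
```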


What is computer vision
System overview
1. Image acquisition (camera, sensors)
2. Pre-processing (sampling, noise reduction, augmentation)
3. Feature extraction (lines, edges, regions, points)
4. Detection and segmentation
5. Post-processing (verification, estimation, recognition)
6. Decision making
● -> Ability of a machine to step back and interpret the big picture of those pixels


Some history




What is computer vision
History
● 2D imaging for statistical pattern recognition
● Theory of optical flow based on a fixed point towards which one moves


What is computer vision
Traditional approaches
Image processing
● Histograms
● Filtering
● Stitching
● Thresholding
● ...
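
Two of the operations listed above, histograms and thresholding, fit in a few lines. This is a minimal sketch on a tiny grayscale image stored as nested lists; the pixel values and the threshold of 128 are made up for illustration.

```python
def histogram(gray, bins=256):
    """Count how often each intensity value 0..bins-1 occurs."""
    counts = [0] * bins
    for row in gray:
        for value in row:
            counts[value] += 1
    return counts


def threshold(gray, t):
    """Binary mask: 1 where the pixel is at least t, else 0."""
    return [[1 if value >= t else 0 for value in row] for row in gray]


gray = [
    [10, 12, 200],
    [11, 210, 205],
    [9, 10, 220],
]
hist = histogram(gray)
mask = threshold(gray, t=128)  # separates the bright region from the dark one
print(mask)
```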



What is computer vision
History
● Desire to extract 3D structure from 2D images for scene understanding
● Began at pioneering AI universities to mimic the human visual system as a stepping stone for intelligent robots
● Summer vision project at MIT: attach a camera to a computer and have it “describe what it saw”


What is computer vision
History: summer vision project @MIT 1966
● Given to 10 undergraduate students
● … an attempt to use our summer workers effectively …
● … construction of a significant part of a visual system …
● … task can be segmented into sub-problems …
● … participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition” …


What is computer vision
History: summer vision project @MIT 1966
● Goal: analyse scenes and identify objects
● Structure of system:
  ○ Region proposal
  ○ Property lists for regions
  ○ Boundary construction
  ○ Match with properties
  ○ Segment
● Basic foreground/background segmentation with simple objects (cubes, cylinders, …)


What is computer vision
History: summer vision project @MIT 1966
● Unlike general intelligence, computer vision seemed tractable
● Amusing anecdote, but it never aimed to “solve” computer vision
● Computer vision today differs from what it was thought to be in 1966



What is computer vision
History
● Formed many algorithms that exist today
● Edges, lines and objects as interconnected



What is computer vision
Traditional approaches
Edge detection based on
● Brightness
● Gradients
● Geometry
● Illumination
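
As one example of gradient-based edge detection, the sketch below applies the well-known Sobel kernels to a small grayscale image and returns the gradient magnitude per interior pixel. Using Sobel here is an assumption made for illustration; the talk does not name a specific operator.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges


def gradient_magnitude(gray):
    """Gradient magnitude for every pixel that has a full 3x3 neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = gray[y - 1 + j][x - 1 + i]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            out[y - 1][x - 1] = math.hypot(gx, gy)
    return out


# Dark left half, bright right half: the response peaks at the boundary.
gray = [[0, 0, 0, 100, 100, 100] for _ in range(4)]
edges = gradient_magnitude(gray)
print(edges[0])
```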



What is computer vision
Traditional approaches - part based detector

● Objects composed of features of parts and their spatial relationship

● Challenge: how to define and combine



What is computer vision
History
● More rigorous mathematical analysis and quantitative aspects
● Optical character recognition
● Sliding window approaches
● Usage of artificial neural networks



What is computer vision
Traditional approaches - HOG detection (histogram of oriented gradients)
● Concept dates from the 1980s but came into wide use only in 2005
● Create HOG descriptors (object generalizations)
● One feature vector per object
● Train with SVM
● Sliding window at multiple scales



What is computer vision
Traditional approaches - HOG detection (histogram of oriented gradients)
● Computation of HOG descriptors:
1. Compute gradients
2. Compute histograms on cells
3. Normalize histograms
4. Concatenate histograms
● Requires a lot of engineering
● Must build ensembles of feature descriptors
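
The four descriptor steps above can be sketched as follows. This is a deliberately simplified version: it uses central-difference gradients, 9 unsigned orientation bins, and normalizes each cell on its own, whereas practical HOG implementations normalize over overlapping blocks of cells. The function names and the bin count are assumptions for illustration.

```python
import math


def cell_histogram(gray, bins=9):
    """Steps 1 and 2: gradients, then a magnitude-weighted orientation histogram."""
    hist = [0.0] * bins
    h, w = len(gray), len(gray[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    return hist


def l2_normalize(hist, eps=1e-6):
    """Step 3: scale the histogram to (almost) unit length."""
    norm = math.sqrt(sum(v * v for v in hist) + eps)
    return [v / norm for v in hist]


def hog_descriptor(cells):
    """Step 4: concatenate the normalized histograms of all cells."""
    descriptor = []
    for cell in cells:
        descriptor.extend(l2_normalize(cell_histogram(cell)))
    return descriptor


vertical_edge = [[0, 0, 100, 100] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [100] * 4, [100] * 4]
descriptor = hog_descriptor([vertical_edge, horizontal_edge])
print(len(descriptor))  # one 9-bin histogram per cell
```

The resulting vector is what would be fed to a classifier such as the SVM mentioned on the previous slide.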



What is computer vision
History
● Significant interaction with computer graphics (rendering, morphing, stitching)
● Approaches using statistical learning
● Eigenface (Ghostfaces) through principal component analysis (PCA)



What is computer vision
Traditional approaches - deformable parts model (DPM)
● Objects constructed from their parts
● First match whole object, then refine on the parts
● HOG + part-based + modern features
● Slow but good at difficult objects
● Involves many heuristics



What is computer vision
Features
● Feature points
  ○ Small area of pixels with certain properties
● Feature detection
  ○ Use features for identification
  ○ Activate if “object” present
● Examples:
  ○ Lines, edges, colours, blobs, …
  ○ Animals, faces, cars, ...



What is computer vision
Traditional approaches - classical recognition

● Init: extract features for objects in different scales, colours, orientations, rotations, occlusion levels

● Inference: extract features from query image and find closest match in database or train a classifier

● Computationally expensive (hundreds of features in image, millions in database) and complex due to errors and mismatches
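
The "find closest match in database" step can be sketched as a brute-force nearest-neighbour search over stored feature vectors. The three-dimensional features and the labels below are hypothetical; real systems compare hundreds of descriptors per image, which is where the cost mentioned above comes from.

```python
import math


def nearest_label(query, database):
    """Return (label, distance) of the stored feature closest to the query."""
    best_label, best_dist = None, math.inf
    for label, feature in database:
        dist = math.dist(query, feature)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


# Hypothetical database built during the "init" phase.
database = [
    ("cat", [0.9, 0.1, 0.3]),
    ("car", [0.1, 0.8, 0.7]),
]
label, distance = nearest_label([0.8, 0.2, 0.3], database)
print(label, round(distance, 3))
```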



What is computer vision
History
Before the new era
● Bags of features
● Handcrafted ensembles
Diagram labels: Input, Feature Extraction, Feat. 1, Feat. 2, ..., Feat. n


The new era of computer vision



Artificial neural networks
Fundamentals - artificial neuron
● Elementary building block
● Inspired by biological neurons
● Mathematical function y=f(wx+b)
● Learnable weights
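
The function y=f(wx+b) translates directly into code. In this sketch f is a sigmoid, which is only one possible choice of activation; the input, weights and bias are arbitrary example values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, f=sigmoid):
    """y = f(w . x + b) for an input vector x, weights w and bias b."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)


y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
print(y)
```

Learning means adjusting `w` and `b`; the function itself stays fixed.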


Artificial neural networks
Fundamentals - artificial neural networks
● Collection of neurons organized in layers
● Universal approximators
● Fully-connected network here
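
A fully-connected network is just several layers of such neurons applied one after another. The sketch below runs a forward pass through a hidden layer with two neurons and an output layer with one; the weights, the ReLU activation and the layer sizes are made-up example choices.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(x, weights, biases, f):
    """One fully-connected layer: every neuron sees every input value."""
    return [
        f(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


x = [1.0, 2.0]
hidden = dense(x, weights=[[1.0, -1.0], [0.5, 0.5]], biases=[0.0, 0.0], f=relu)
out = dense(hidden, weights=[[2.0, 1.0]], biases=[0.5], f=identity)
print(hidden, out)
```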



Artificial neural networks
Fundamentals - training

● Basically an optimization problem

● Find minimum of a loss function by an iterative process (training)

● Designing the loss function is sometimes tricky



Artificial neural networks
Fundamentals - training
Simple optimizer algorithm:
1. Forward pass with a batch of data
2. Calculate error between actual and desired output
3. Nudge weights in proportion to the error in the right direction (same data would result in smaller error)
4. Repeat until convergence
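
The four steps above correspond to plain gradient descent. The sketch below fits a single linear neuron y = w*x + b to a few points with a mean squared error loss; the data, the learning rate and the fixed number of iterations (used here instead of a real convergence check) are assumptions for illustration.

```python
def train(data, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    for _ in range(steps):                      # 4. repeat
        grad_w = grad_b = 0.0
        for x, target in data:
            prediction = w * x + b              # 1. forward pass
            error = prediction - target         # 2. error vs. desired output
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(data)            # 3. nudge weights against
        b -= lr * grad_b / len(data)            #    the error gradient
    return w, b


# Points sampled from y = 2x + 1, so training should recover w=2, b=1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(data)
print(round(w, 4), round(b, 4))
```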



Artificial neural networks
Fundamentals - CNN
● Local neighborhood contributes to activation
● Exploit spatial information
● Hierarchical feature extractors
● Fewer parameters
Diagram labels: input, receptive field



Artificial neural networks
Fundamentals - CNN

● Filter of size 3x3 applied to an input of 7x7
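
Sliding a 3x3 filter over a 7x7 input can be sketched as below. The sketch assumes stride 1 and no padding (the slide does not state either), which gives a 5x5 output; as in most deep learning frameworks, the filter is applied without flipping. The all-ones filter is a placeholder for learned weights.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over the image with stride 1 and no padding."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(kernel[j][i] * image[y + j][x + i]
                for j in range(k) for i in range(k))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


image = [[1] * 7 for _ in range(7)]       # 7x7 input
kernel = [[1] * 3 for _ in range(3)]      # 3x3 filter (placeholder weights)
feature_map = conv2d_valid(image, kernel)
print(len(feature_map), len(feature_map[0]))
```

Each output value depends only on a 3x3 neighbourhood of the input, and the same nine weights are reused at every position.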



Artificial neural networks
Fundamentals - pooling
● Max-pooling
● Dimension reduction/adaption
● Existence is more important than location
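
Max-pooling keeps only the strongest activation in each window, which is why it preserves whether a feature exists while discarding exactly where it was. A minimal sketch with non-overlapping 2x2 windows (the window size is an assumption; the values are made up):

```python
def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling with a size x size window."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[y + j][x + i]
                for j in range(size) for i in range(size))
            for x in range(0, w - size + 1, size)
        ]
        for y in range(0, h - size + 1, size)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool(feature_map)   # 4x4 input becomes 2x2
print(pooled)
```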



Artificial neural networks
Fundamentals - pooling
● Zero-padding
● Controlling dimensions
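
How zero-padding controls dimensions follows from the usual output-size formula for a convolution, (n + 2p - k) / s + 1 for input size n, kernel size k, padding p and stride s. The sketch below pads an image with zeros and evaluates that formula; the helper names are chosen for illustration.

```python
def zero_pad(image, p):
    """Surround the image with a border of p zeros on every side."""
    width = len(image[0]) + 2 * p
    border = [[0] * width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in image]
    return border + middle + [[0] * width for _ in range(p)]


def conv_output_size(n, k, p=0, s=1):
    """Output size along one dimension: (n + 2p - k) // s + 1."""
    return (n + 2 * p - k) // s + 1


padded = zero_pad([[1, 2], [3, 4]], p=1)
print(padded)
print(conv_output_size(7, 3))        # no padding: the output shrinks
print(conv_output_size(7, 3, p=1))   # padding of 1 keeps a 7x7 input at 7x7
```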



Artificial neural networks
Fundamentals - general network architecture
Input image -> convolutional layers -> ... -> Final decision



Artificial neural networks
Fundamentals - hierarchical feature extractors
Activations for:
● First layers: lines, edges, blobs, colours, ...
● Deeper layers: parts of abstract objects, abstract objects


Modern history of object recognition



Benchmark
Datasets - PASCAL VOC
● Classification and detection
  ○ 27k images
  ○ 20 classes
    ■ person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor


Benchmark
Datasets - ImageNet
● Challenges on a subset of ImageNet
  ○ 14 million labeled images
  ○ 20k object categories
● ILSVRC* usually on 1k categories including 90 out of 120 dog breeds
*ImageNet Large Scale Visual Recognition Challenge


Artificial neural networks
Roadmap - AlexNet
● ILSVRC 2012 winner by a large margin, reducing the top-5 error from 25% to 16%
● Proved effectiveness of CNNs and kicked off a new era
● 8 layers, 650k neurons, 60 million parameters


Artificial neural networks
Roadmap - ZFNet
● ILSVRC 2013 winner with a best top-5 error of 11.6%
● AlexNet but using smaller 7x7 kernels (instead of 11x11) in the first layer to keep more information in deeper layers


Artificial neural networks
Roadmap - OverFeat
● ILSVRC 2013 localization winner
● Uses AlexNet on multi-scale input images with sliding window approach
● Accumulates bounding boxes for final detection (instead of non-max suppression)


Artificial neural networks
Roadmap - RCNN (region based CNN)
● 2k proposals generated by selective search
● SVM trained for classification
● Multi-stage pipeline


Artificial neural networks
Roadmap - VGGNet
● Not a winner but famous due to simplicity and effectiveness
● Replace large-kernel convolutions by stacking several small-kernel convolutions
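
Why stacking small kernels pays off can be checked with a little arithmetic: two stacked 3x3 convolutions (stride 1) see the same 5x5 region as one 5x5 convolution but need fewer weights. The sketch below computes both quantities; the channel count of 64 is an arbitrary example and biases are ignored.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return layers * (kernel - 1) + 1


def conv_weights(kernel, channels, layers=1):
    """Weight count (no biases) with `channels` input and output channels."""
    return layers * kernel * kernel * channels * channels


channels = 64  # example value
rf = stacked_receptive_field(kernel=3, layers=2)
small = conv_weights(kernel=3, channels=channels, layers=2)
large = conv_weights(kernel=5, channels=channels)
print(rf, small, large)
```

The stacked version also applies a non-linearity between the two layers, which a single large kernel does not.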


Artificial neural networks
Roadmap - InceptionNet (GoogLeNet)
● ILSVRC 2014 winner
● Stacks up “inception” modules
● 22 layers, 5 million parameters


Artificial neural networks
Roadmap - Fast RCNN
● Jointly learns classification and bounding-box regression
● Employs region of interest (RoI) pooling, which allows the convolutional computations to be reused across proposals


Artificial neural networks
Roadmap - YOLO (you only look once)
● Directly predicts all objects and classes in one shot
● Very fast
● Processes images at ~40 FPS on a Titan X GPU
● First real-time state-of-the-art detector
● Divides input images into multiple grid cells which are then classified
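
The grid-cell idea can be illustrated by computing which cell an object's centre falls into, since that cell is the one asked to predict the object. This is only a sketch of the assignment step, not of the network; it assumes normalized centre coordinates in [0, 1] and a 7x7 grid, and the function name is hypothetical.

```python
def responsible_cell(cx, cy, grid=7):
    """Return (row, col) of the grid cell containing the centre (cx, cy)."""
    col = min(int(cx * grid), grid - 1)  # clamp so that cx == 1.0 stays inside
    row = min(int(cy * grid), grid - 1)
    return row, col


print(responsible_cell(0.5, 0.5))    # centre of the image
print(responsible_cell(0.99, 0.01))  # top-right corner region
```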


Artificial neural networks
Roadmap - ResNet (Microsoft)
● ILSVRC 2015 winner with a 3.6% error rate (human performance is 5-10%)
● Employs residual blocks, which allow very deep networks to be built (hundreds of layers)
● Additional identity mapping
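
The "additional identity mapping" means a residual block outputs F(x) + x rather than F(x) alone. In this sketch the learned layers F are replaced by a plain Python function passed in as `transform`, which is an assumption made to keep the example small.

```python
def residual_block(x, transform):
    """Return F(x) + x, where `transform` plays the role of the layers F."""
    fx = transform(x)
    return [f + xi for f, xi in zip(fx, x)]


def zero_transform(x):
    """Stand-in for layers that have learned to output nothing."""
    return [0.0 for _ in x]


x = [1.0, -2.0, 3.0]
unchanged = residual_block(x, zero_transform)
shifted = residual_block(x, lambda v: [2.0 * vi for vi in v])
print(unchanged, shifted)
```

If F outputs zeros the block simply passes its input through, so adding more blocks does not have to make the network worse.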


Artificial neural networks
Roadmap - MultiBox
● Not a recognition network
● A region proposal network
● Popularized prior/anchor boxes (found through clustering) to predict offsets
● Much better strategy than starting the predictions with random coordinates
● Since then heuristic approaches have gradually faded out and been replaced
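
Predicting offsets relative to a prior box, instead of raw coordinates, can be sketched as a small decoding function. The parameterization below (centre shifts scaled by the prior's size, log-scale width and height changes) is the one commonly used in later detectors and is an assumption here; the talk does not specify the exact form.

```python
import math


def decode(prior, offsets):
    """Turn predicted offsets into a box, both given as (cx, cy, w, h)."""
    cx, cy, w, h = prior
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


prior = (50.0, 50.0, 20.0, 40.0)
print(decode(prior, (0.0, 0.0, 0.0, 0.0)))     # zero offsets return the prior
print(decode(prior, (0.5, -0.25, 0.0, 0.0)))   # shifted right and up
```

With zero offsets the prediction already equals a plausible box, which is the advantage over starting from random coordinates.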


Artificial neural networks
Roadmap - Faster RCNN
● Fast RCNN with heuristic region proposal replaced by a region proposal network (RPN) inspired by MultiBox
● RPN shares full-image convolutional features with the detection network (cost-free region proposal)
● RPN uses “attention” mechanism to tell where to look
● ~5 FPS on a Tesla K40 GPU
● End-to-end training


Artificial neural networks
Roadmap - SSD (single shot multibox detector)
● SSD leverages the Faster RCNN’s RPN to directly classify objects inside each prior box (similar to YOLO)
● Predicts category scores and box offsets for a fixed set of default bounding boxes
● Fixes the predefined grid cells used in YOLO by using multiple aspect ratios
● Produces predictions of different scales
● ~59 FPS
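
The "fixed set of default bounding boxes" with multiple aspect ratios can be generated ahead of time. The sketch below places one box per aspect ratio at the centre of every grid cell, using width = scale * sqrt(ratio) and height = scale / sqrt(ratio); the grid size, scale and ratios are example values, and coordinates are normalized to [0, 1].

```python
import math


def default_boxes(grid, scale, aspect_ratios):
    """Boxes (cx, cy, w, h) for every cell of a grid x grid feature map."""
    boxes = []
    for row in range(grid):
        for col in range(grid):
            cx, cy = (col + 0.5) / grid, (row + 0.5) / grid
            for ratio in aspect_ratios:
                width = scale * math.sqrt(ratio)
                height = scale / math.sqrt(ratio)
                boxes.append((cx, cy, width, height))
    return boxes


boxes = default_boxes(grid=2, scale=0.5, aspect_ratios=[1.0, 2.0, 0.5])
print(len(boxes))   # grid * grid * number of aspect ratios
print(boxes[0])
```

Repeating this for feature maps of several resolutions, each with its own scale, is what yields predictions at different scales.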


Artificial neural networks
TensorFlow object detection API
● Open-source software library for machine learning applications
● TensorFlow Object Detection API
  ○ A collection of pretrained models
  ○ Construct, train and deploy object detection models


Summary
Human vs machine
● Humans are good at understanding the big picture
● Neural networks are good at details
● But they can be fooled...


Summary
Computer vision is still difficult
● Need a large amount of data
● Lots of engineering
● Trial and error
● Long training time
● Still lots of hyperparameter tuning
● No general network (generalization not answered)
● Little mathematical foundation


Summary
Computer vision is hard
● Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized


Thank You

Stanislav Frolov

Big Data Engineer


0173 318 11 35

inovex GmbH

Lindberghstraße 3

80939 München