Grand Rehearsal FEV 2018
Prof. Bart ter Haar Romeny
The relation between biological vision and computer vision
The beginning of Artificial Intelligence: Campbell’s Soup Factories, USA
1984: Aldo Camino, an expert with 46 years of experience, knew everything about the complex 22 m high sterilisers, which heat 68,000 cans of soup to 120 degrees.
He retired: all his knowledge was captured in hundreds of RULES.
Dead end street!
Classical Computer-Aided Diagnosis / Detection with hand-crafted features
Gabor feature filters Clusters in feature space Validation, ROC curve
Classical machine learning:
‘Spurious resolution’: artefact of the wrong aperture
What is the best aperture?
Cortical receptive fields are well modeled by
derivatives of the Gaussian kernel
(Koenderink 1984)
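$\frac{\partial}{\partial x}\left( L(x,y) \otimes G(x,y;\sigma) \right) = L(x,y) \otimes \frac{\partial}{\partial x} G(x,y;\sigma)$
[Figure: image * Gaussian derivative kernel = gradient]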
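The commutation of differentiation and Gaussian blurring can be checked numerically in a discrete setting. The following is a minimal sketch, assuming a 1D signal, a sampled Gaussian, and a two-tap difference kernel standing in for ∂/∂x; all names are illustrative and not from the slides.

```python
import numpy as np

def sampled_gaussian(sigma, radius):
    """Sampled, normalized 1D Gaussian on the integer grid [-radius, radius]."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

rng = np.random.default_rng(0)
signal = rng.normal(size=200)            # stand-in for the image L
gauss = sampled_gaussian(sigma=3.0, radius=12)
diff = np.array([1.0, -1.0])             # finite-difference stand-in for d/dx

# Differentiate the blurred signal ...
blur_then_diff = np.convolve(np.convolve(signal, gauss), diff)
# ... or convolve the signal with the differentiated Gaussian.
diff_kernel_first = np.convolve(signal, np.convolve(gauss, diff))
```

Both routes give the same result up to floating-point rounding, because convolution is associative.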
We model the synaptic connections with artificial neural networks
A 3-layer neural network
However, these networks only gave 75% correct…
THE MODEL
Learning: synapses get bigger
Hierarchical learning as the brain
Needed:
• Large sets of training data
• Clever network architecture
• Error backpropagation
• Robust classifier
The revolution: mimic the visual cascade: Deep Learning with neural nets of many layers
The idea: context
What a local filter sees:
What a context filter sees:
In
Deep Learning Convolutional Neural Networks
THE TRICK: incremental contextual structure analysis
Convolution, ReLU, max pooling, convolution, convolution etc.
Error backpropagation
AlexNet (Alex Krizhevsky, 2012)
ImageNet challenge: 1.4 million images, 1000 classes
75% → 94%
A typical big deep NN has (hundreds of) millions of connections: weights.
Nvidia blog examples Medium.com
Google TensorFlow Kaggle.com DR
Google AutoML Kaggle Competitions
Ramen Jiro (ラーメン二郎) prediction from 41 ramen shops in Tokyo
Kenji collected 1170 photos x 41 shops = about 48,000 photos of ramen with shop labels.
AutoML Vision achieved 94.5% accuracy. AutoML did all the work: preprocessing, augmentation, and training. The whole process is designed for non-data-scientists and does not require ML expertise.
It is becoming easier and easier …
More applications every day …
• > 80% of papers: Deep Learning
• Challenges with given data are the norm
Data augmentation: make MANY more new images from a single image by tiny transformations.
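Data augmentation by tiny transformations can be sketched in a few lines. This is a minimal illustration, assuming flips, 90-degree rotations, and small wrap-around shifts as the transformations; the function name `augment` and its parameters are hypothetical.

```python
import numpy as np

def augment(image, max_shift=2, seed=0):
    """Return a list of slightly transformed copies of a 2D image."""
    rng = np.random.default_rng(seed)
    variants = [image, np.fliplr(image), np.flipud(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    for _ in range(4):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # np.roll wraps pixels around the border; a real pipeline might pad instead.
        variants.append(np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1)))
    return variants

image = np.arange(64, dtype=float).reshape(8, 8)
augmented = augment(image)
```

Each variant keeps the original label, so one labeled image yields several training samples.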
AI News Anchor
https://blogs.nvidia.com/
https://medium.com/topic/artificial-intelligence
https://www.tensorflow.org/
https://www.kaggle.com/c/diabetic-retinopathy-detection
https://cloud.google.com/blog/big-data/2018/03/automl-vision-in-action-from-ramen-to-branded-goods
https://www.kaggle.com/competitions
https://medium.com/mlmemoirs/worlds-first-ai-news-anchor-makes-its-debut-in-china-4ffc00716578
The term "deep learning" refers to the method of training multi-layered neural networks. It became popular after papers by Geoffrey Hinton and his co-workers showed a fast way to train such networks.
Yann LeCun, a former postdoc in Geoff Hinton's lab, also developed a very effective deep learning architecture, the convolutional network (ConvNet), which was used successfully in the late 1980s and early 1990s for automatic reading of amounts on bank checks.
In May 2014, Baidu, the Chinese search giant, hired Andrew Ng, a leading Machine Learning and Deep Learning expert (and co-founder of Coursera), to head its new AI Lab in Silicon Valley. This set up an AI and Deep Learning race with Google (which hired Geoffrey Hinton) and Facebook (which hired Yann LeCun to head Facebook AI Lab).
Weights and Activation functions
Demo
$h = \sigma(W_1 x + b_1)$
$y = \sigma(W_2 h + b_2)$
[Figure: network with input x (3 units), hidden layer h (4 units), output y (2 units)]
4 + 2 = 6 neurons (not counting inputs)
[3 x 4] + [4 x 2] = 20 weights
4 + 2 = 6 biases
26 learnable parameters
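The parameter count above can be reproduced with a small forward pass. This is a minimal sketch, assuming a logistic sigmoid for σ and random initial weights; the variable names are illustrative.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, assumed here for sigma."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # 3 inputs -> 4 hidden neurons
b1 = np.zeros(4)
W2 = rng.normal(size=(2, 4))   # 4 hidden -> 2 output neurons
b2 = np.zeros(2)

def forward(x):
    h = sigmoid(W1 @ x + b1)   # hidden layer
    y = sigmoid(W2 @ h + b2)   # output layer
    return h, y

x = np.array([0.5, -1.0, 2.0])
h, y = forward(x)

n_weights = W1.size + W2.size      # 12 + 8
n_biases = b1.size + b2.size       # 4 + 2
n_params = n_weights + n_biases
```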
Weights
Activation functions
http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.45430&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false
Loss functions and output
Training examples:
• Classification: R^n x {class_1, ..., class_n} (one-hot encoding)
• Regression: R^n x R^m
Output layer:
• Classification: soft-max (maps R^n to a probability distribution)
• Regression: linear (identity, f(x) = x) or sigmoid
Cost (loss) function:
• Classification: cross-entropy
• Regression: Mean Squared Error
List of loss functions
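Cross-entropy:
$J(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K}\left[ y_k^{(i)} \log \hat{y}_k^{(i)} + \left(1 - y_k^{(i)}\right) \log\left(1 - \hat{y}_k^{(i)}\right) \right]$
Mean Squared Error:
$J(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left( y^{(i)} - \hat{y}^{(i)} \right)^2$
Mean Absolute Error:
$J(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left| y^{(i)} - \hat{y}^{(i)} \right|$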
Classification is about predicting a label and regression is about predicting a quantity.
Example (classification): digit classification
Example (regression): house prices
https://isaacchanghau.github.io/post/loss_functions/
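The soft-max output and the three loss functions listed above can be written out directly. This is a minimal NumPy sketch; the clipping constant `eps` is an assumption added to avoid log(0) and is not part of the slides.

```python
import numpy as np

def softmax(z):
    """Map scores to a probability distribution along the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy with both the y*log(p) and (1-y)*log(1-p) terms, averaged over samples."""
    p = np.clip(y_pred, eps, 1.0 - eps)
    per_sample = (y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)).sum(axis=-1)
    return -per_sample.mean()

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def mean_absolute_error(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

probs = softmax(np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]]))
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
```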
Training
1. Sample labeled data (batch)
2. Forward it through the network, get predictions
3. Back-propagate the errors
4. Update the network weights
Optimize (minimize or maximize) an objective/cost function $J(\theta)$.
Generate an error signal that measures the difference between predictions and target values.
Use the error signal to change the weights and get more accurate predictions.
Subtracting a fraction of the gradient moves you towards the (local) minimum of the cost function.
https://medium.com/@ramrajchandradevan/the-evolution-of-gradient-descend-optimization-algorithm-4106a6702d39
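The training cycle (sample a batch, forward pass, back-propagate, update) can be sketched on the smallest possible model. This example assumes a linear model y = w x + b, a mean squared error cost, and a fixed learning rate; the data and all names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=256)
y = 3.0 * X + 0.5                 # noiseless targets from a known line

w, b = 0.0, 0.0                   # parameters theta
learning_rate = 0.1               # the "fraction of the gradient"
losses = []

for step in range(500):
    idx = rng.choice(len(X), size=32, replace=False)   # sample a labeled batch
    xb, yb = X[idx], y[idx]
    pred = w * xb + b                                  # forward pass
    err = pred - yb                                    # error signal
    losses.append(np.mean(err ** 2))                   # cost J(theta)
    grad_w = 2.0 * np.mean(err * xb)                   # dJ/dw via the chain rule
    grad_b = 2.0 * np.mean(err)                        # dJ/db
    w -= learning_rate * grad_w                        # update step
    b -= learning_rate * grad_b
```

After the loop, `w` and `b` should be close to the values 3.0 and 0.5 used to generate the data.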
How does this actually work?
Gradient descent, derivatives, chain rule:
With
etc.
And finally:
And iterate until convergence:
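The formulas on this slide did not survive extraction. As a hedged reconstruction of the general idea, and not necessarily the slide's own notation, the chain rule for the two-layer network $h = \sigma(W_1 x + b_1)$, $y = \sigma(W_2 h + b_2)$ can be written out as follows. The squared-error cost, the target $t$, the pre-activations $z_1, z_2$, and the learning rate $\eta$ are assumptions introduced for this sketch.

```latex
% Assumed setup: pre-activations z_1, z_2, target t, squared-error cost J
\begin{align*}
z_1 &= W_1 x + b_1, & h &= \sigma(z_1), \\
z_2 &= W_2 h + b_2, & y &= \sigma(z_2), \\
J &= \tfrac{1}{2}\,\lVert y - t \rVert^2 . &&
\end{align*}
% Chain rule, output layer first
\begin{align*}
\delta_2 &= \frac{\partial J}{\partial z_2} = (y - t) \odot \sigma'(z_2), &
\frac{\partial J}{\partial W_2} &= \delta_2\, h^{\top}, &
\frac{\partial J}{\partial b_2} &= \delta_2, \\
\delta_1 &= \frac{\partial J}{\partial z_1} = \left(W_2^{\top} \delta_2\right) \odot \sigma'(z_1), &
\frac{\partial J}{\partial W_1} &= \delta_1\, x^{\top}, &
\frac{\partial J}{\partial b_1} &= \delta_1 .
\end{align*}
% Gradient descent update, iterated until convergence (l = 1, 2)
\begin{align*}
W_l &\leftarrow W_l - \eta\, \frac{\partial J}{\partial W_l}, &
b_l &\leftarrow b_l - \eta\, \frac{\partial J}{\partial b_l} .
\end{align*}
```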
Convolutional Neural Networks (CNNs)
Input matrix
Convolutional 3x3 filter
Convolution = filtering with a kernel / template / receptive field
To keep the same output size:
Boundary choice (all wrong!):
• Zero padding
• Mean padding
• Reflection
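The three boundary choices can be compared with `np.pad`, whose `constant`, `mean`, and `reflect` modes correspond to zero padding, mean padding, and reflection. This is a minimal sketch with a 3x3 box filter on a 5x5 input; the helper `filter_same` is hypothetical.

```python
import numpy as np

def filter_same(image, kernel, mode):
    """Convolve a 2D image with a small odd-sized kernel, keeping the input size."""
    pad = kernel.shape[0] // 2
    padded = np.pad(image, pad, mode=mode)      # boundary choice happens here
    flipped = kernel[::-1, ::-1]                # true convolution flips the kernel
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            out[i, j] = np.sum(window * flipped)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)    # input matrix
box = np.full((3, 3), 1.0 / 9.0)                    # 3x3 averaging filter
results = {mode: filter_same(image, box, mode)
           for mode in ("constant", "mean", "reflect")}
```

The interior pixels agree for all three modes; only the border values depend on the padding, which is why every choice is in some sense arbitrary.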
The convolution integral: $(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$
Wikipedia: convolution, cross-correlation
The cross-correlation integral: $(f \star g)(\tau) = \int_{-\infty}^{\infty} \overline{f(t)}\, g(t + \tau)\, dt$
The convolution theorem states that the Fourier transform of a convolution is the pointwise product of Fourier transforms.
https://en.wikipedia.org/wiki/Convolution
https://en.wikipedia.org/wiki/Cross-correlation
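The difference between convolution and cross-correlation, and the convolution theorem, can both be checked on short 1D arrays. This is a minimal sketch with made-up real-valued signals; zero-padding the FFT to the full output length is an implementation choice that makes circular and linear convolution agree.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.0, 1.0, 0.5])

conv = np.convolve(f, g)                    # convolution: kernel is flipped
corr = np.correlate(f, g, mode="full")      # cross-correlation: kernel is not flipped

# For real signals, correlating with g equals convolving with the reversed g.
corr_via_conv = np.convolve(f, g[::-1])

# Convolution theorem: pointwise product in the Fourier domain.
n = len(f) + len(g) - 1
conv_via_fft = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real
```

For a symmetric kernel the two operations coincide, which is one reason deep learning libraries commonly implement "convolution" layers as cross-correlation.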
Physiology of
Front-End Vision
Prof. Bart ter Haar Romeny, PhD
Biomedical Image Analysis
Eindhoven University of Technology
Synapses grow in size with learning: ~ increasing weights
Rod and cone disks
Some numbers:
• Disk thickness: 10 nm
• Disk spacing: 25 nm
• Outer segment: 25 μm
• 1000 disks/rod
• 10^8 rhodopsin molecules/rod
• 10^8 rod cells/retina
From: book.bionumbers.org/how-big-is-a-photoreceptor/
Cones couple to on-center and to off-center ganglion cells
The discovery of receptive fields by Hubel and Wiesel (Nobel Prize 1981)
David Hubel Torsten Wiesel
50% on-center-surround RF 50% off-center-surround RF
Classical explanation:
• Lateral inhibition
• Surround suppression
• Equal speed for on- and off intensity
Two types of retinal ganglion cells:
• Midget: small, for shape
• Parasol: large, for motion
The retina is a multi-scale sampling device:
Multi-scale range
shape
motion
Disappearing black spots due to low acuity at higher eccentricity
We only see sharply in the fovea
The mapping of many types of fully tiling ganglion cells is coarse and overlapping
Masland 2012
Reichardt motion detector: two RFs separated by a delay
In the visual front-end, retinal receptive fields are organized in pairs, tuned to a specific velocity and direction. The pairs are coupled by a delay cell, possibly a specific type of amacrine cell.
Neurons act as temporal coincidence detectors → tuned velocity detector
All velocities and directions are measured at all scales.
Motion illusion: flying bird
Time = span/velocity = delay
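The relation delay = span/velocity can be illustrated with a toy coincidence detector. This is a simplified sketch of the Reichardt idea, assuming Gaussian response pulses, a delay implemented as a time shift, and a product as the coincidence operation; the parameter values are arbitrary.

```python
import numpy as np

def receptor_response(t, arrival_time, width=1.0):
    """Response pulse of one receptive field when the stimulus passes over it."""
    return np.exp(-(t - arrival_time) ** 2 / (2.0 * width ** 2))

def reichardt_response(velocity, span=10.0, delay=5, t_start=20.0, n_steps=100):
    """Coincidence of the delayed first RF signal with the second RF signal."""
    t = np.arange(n_steps, dtype=float)
    r1 = receptor_response(t, t_start)                    # stimulus reaches RF 1
    r2 = receptor_response(t, t_start + span / velocity)  # ... and RF 2 later
    r1_delayed = np.roll(r1, delay)                       # the delay cell
    return float(np.sum(r1_delayed * r2))                 # temporal coincidence

velocities = [1.0, 2.0, 5.0, 10.0]
responses = [reichardt_response(v) for v in velocities]
preferred = velocities[int(np.argmax(responses))]
```

With a span of 10 and a delay of 5 time steps, the detector responds most strongly to the velocity span/delay = 2.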
http://www.michaelbach.de/ot/cog-hiddenBird/index.html
Two slightly shifted RFs in different eyes for disparity detection
Disparity: stereo for depth perception
In the visual cortex V1 ‘far cells’ and ‘near cells’ are recorded.
Summary:
• The retina is a multi-scale sampling device
• The retina measures with at least 20 overlapping ganglion RF tilings
• Acuity decreases linearly with eccentricity
• 150 million receptors converge into 1 million fibers in the optic nerve
• Amacrine cells play a key role in directional motion detection
• Separate channels to the LGN exist for shape, motion and color
Suggested reading:
• R. Masland, The neuronal organization of the retina, Neuron 76, 2012
• D. H. Hubel, Eye, Brain, and Vision, Scientific American Books, 1995
• H. Kolb et al., Webvision, https://webvision.med.utah.edu/
• E. Kandel et al., Principles of Neural Science, New York, McGraw-Hill, 2013
• R. W. Rodieck, The First Steps in Seeing, Sinauer Associates, 1998
https://webvision.med.utah.edu/
Research questions
1. Why do we have center-surround receptive fields in the retina?
2. Why do we have on- and off channels?
3. Why do we have 150 million rods and cones, and only 1 million fibers in the optic nerve?
4. Why can neurons fire so slowly, while our computers need gigahertz operations?
5. Why do we have ~20 retinal channels sampling the outer world image (Masland 2012)?
6. Why do cones have a cone shape?
7. Why do we make such precise binocular micro-saccades?
8. Why do we have pinwheel orientation structures in the visual cortex?
9. What is the visual field size of a cortical (pinwheel) hypercolumn?
FEV
Central Visual Pathways
David Hubel, Nobel Prize 1981
Torsten Wiesel, Nobel Prize 1981
The 6 layers of the LGN:
• 4 parvo-cellular layers (small cells)
• 2 magno-cellular layers (large cells)
[Figure: the six LGN layers, four labeled parvo and two labeled magno, each marked L or R]
Motion channel
http://www.michaelbach.de/ot/cog-hiddenBird/index.html
The receptive fields of LGN cells all have an on- or off-center-surround sensitivity profile
Spatio-temporal receptive field mapping by reverse correlation by Ohzawa and Freeman (UC Berkeley)
Stimulus
RF profile
Temporal RF
Reverse Correlation Technique
Three main types of receptive field sensitivity profiles
Time sequence simple cell RF, separable
DeAngelis, Ohzawa and Freeman, TINS 1995
Reverse Correlation Technique
Voltage sensitive dye optical imaging of tree shrew cortex
Brain-inspired image analysis: multi-orientation analysis
Neuro-mathematics: multi-orientation analysis by the cortex
pinwheel
Cortical hypercolumn: 0.3 x 0.3 mm
Color orientation coding
The technique of Voltage Sensitive Dyes for the measurement of neural population signals was pioneered by Prof. Amiram Grinvald, Weizmann Institute, Israel.
Voltage Sensitive Dyes
https://en.wikipedia.org/wiki/Voltage-sensitive_dye
http://www.weizmann.ac.il/brain/grinvald/
Optical dye response at different orientations (monkey V1): the discovery of cortical hypercolumns (1991).
From Bonhoeffer and Grinvald, Nature 353, 429-431, 1991
Voltage Sensitive Dyes
Fitzpatrick, Duke University, Nature 2002
Connections exist between columns of similar orientation, even far-away ones
Alexander & van Leeuwen, 2010
[Figure: image, rotating kernel, orientation space]
What are proper kernels that allow an inverse orientation transform (no data loss)?
• Fourier transform / inverse Fourier transform: sin / cos
• Orientation space 2D: cake kernels, a new wavelet family
• Orientation space 3D: Mathieu functions
Exactly invertible orientation transform:
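[Equation garbled in extraction: two definitions of kernels $\Phi_n^{\sigma}$ built from $n$-th order derivatives of a Gaussian of width $\sigma$, with coefficients $a_n$ and $n \geq 0$]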
Multi-orientation differential geometry
Gabor vs Cake Kernel – Fourier Domain
Gabor Kernel
Cake Kernel
Different orientations are disentangled in the orientation space
[Figure panels: image, rotating kernel, orientation score, filtered image]
Franken, Duits, ter Haar Romeny, TU/e, 2010
Denoising of crossing fibers (collagen, tissue engineered heart valve)
Properties of the Gaussian kernel
• Cascade property: a Gaussian convolved with a Gaussian is a Gaussian
• Normalization: area = 1
• Separable
• Relation to binomial coefficients
• Relation to generalized functions (Dirac, Heaviside)
• Fourier transform of a Gaussian is also a Gaussian
• Low-pass filter
• A narrow kernel in the spatial domain is a wide kernel in the Fourier domain
• Solution of the diffusion equation
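Two of these properties, normalization and the cascade property, are easy to check numerically. This is a minimal sketch on a sampled 1D grid, assuming widths 3 and 4, so that the cascaded width is sqrt(3² + 4²) = 5.

```python
import numpy as np

def gaussian(x, sigma):
    """Normalized 1D Gaussian kernel."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

x = np.arange(-40, 41, dtype=float)     # unit-spaced sampling grid
g1 = gaussian(x, 3.0)
g2 = gaussian(x, 4.0)

area = g1.sum()                               # normalization: area close to 1
cascade = np.convolve(g1, g2, mode="same")    # Gaussian convolved with Gaussian
direct = gaussian(x, 5.0)                     # variances add: 9 + 16 = 25
```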
Properties of the Gaussian derivative kernels
• Gaussian derivative: Gaussian times Hermite polynomial
• Band-pass filter
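The Hermite relation can be made concrete in 1D. The sketch below assumes the physicists' Hermite polynomials H_n (as implemented by `numpy.polynomial.hermite.hermval`) and the formula d^n G/dx^n = (-1/(σ√2))^n H_n(x/(σ√2)) G(x; σ); the function names are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def gaussian(x, sigma):
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def gaussian_derivative(x, sigma, n):
    """n-th derivative of the Gaussian as Hermite polynomial times Gaussian."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                          # select H_n from the Hermite series
    u = x / (sigma * np.sqrt(2.0))
    scale = (-1.0 / (sigma * np.sqrt(2.0))) ** n
    return scale * hermval(u, coeffs) * gaussian(x, sigma)

x = np.linspace(-10.0, 10.0, 201)
sigma = 2.0
d1 = gaussian_derivative(x, sigma, 1)
d2 = gaussian_derivative(x, sigma, 2)
```

For n = 1 and n = 2 this reduces to the familiar closed forms -x/σ² G and (x²/σ⁴ - 1/σ²) G.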
Differential structure of images
Gauge invariants are made with intrinsic coordinates v and w.
Every derivative with respect to v and/or w is an orthogonal invariant.
Notebook
Some examples:
Second order structure
Affine invariant corner detection
Affine invariant corner detector:
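The slide's formula did not survive extraction. One expression commonly given for an affine invariant corner detector in gauge coordinates is L_vv L_w² = L_x² L_yy - 2 L_x L_y L_xy + L_y² L_xx; treat that choice as an assumption here. The sketch below evaluates it with sampled Gaussian derivative kernels in plain NumPy; all function names are hypothetical.

```python
import numpy as np

def gaussian_derivative_kernel(sigma, order):
    """Sampled 1D Gaussian derivative kernel (order 0, 1 or 2), radius 4 sigma."""
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
    if order == 0:
        return g
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 / sigma**4 - 1.0 / sigma**2) * g
    raise ValueError("only orders 0, 1 and 2 are supported in this sketch")

def gauss_deriv(image, sigma, order_y, order_x):
    """Separable Gaussian derivative of a 2D image (zero-padded borders)."""
    kx = gaussian_derivative_kernel(sigma, order_x)
    ky = gaussian_derivative_kernel(sigma, order_y)
    out = np.apply_along_axis(lambda r: np.convolve(r, kx, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, ky, mode="same"), 0, out)

def corner_response(image, sigma=2.0):
    """Assumed corner measure: Lx^2 Lyy - 2 Lx Ly Lxy + Ly^2 Lxx."""
    Lx = gauss_deriv(image, sigma, 0, 1)
    Ly = gauss_deriv(image, sigma, 1, 0)
    Lxx = gauss_deriv(image, sigma, 0, 2)
    Lyy = gauss_deriv(image, sigma, 2, 0)
    Lxy = gauss_deriv(image, sigma, 1, 1)
    return Lx**2 * Lyy - 2.0 * Lx * Ly * Lxy + Ly**2 * Lxx

image = np.zeros((64, 64))
image[32:, 32:] = 1.0                 # a single bright quadrant: one corner
response = corner_response(image)
```

The response is clearly non-zero at the corner pixel and vanishes in the flat interior of the bright quadrant.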
Third order structure:
Change of isophote curvature at a T-junction →
Use: 3D TV from 2D video
Deep Learning withConvolutional Neural Networks
Hierarchical learning as the brain
Needed:
• Large sets of training data
• Clever network architecture
• Error backpropagation
• Robust classifier
The first filters must represent the incoming data as efficiently as possible → represented in a compact basis
Mapping of spatiotemporal receptive fields in V1 of the tree shrew.
The negative subfield of the kernel is located centrally, indicating a preponderance of symmetric second order kernels in the selected RFs.
Calcium intrinsic imaging combined with reverse correlation of responses to a sparse noise stimulus.
From: K. S. Lee, X. Huang, and D. Fitzpatrick. Topology of ON and OFF inputs in visual cortex enables an invariant columnar architecture. Nature, 533(7601):90-94, 2016
Principal Component Analysis (PCA) finds the intrinsic orthogonal local coordinate frame in the data as the orthogonal eigenvectors of the covariance matrix.
Covariance matrix (from Wikipedia):
Math: learn from image patches → simple filters, edges, lines
Principal Component Analysis
If data are restricted: filters are restricted
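PCA as the eigen-decomposition of the covariance matrix can be sketched as follows. The example uses synthetic 2D points instead of image patches, scattered along a known direction, so the first principal axis can be checked; for patches, each row of `data` would be one flattened patch. All names are illustrative.

```python
import numpy as np

def pca(data):
    """Eigenvalues (descending) and orthonormal eigenvectors of the covariance matrix."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)       # features are in columns
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0]) / np.sqrt(2.0)      # dominant axis in the data
t = rng.normal(size=(500, 1))
data = t * direction + 0.01 * rng.normal(size=(500, 2))

eigvals, eigvecs = pca(data)
first_axis = eigvecs[:, 0]                            # first principal component
```

The recovered axes depend entirely on the data, which is the point of the slide: the filters are in the data.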
Lesson:
Handcrafted filters
Filters are in the DATA
• Multi-scale derivatives
• Lie group: infinitesimal generator of translation
• Taylor expansion
https://www.youtube.com/watch?v=QzkMo45pcUo
Colin Blakemore’s famous experiment with visual deprivation (1974)
Blakemore, Colin, and Grahame F. Cooper. “Development of the brain depends on the visual environment.” Nature, 228(5270), 477-478, 1970.
Blakemore’s cat: during the first three months after birth, it sees only horizontal stripes
After three months: it could see a horizontal stick, but NOT a vertical one.
Lesson: it had never learned filters for vertical lines; they were not in the data.
Bev Doolittle: The forest has eyes
The challenge
Computer-Aided Diagnosis
Bev Doolittle: The forest has eyes
We have much more difficulty in recognizing faces upside-down.