Grand Rehearsal FEV 2018 Prof. Bart ter Haar Romeny The relation between biological vision and computer vision

Grand Rehearsal FEV 2018 · 2018-12-13


  • Grand Rehearsal FEV 2018

    Prof. Bart ter Haar Romeny

    The relation between biological vision and computer vision

  • The beginning of Artificial Intelligence Campbell’s Soup Factories, USA

    1984: Aldo Camino, an expert with 46 years of experience, knew everything about the complex 22 m high sterilisers, which heat 68,000 cans of soup to 120 degrees.

    He retired: all his knowledge was captured in hundreds of RULES.

    Dead end street!

  • Classical Computer-Aided Diagnosis / Detection with hand-crafted features

    Classical machine learning:

    Gabor feature filters, clusters in feature space, validation, ROC curve

  • ‘Spurious resolution’: artefact of the wrong aperture

    What is the best aperture?

  • Cortical receptive fields are well modeled by

    derivatives of the Gaussian kernel

    (Koenderink 1984)

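    $$\frac{\partial}{\partial x}\bigl(L(x,y) \otimes G(x,y;\sigma)\bigr) = L(x,y) \otimes \frac{\partial}{\partial x} G(x,y;\sigma)$$

    [Figure: image * Gaussian derivative kernel = gradient]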

  • We model the synaptic connections with artificial neural networks

    A 3-layer neural network

    However, these networks only gave 75% correct…

    THE MODEL

    Learning: synapses get bigger

  • Hierarchical learning as the brain

    Needed:

    • Large sets of training data
    • Clever network architecture
    • Error backpropagation
    • Robust classifier

    The revolution: mimic the visual cascade: Deep Learning with neural nets of many layers

  • The idea: context

    What a local filter sees:

    What a context filter sees:

  • Deep Learning Convolutional Neural Networks

    THE TRICK: incremental contextual structure analysis

    Convolution, ReLU, max pooling, convolution, convolution etc.

    Error backpropagation

    AlexNet (Alex Krizhevsky, 2012)

    ImageNet challenge: 1.4 million images, 1000 classes

    75% → 94%

    A typical big deep NN has (hundreds of) millions of connections: weights.

  • Nvidia blog examples Medium.com

    Google TensorFlow Kaggle.com DR

    Google AutoML Kaggle Competitions

    Ramen Jiro (ラーメン二郎) prediction from 41 ramen shops in Tokyo. Kenji collected 1170 photos x 41 shops = about 48,000 photos of ramen with shop labels.

    AutoML Vision achieved 94.5% accuracy. AutoML did all the work: preprocessing, augmentation, training. The whole process is designed for non-data-scientists and does not require ML expertise.

    It is becoming easier and easier …

    More applications every day …

    • > 80% of papers: Deep Learning
    • Challenges with given data are the norm

    Data augmentation: make MANY more new images from a single image by tiny transformations.

    AI News Anchor

    https://blogs.nvidia.com/
    https://medium.com/topic/artificial-intelligence
    https://www.tensorflow.org/
    https://www.kaggle.com/c/diabetic-retinopathy-detection
    https://cloud.google.com/blog/big-data/2018/03/automl-vision-in-action-from-ramen-to-branded-goods
    https://www.kaggle.com/competitions
    https://medium.com/mlmemoirs/worlds-first-ai-news-anchor-makes-its-debut-in-china-4ffc00716578
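The data augmentation idea on this slide can be sketched in a few lines. This is a minimal illustration, not the pipeline used by any of the tools listed: the function name `augment`, the choice of a horizontal flip plus small shifts, and the wrap-around behaviour of `np.roll` are all assumptions made for the example.

```python
import numpy as np

def augment(image, max_shift=2):
    """Return slightly transformed copies of a 2D image (hypothetical helper)."""
    variants = [image, np.fliplr(image)]          # original and mirrored copy
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            if dy == 0 and dx == 0:
                continue                          # skip the unshifted copy
            # np.roll wraps pixels around the border; a real pipeline
            # would more likely pad or crop instead.
            variants.append(np.roll(image, shift=(dy, dx), axis=(0, 1)))
    return variants

image = np.arange(64, dtype=float).reshape(8, 8)
augmented = augment(image)
print(len(augmented))
```

Each variant keeps the label of the source image, so one labelled photo yields many training examples.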

  • The term "deep learning" refers to the method of training multi-layered neural networks. It became popular after papers by Geoffrey Hinton and his co-workers showed a fast way to train such networks.

    Yann LeCun, a former postdoc of Geoff Hinton, also developed a very effective algorithm for deep learning, called ConvNet, which was successfully used in the late 1980s and early 1990s for automatic reading of amounts on bank checks.

    In May 2014, Baidu, the Chinese search giant, hired Andrew Ng, a leading Machine Learning and Deep Learning expert (and co-founder of Coursera), to head their new AI Lab in Silicon Valley, setting up an AI & Deep Learning race with Google (which hired Geoffrey Hinton) and Facebook (which hired Yann LeCun to head Facebook AI Lab).

  • Weights and Activation functions

    Demo

    $h = \sigma(W_1 x + b_1)$

    $y = \sigma(W_2 h + b_2)$

    [Figure: network diagram with input x, hidden layer h and output y]

    4 + 2 = 6 neurons (not counting inputs)

    [3 x 4] + [4 x 2] = 20 weights, 4 + 2 = 6 biases

    26 learnable parameters
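The parameter count above can be reproduced with a small NumPy sketch of the same network. The layer sizes (3 inputs, 4 hidden, 2 outputs) are read off the [3 x 4] + [4 x 2] weight count; the sigmoid activation, the random initialisation and all function names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in=3, n_hidden=4, n_out=2, seed=0):
    """Random weights and zero biases for a 3-4-2 network (assumed sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(size=(n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(size=(n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    h = sigmoid(params["W1"] @ x + params["b1"])   # h = sigma(W1 x + b1)
    y = sigmoid(params["W2"] @ h + params["b2"])   # y = sigma(W2 h + b2)
    return h, y

params = init_params()
h, y = forward(params, np.array([0.5, -1.0, 2.0]))

n_weights = params["W1"].size + params["W2"].size
n_biases = params["b1"].size + params["b2"].size
print(n_weights, n_biases, n_weights + n_biases)   # 20 6 26
```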

    Weights

    Activation functions

    http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.45430&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false

  • Loss functions and output

    Classification (example: digit classification)

    • Training examples: R^n x {class_1, ..., class_n} (one-hot encoding)
    • Output layer: soft-max (maps R^n to a probability distribution)
    • Cost (loss) function: cross-entropy

    $$J(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K}\left[\, y_k^{(i)} \log \hat{y}_k^{(i)} + \bigl(1 - y_k^{(i)}\bigr) \log\bigl(1 - \hat{y}_k^{(i)}\bigr) \right]$$

    Regression (example: house prices)

    • Training examples: R^n x R^m
    • Output layer: linear (identity, f(x) = x) or sigmoid
    • Cost (loss) function: Mean Squared Error or Mean Absolute Error

    Mean Squared Error:

    $$J(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl(y^{(i)} - \hat{y}^{(i)}\bigr)^2$$

    Mean Absolute Error:

    $$J(\theta) = \frac{1}{n}\sum_{i=1}^{n}\bigl|\, y^{(i)} - \hat{y}^{(i)} \bigr|$$

    Classification is about predicting a label and regression is about predicting a quantity.

    List of loss functions

    https://isaacchanghau.github.io/post/loss_functions/
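The three cost functions on this slide translate directly into NumPy. The sketch below follows the formulas as written on the slide (the cross-entropy includes the (1 - y) term); the function names and the clipping constant `eps`, added to avoid log(0), are assumptions.

```python
import numpy as np

def softmax(z):
    """Map each row of scores to a probability distribution."""
    z = z - np.max(z, axis=-1, keepdims=True)     # for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(y, y_hat, eps=1e-12):
    """Cross-entropy over n samples (rows) and K classes (columns)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    per_sample = np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat), axis=1)
    return -np.mean(per_sample)

def mean_squared_error(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mean_absolute_error(y, y_hat):
    return np.mean(np.abs(y - y_hat))

# Classification: one-hot targets and soft-max outputs
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
good = softmax(np.array([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]))
bad = softmax(np.array([[0.0, 4.0, 0.0], [4.0, 0.0, 0.0]]))
print(cross_entropy(targets, good), cross_entropy(targets, bad))

# Regression: real-valued targets
prices = np.array([1.0, 2.0])
predicted = np.array([1.0, 4.0])
print(mean_squared_error(prices, predicted), mean_absolute_error(prices, predicted))
```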

  • Training

    1. Sample labeled data (batch)
    2. Forward it through the network, get predictions
    3. Back-propagate the errors
    4. Update the network weights

    Optimize (minimize or maximize) an objective/cost function $J(\theta)$.

    Generate an error signal that measures the difference between predictions and target values.

    Use the error signal to change the weights and get more accurate predictions.

    Subtracting a fraction of the gradient moves you towards the (local) minimum of the cost function.

    https://medium.com/@ramrajchandradevan/the-evolution-of-gradient-descend-optimization-algorithm-4106a6702d39
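The training cycle described on this slide (sample a batch, forward, compute the error signal, subtract a fraction of the gradient) can be sketched on the simplest possible model. The linear model, the synthetic data, the learning rate and the batch size below are all assumptions chosen to keep the example short; a deep network follows the same loop with more parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labeled data: y = 3x + 1 plus a little noise
X = rng.normal(size=200)
y = 3.0 * X + 1.0 + 0.1 * rng.normal(size=200)

w, b = 0.0, 0.0            # parameters theta
learning_rate = 0.1        # the "fraction" of the gradient that is subtracted
batch_size = 20

def cost(w, b):
    """Mean squared error J(theta) over the full data set."""
    return np.mean((w * X + b - y) ** 2)

initial_cost = cost(w, b)
for epoch in range(50):
    order = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]      # 1. sample a labeled batch
        prediction = w * X[idx] + b                # 2. forward pass
        error = prediction - y[idx]                # 3. error signal
        grad_w = 2.0 * np.mean(error * X[idx])     #    dJ/dw
        grad_b = 2.0 * np.mean(error)              #    dJ/db
        w -= learning_rate * grad_w                # 4. update the weights
        b -= learning_rate * grad_b
final_cost = cost(w, b)

print(initial_cost, final_cost, w, b)
```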

  • Deep Learning Convolutional Neural Networks

    Convolution, ReLU, max pooling, convolution, convolution etc.

    Error backpropagation

    AlexNet (Alex Krizhevsky, 2012)

    ImageNet challenge: 1.4 million images, 1000 classes

    75% → 94%

    A typical big deep NN has (hundreds of) millions of connections: weights.

    How does this actually work?

  • Gradient descent, derivatives, chain rule:

    With

    etc.

    And finally:

    1

    And iterate until convergence:
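The formulas of this slide did not survive extraction, so the following is an independent sketch of the same idea rather than a reconstruction of the slide. It applies the chain rule to a 3-4-2 sigmoid network with a squared-error cost (both assumptions), checks one analytic derivative against a finite difference, and then iterates gradient descent steps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(W1, b1, W2, b2, x, t):
    # Forward pass
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    loss = 0.5 * np.sum((y - t) ** 2)
    # Backward pass: chain rule, layer by layer
    delta2 = (y - t) * y * (1.0 - y)              # dJ/d(pre-activation of y)
    delta1 = (W2.T @ delta2) * h * (1.0 - h)      # dJ/d(pre-activation of h)
    grads = (np.outer(delta1, x), delta1, np.outer(delta2, h), delta2)
    return loss, grads

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x, t = rng.normal(size=3), np.array([1.0, 0.0])

loss, grads = loss_and_grads(W1, b1, W2, b2, x, t)

# Check one analytic derivative against a central finite difference
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_and_grads(W1_plus, b1, W2, b2, x, t)[0]
           - loss_and_grads(W1_minus, b1, W2, b2, x, t)[0]) / (2.0 * eps)
analytic = grads[0][0, 0]

# Iterate: subtract a fraction of the gradient at every step
learning_rate = 0.1
for step in range(200):
    final_loss, (gW1, gb1, gW2, gb2) = loss_and_grads(W1, b1, W2, b2, x, t)
    W1, b1 = W1 - learning_rate * gW1, b1 - learning_rate * gb1
    W2, b2 = W2 - learning_rate * gW2, b2 - learning_rate * gb2

print(loss, final_loss)
```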

  • Convolutional Neural Networks (CNNs)

    Input matrix

    Convolutional 3x3 filter

    Convolution = filtering with a kernel / template / receptive field

    To keep the same output size, a boundary choice is needed (all wrong!):

    • Zero padding
    • Mean padding
    • Reflection
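A direct, unoptimised sketch of filtering with a 3x3 kernel under the three boundary choices listed above. The function name `conv2d_same` is hypothetical, an odd-sized kernel is assumed, and "mean padding" is interpreted here as padding with the global image mean, which is an assumption about what the slide intends.

```python
import numpy as np

def conv2d_same(image, kernel, boundary="zero"):
    """Convolve a 2D image with an odd-sized kernel, keeping the input size."""
    kh, kw = kernel.shape
    pad = ((kh // 2, kh // 2), (kw // 2, kw // 2))
    if boundary == "zero":
        padded = np.pad(image, pad, mode="constant", constant_values=0.0)
    elif boundary == "mean":
        padded = np.pad(image, pad, mode="constant", constant_values=image.mean())
    elif boundary == "reflect":
        padded = np.pad(image, pad, mode="reflect")
    else:
        raise ValueError("unknown boundary: " + boundary)
    flipped = kernel[::-1, ::-1]                  # convolution flips the kernel
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
box = np.ones((3, 3)) / 9.0                       # 3x3 averaging filter

out_zero = conv2d_same(image, box, "zero")
out_mean = conv2d_same(image, box, "mean")
out_reflect = conv2d_same(image, box, "reflect")
print(out_zero[0, 0], out_mean[0, 0], out_reflect[0, 0])
```

The interior pixels are identical for all three choices; only the border pixels differ, which is where each padding scheme invents data that was never measured.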

  • The convolution integral:

    Wikipedia: convolution, cross-correlation

    The cross-correlation integral:

    The convolution theorem states that the Fourier transform of a convolution is the pointwise product of Fourier transforms.

    https://en.wikipedia.org/wiki/Convolution
    https://en.wikipedia.org/wiki/Cross-correlation
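The convolution theorem can be checked numerically. The sketch below uses the discrete, circular analogue of the integrals (an assumption forced by working with sampled signals) and real-valued random signals; for cross-correlation one of the two transforms is complex-conjugated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
f = rng.normal(size=n)
g = rng.normal(size=n)

# Circular convolution, straight from the definition
conv_direct = np.array([sum(f[m] * g[(k - m) % n] for m in range(n))
                        for k in range(n)])
# Same result as a pointwise product in the Fourier domain
conv_fft = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))

# Circular cross-correlation, straight from the definition
corr_direct = np.array([sum(f[m] * g[(m + k) % n] for m in range(n))
                        for k in range(n)])
# In the Fourier domain: conjugate one transform, then multiply
corr_fft = np.real(np.fft.ifft(np.conj(np.fft.fft(f)) * np.fft.fft(g)))

print(np.max(np.abs(conv_direct - conv_fft)), np.max(np.abs(corr_direct - corr_fft)))
```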

  • Physiology of

    Front-End Vision

    Prof. Bart ter Haar Romeny, PhD

    Biomedical Image Analysis

    Eindhoven University of Technology

  • Synapses grow in size with learning: ~ increasing weights

  • Rod and cone disks

    Some numbers:

    • Disk thickness: 10 nm
    • Disk spacing: 25 nm
    • Outer segment: 25 μm
    • 1000 disks/rod
    • 10^8 rhodopsin molecules/rod
    • 10^8 rod cells/retina

    From: book.bionumbers.org/how-big-is-a-photoreceptor/

  • Cones couple to on-center and to off-center ganglion cells

  • The discovery of receptive fields by Hubel and Wiesel (Nobel Prize 1981)

    David Hubel Torsten Wiesel

    50% on-center-surround RF 50% off-center-surround RF

    Classical explanation:

    • Lateral inhibition
    • Surround suppression
    • Equal speed for on- and off intensity

  • Two types of retinal ganglion cells:

    • Midget: small, for shape
    • Parasol: large, for motion

    The retina is a multi-scale sampling device.

    [Figure: multi-scale range of receptive field sizes for the shape and motion channels]

  • Disappearing black spots due to low acuity at higher eccentricity

    We only see sharply in the fovea

  • The mapping of many types of fully tiling ganglion cells is coarse and overlapping

    Masland 2012

  • Reichardt motion detector: two RFs separated by a delay

    In the visual front-end, retinal receptive fields are organized in pairs, tuned to a specific velocity and direction. The pairs are coupled by a delay cell, possibly a specific type of amacrine cell.

    Neurons act as temporal coincidence detectors → tuned velocity detector

    All velocities and directions are measured at all scales.

    Motion illusion: flying bird

    Time = span/velocity = delay

    http://www.michaelbach.de/ot/cog-hiddenBird/index.html
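The relation Time = span/velocity = delay can be illustrated with a toy delay-and-correlate detector. This is a simplified sketch, not a physiological model: the Gaussian blob stimulus, the receptor width, the span of 5 units and the delay of 1 time unit are all assumed values, and the two mirror-symmetric branches are simply subtracted.

```python
import numpy as np

def receptor_response(position, velocity, t, width=0.5):
    """Response of a receptor at `position` to a blob located at velocity * t."""
    return np.exp(-((position - velocity * t) ** 2) / (2.0 * width ** 2))

def reichardt_output(velocity, span=5.0, delay=1.0, dt=0.01, t_max=40.0):
    t = np.arange(-t_max, t_max, dt)
    r1 = receptor_response(0.0, velocity, t)
    r2 = receptor_response(span, velocity, t)
    r1_delayed = receptor_response(0.0, velocity, t - delay)
    r2_delayed = receptor_response(span, velocity, t - delay)
    preferred = np.sum(r1_delayed * r2) * dt      # coincidence for RF1 -> RF2 motion
    opposite = np.sum(r2_delayed * r1) * dt       # coincidence for RF2 -> RF1 motion
    return preferred - opposite

velocities = np.arange(1.0, 10.5, 0.5)
outputs = np.array([reichardt_output(v) for v in velocities])
best_velocity = velocities[np.argmax(outputs)]
reverse_output = reichardt_output(-5.0)

print(best_velocity, reverse_output)
```

With these settings the response peaks at the velocity for which the travel time between the two receptive fields equals the delay, and motion in the opposite direction gives a negative output.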

  • Two slightly shifted RFs in different eyes for disparity detection

    Disparity: stereo for depth perception

    In the visual cortex V1 ‘far cells’ and ‘near cells’ are recorded.

  • Summary:

    • The retina is a multi-scale sampling device
    • The retina measures with at least 20 overlapping ganglion RF tilings
    • Acuity decreases linearly with eccentricity
    • 150 million receptors converge into 1 million fibers in the optic nerve
    • Amacrine cells play a key role in directional motion detection
    • Separate channels to the LGN exist for shape, motion and color

    Suggested reading:

    • R. Masland, The neuronal organization of the retina, Neuron 76, 2012
    • D. H. Hubel, Eye, Brain and Vision, Scientific American Books, 1995
    • H. Kolb et al., Webvision, https://webvision.med.utah.edu/
    • E. Kandel et al., Principles of Neural Science, New York, McGraw-Hill, 2013
    • R. W. Rodieck, The First Steps in Seeing, Sinauer Associates, 1998

  • Research questions

    1. Why do we have center-surround receptive fields in the retina?
    2. Why do we have on- and off channels?
    3. Why do we have 150 million rods and cones, and only 1 million fibers in the optic nerve?
    4. Why can neurons fire so slowly, while our computers need gigahertz operations?
    5. Why do we have ~20 retinal channels sampling the outer world image (Masland 2012)?
    6. Why do cones have a cone shape?
    7. Why do we make such precise binocular micro-saccades?
    8. Why do we have pinwheel orientation structures in the visual cortex?
    9. What is the visual field size of a cortical (pinwheel) hypercolumn?

  • FEV

    Central Visual Pathways

    David Hubel, Nobel Prize 1981

    Torsten Wiesel, Nobel Prize 1981

  • FEV

  • FEV

    The 6 layers of the LGN:

    • 4 parvo-cellular layers (small cells)
    • 2 magno-cellular layers (large cells)

    [Figure: LGN cross-section with four parvo and two magno layers, each labelled with its input eye (L or R)]

    Motion channel

    http://www.michaelbach.de/ot/cog-hiddenBird/index.html

  • FEV

    The receptive fields of LGN cells all have an on- or off-center-surround sensitivity profile

  • FEV

    Spatio-temporal receptive field mapping by reverse correlation, by Ohzawa and Freeman (UC Berkeley)

    [Figure: Reverse Correlation Technique; panels: Stimulus, RF profile, Temporal RF]

  • FEV

  • FEV

    Three main types of receptive field sensitivity profiles

  • FEV

    Time sequence of a simple cell RF, separable. DeAngelis, Ohzawa and Freeman, TINS 1995

    Reverse Correlation Technique

  • Voltage sensitive dye optical imaging of tree shrew cortex

    Brain-inspired image analysis: multi-orientation (brain-inspired algorithms: multi-orientation analysis)

    Neuro-mathematics: multi-orientation analysis by the cortex

    [Figure labels: pinwheel; cortical hypercolumn, 0.3 x 0.3 mm; color orientation coding]

  • FEV

    The technique of Voltage Sensitive Dyes for the measurement of neural population signals was pioneered by Prof. Amiram Grinvald, Weizmann Institute, Israel.

    Voltage Sensitive Dyes

    https://en.wikipedia.org/wiki/Voltage-sensitive_dye
    http://www.weizmann.ac.il/brain/grinvald/

  • FEV

    Optical dye response at different orientations (monkey V1): the discovery of cortical hypercolumns (1991).

    From Bonhoeffer and Grinvald, Nature 353, 429-431, 1991

    Voltage Sensitive Dyes

  • FEV

    Fitzpatrick, Duke University, Nature 2002

    Connections exist between columns of similar orientation, even far apart

    Alexander & van Leeuwen, 2010

  • FEV

    [Figure: image, rotating kernel, orientation space]

    What are proper kernels that allow an inverse orientation transform (no data loss)?

    • Fourier Transform / Inverse Fourier Transform: Sin / Cos
    • Orientation Space 2D: Cake kernels, a new wavelet family
    • Orientation Space 3D: Mathieu functions

    Exactly invertible orientation transform:

    [Equation: definition of the kernels Φ(z, z̄) at scale σ for n ≥ 0; the formula is not recoverable from the extracted text]

    Multi-orientation differential geometry

  • FEV

    Gabor vs Cake Kernel – Fourier Domain

    Gabor Kernel

    Cake Kernel

  • FEV

    Different orientations are disentangled in the orientation space

    [Figure: image → orientation score; rotating kernel and filtered image]

  • FEV

    Franken, Duits, ter Haar Romeny, TU/e, 2010

    Denoising of crossing fibers (collagen, tissue engineered heart valve)

  • Properties of the Gaussian kernel

    • Cascade property: Gaussian convolved with Gaussian is Gaussian
    • Normalization: area = 1
    • Separable
    • Relation to binomial coefficients
    • Relation to generalized functions (Dirac, Heaviside)
    • Fourier transform of Gaussian is also Gaussian
    • Low-pass filter
    • Narrow kernel in spatial domain is wide kernel in Fourier domain
    • Solution of the diffusion equation
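Two of the listed properties, normalization and the cascade property, are easy to verify numerically. The sketch below samples 1D Gaussians on a fine grid (the grid spacing and the two scales are arbitrary choices) and checks that convolving scales σ1 and σ2 gives a Gaussian of scale sqrt(σ1² + σ2²).

```python
import numpy as np

def gaussian(x, sigma):
    """Normalized 1D Gaussian kernel."""
    return np.exp(-x ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
sigma1, sigma2 = 1.5, 2.0

g1 = gaussian(x, sigma1)
g2 = gaussian(x, sigma2)

area = np.sum(g1) * dx                                   # normalization: area = 1
cascade = np.convolve(g1, g2, mode="same") * dx          # Gaussian * Gaussian
expected = gaussian(x, np.sqrt(sigma1 ** 2 + sigma2 ** 2))  # scale 2.5

print(area, np.max(np.abs(cascade - expected)))
```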

  • Properties of the Gaussian derivative kernels

    • Gaussian derivative = Gaussian times Hermite polynomial
    • Band-pass filter
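The Hermite relation can be written, in one common convention, as d^n G / dx^n = (-1/σ)^n He_n(x/σ) G(x; σ), with He_n the probabilists' Hermite polynomial. The sketch below assumes that convention, evaluates He_n with NumPy's `hermeval`, and compares the result with finite-difference derivatives; the helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def gaussian(x, sigma):
    return np.exp(-x ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

def gaussian_derivative(x, sigma, order):
    """n-th derivative of the Gaussian as Hermite polynomial times Gaussian."""
    coeffs = np.zeros(order + 1)
    coeffs[order] = 1.0                           # selects He_order
    return (-1.0 / sigma) ** order * hermeval(x / sigma, coeffs) * gaussian(x, sigma)

x = np.linspace(-8.0, 8.0, 3201)
dx = x[1] - x[0]
sigma = 1.0

numeric_first = np.gradient(gaussian(x, sigma), dx)
numeric_second = np.gradient(numeric_first, dx)
analytic_first = gaussian_derivative(x, sigma, 1)
analytic_second = gaussian_derivative(x, sigma, 2)

# A derivative kernel has zero mean: it blocks the constant (DC) component
dc_response = np.sum(analytic_first) * dx
print(np.max(np.abs(numeric_first - analytic_first)), dc_response)
```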

  • Differential structure of images

    Gauge invariants are made with intrinsic coordinates v and w.

    Every derivative with respect to v and/or w is orthogonally invariant.

    Notebook

  • Some examples:

  • Second order structure

  • Affine invariant corner detection

  • Affine invariant corner detector:

  • Third order structure:

    Change of isophote curvature at a T-junction →

    Use: 3D TV from 2D video

  • Deep Learning with Convolutional Neural Networks

    Hierarchical learning as the brain

    Needed:

    • Large sets of training data
    • Clever network architecture
    • Error backpropagation
    • Robust classifier

    The first filters must represent the incoming data as efficiently as possible → represented in a compact basis

  • Mapping of spatiotemporal receptive fields in V1 of the tree shrew.

    The negative subfield of the kernel is located centrally, indicating a preponderance of symmetric second order kernels in the selected RFs.

    Calcium intrinsic imaging combined with reverse correlation of responses to a sparse noise stimulus.

    From: K. S. Lee, X. Huang, and D. Fitzpatrick. Topology of ON and OFF inputs in visual cortex enables an invariant columnar architecture. Nature, 533(7601):90-94, May 2016

  • Principal Component Analysis (PCA) finds the intrinsic orthogonal local coordinate frame in the data as the orthogonal eigenvectors of the covariance matrix.

  • Covariance matrix (from Wikipedia):
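PCA as described here (eigenvectors of the covariance matrix) takes only a few lines of NumPy. The 2D data set below is synthetic and hypothetical: a cloud stretched along a 30 degree axis, chosen so the expected principal direction is known in advance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: standard deviation 3 along one axis, 0.5 along the other,
# then rotated by 30 degrees
angle = np.deg2rad(30.0)
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
data = (rng.normal(size=(2000, 2)) * np.array([3.0, 0.5])) @ rotation.T

# Covariance matrix of the mean-centered data
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (len(data) - 1)

# Eigen-decomposition of the symmetric covariance matrix, largest variance first
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

first_axis = eigvecs[:, 0]
true_axis = np.array([np.cos(angle), np.sin(angle)])
alignment = abs(first_axis @ true_axis)          # sign of an eigenvector is arbitrary

print(eigvals, alignment)
```

The eigenvectors form the orthogonal local coordinate frame, and the eigenvalues give the variance of the data along each axis.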

  • Math: Learn from image patches → simple filters, edges, lines

    Principal Component Analysis

    If data are restricted: filters are restricted

    Lesson: Filters are in the DATA

    Handcrafted filters:

    • Multi-scale derivatives
    • Lie group: infinitesimal generator of translation
    • Taylor expansion

  • https://www.youtube.com/watch?v=QzkMo45pcUo

    Colin Blakemore’s famous experiment with visual deprivation (1974)

    Blakemore, Colin, and Grahame F. Cooper. “Development of the brain depends on the visual environment.” Nature, 228(5270), 477-478 (1970).

  • Blakemore’s cat: in the first three months after birth it sees only horizontal stripes

    After three months: it could see a horizontal stick, but NOT a vertical one.

    Lesson: it had never learned filters for vertical lines, they were not in the data.

  • Bev Doolittle: The forest has eyes

    The challenge

    Computer-Aided Diagnosis

  • Bev Doolittle: The forest has eyes

    We have much more difficulty in recognizing faces upside-down.