Topic 19: Machine Learning and Visual Sensing*
CS 349: Machine Learning
Oliver Cossairt
*Some slides taken from the “Visual Sensing and Machine Learning” lecture by Vivek Boominathan and Ashok Veeraraghavan,
2019 CVPR Workshop on Data-Driven Computational Imaging, https://ciml.media.mit.edu/
Computational Visual Sensing
Scene → Optical System → Computational Algorithm → Output
Pairing computational algorithms with optical systems has expanded visual sensing ability
Example 1: Imaging Through Scattering Media
Fog Snow Water
Moving Particle Scatterer (e.g. biological tissue)
Camera
Measurements
“Coherent Inverse Scattering via Transmission Matrices: Efficient Phase Retrieval Algorithms and a Public Dataset,” Chris Metzler*, Manoj Sharma*, Sudarshan Nagesh, Richard Baraniuk, Oliver Cossairt, Ashok Veeraraghavan
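Recovering the scene from intensity-only measurements through a known transmission matrix is a phase retrieval problem. Below is a minimal Gerchberg–Saxton-style sketch, not the paper's algorithm: the random complex matrix stands in for a calibrated transmission matrix, and the sizes and oversampling factor are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 256                        # hidden scene size; oversampled measurements
# random complex matrix standing in for a calibrated transmission matrix
A = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2 * m)
x_true = rng.normal(size=n) + 1j * rng.normal(size=n)
y = np.abs(A @ x_true)                # the sensor records magnitudes only

A_pinv = np.linalg.pinv(A)
x = A_pinv @ (y * np.exp(1j * rng.uniform(0, 2 * np.pi, size=m)))  # random phases
res0 = np.linalg.norm(np.abs(A @ x) - y)
for _ in range(500):
    z = A @ x
    x = A_pinv @ (y * z / np.maximum(np.abs(z), 1e-12))  # keep phase, impose |.|=y
res = np.linalg.norm(np.abs(A @ x) - y)  # alternating projections: residual is non-increasing
```

Each iteration projects onto the set of vectors with the measured magnitudes, then back onto the range of A; the recovered scene is determined only up to a global phase.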
Learning to see through Scattering Media
[Figure: a diffuser acts as a multiple-scattering material; a training set of input images and their measurements is used to learn a reconstruction algorithm that maps a measurement to a denoised reconstruction, compared against ground truth.]
Experimental Validation: Imaging through Scattering Media
[Figure: measurements passed through the reconstruction algorithm; reconstructions compared against ground truth, using a training set of input images and their measurements.]
Example 2: On-Chip Holographic Microscopy
CMOS sensor
LED
Biological sample
Large field-of-view, compact, cost-effective; 2.2 × 2.2 µm pixel resolution
Blepharisma hologram
Training Dataset: Peranema, Spirostomum, Didinium, Euplotes, Paramecium, Blepharisma
Collect hologram dataset with high resolution
3D Reconstruction Algorithm
3D Tracking Result
“Dictionary-based phase retrieval for space-time super resolution single lens-free on-chip holographic video,” Winston (Zihao) Wang et al., COSI 2017.
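The dictionary-based super-resolution in the paper is more involved, but the core refocusing step in lens-free on-chip holography is free-space back-propagation. A sketch of the standard angular spectrum method; the wavelength, the 2.2 µm pixel pitch, and the propagation distance are illustrative values (and a real hologram records intensity only, so twin-image artifacts remain after naive back-propagation):

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a complex field a distance z (meters) by the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)      # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

# toy sample field; 2.2 um pixels as on the sensor above, 0.5 um illumination
n, dx, wl, z = 128, 2.2e-6, 0.5e-6, 1e-3
x = (np.arange(n) - n // 2) * dx
X, Y = np.meshgrid(x, x)
field = np.exp(-(X**2 + Y**2) / (2 * (20e-6) ** 2)).astype(complex)
hologram_field = angular_spectrum(field, wl, dx, z)       # field at the sensor
refocused = angular_spectrum(hologram_field, wl, dx, -z)  # back-propagate to sample
```

Propagating with −z inverts the +z propagation exactly over the non-evanescent band, which is why refocusing a hologram stack at several depths supports 3D reconstruction and tracking.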
Summary: Visual Sensing using Machine Learning
Optical System → Machine Learning
( I ) Backend ML | ( II ) Joint design with ML | ( III ) ML with Optical System
Machine Learning System: Optical Layer(s) + Electronic Layer(s)
Visual Sensing using Machine Learning
Optical System → Machine Learning
Part I: Backend ML
Optical System (fixed) → Machine Learning (data-driven) → Output (improved quality)
Data-driven “learned” models can improve the quality of visual sensing systems.
Thin Optical System
Drastically reducing camera thickness by replacing the lens with a thin mask
S. Asif et al., IEEE Transactions on Computational Imaging (2016)
FlatCam
Ill-posed inverse problem
Forward Model: Y = Φ_L X Φ_Rᵀ  (scene X → capture Y)
Ill-posed: Φ_L and Φ_R are poorly conditioned
Regularized reconstruction: X̂ = argmin_X ‖Y − Φ_L X Φ_Rᵀ‖² + λ‖X‖²  (reconstruction term + regularization term)
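The separable model makes the Tikhonov solve cheap: in the SVD bases of the two mask matrices, each coefficient is shrunk independently. A sketch with toy sizes; the random binary masks stand in for a real FlatCam calibration:

```python
import numpy as np

def flatcam_tikhonov(Y, PhiL, PhiR, lam):
    """Tikhonov-regularized inversion of the separable model Y = PhiL @ X @ PhiR.T."""
    UL, sL, VLt = np.linalg.svd(PhiL, full_matrices=False)
    UR, sR, VRt = np.linalg.svd(PhiR, full_matrices=False)
    Yt = UL.T @ Y @ UR                    # measurement in the SVD bases
    S = np.outer(sL, sR)                  # singular values of the separable operator
    Xt = S * Yt / (S**2 + lam)            # shrink poorly conditioned components
    return VLt.T @ Xt @ VRt

rng = np.random.default_rng(0)
n, m = 32, 48                               # toy sizes: 32x32 scene, 48x48 sensor
PhiL = rng.choice([0.0, 1.0], size=(m, n))  # stand-in binary separable masks
PhiR = rng.choice([0.0, 1.0], size=(m, n))
X = rng.random((n, n))
Y = PhiL @ X @ PhiR.T + 0.01 * rng.normal(size=(m, m))
X_hat = flatcam_tikhonov(Y, PhiL, PhiR, lam=1e-2)
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
```

The λ term bounds noise amplification on the smallest singular values, which is exactly where the poor conditioning of Φ_L and Φ_R hurts.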
Data-driven reconstruction
Measurement → Deep learning → Reconstruction (previous vs. new)
End-to-end approach
Measurement → Fully trainable deep network → Output
Stage 1, model inversion: X̂ = W₁ Y W₂ᵀ — naïve approach: fixed, based on the forward model; end-to-end: learned
Stage 2: perceptual enhancement
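One way to read this slide: the inversion layer W₁ Y W₂ᵀ mirrors the separable forward model, so W₁ and W₂ can be initialized from its pseudo-inverses and then refined on data. A toy sketch; the sizes, noise level, and learning rate are illustrative, and the perceptual-enhancement stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 24                                  # toy scene / sensor sizes
PhiL = rng.choice([0.0, 1.0], size=(m, n))
PhiR = rng.choice([0.0, 1.0], size=(m, n))

# training batch: scenes and their noisy separable measurements
Xs = rng.normal(size=(50, n, n))
Ys = np.einsum('mi,bij,nj->bmn', PhiL, Xs, PhiR) + 0.05 * rng.normal(size=(50, m, m))

W1, W2 = np.linalg.pinv(PhiL), np.linalg.pinv(PhiR)   # init from the forward model

def loss(W1, W2):
    E = np.einsum('im,bmn,jn->bij', W1, Ys, W2) - Xs   # W1 @ Y @ W2.T - X
    return np.mean(E ** 2)

loss0 = loss(W1, W2)
lr = 1e-3
for _ in range(200):                                   # refine by gradient descent
    E = np.einsum('im,bmn,jn->bij', W1, Ys, W2) - Xs
    gW1 = 2 * np.einsum('bij,jn,bmn->im', E, W2, Ys) / E.size
    gW2 = 2 * np.einsum('bij,im,bmn->jn', E, W1, Ys) / E.size
    W1, W2 = W1 - lr * gW1, W2 - lr * gW2
loss1 = loss(W1, W2)
```

Because the initialization already inverts the noise-free model, training only has to adapt the layer to the noise statistics of the data.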
Results
Raw Captures
Tikhonov regularization
Data-driven End-to-End
Event-driven Video Frame Synthesis
Zihao Wang1, Weixin Jiang1, Kuan He1, Boxin Shi2, Aggelos Katsaggelos1, Oliver Cossairt1
1 Northwestern University 2 Peking University
2nd Int’l Workshop on Physics Based Vision meets Deep Learning (PBDL), in conjunction with ICCV 2019
What’s an event camera? Another high-speed camera?
6/2/20 Wang et al. PBDL2019, ICCV Workshop 16
Scenario: moving poster with shapes
Capture: 22 FPS; Display: 1.1 FPS
Data from DAVIS dataset. Each pixel:
• Compares brightness variations (blue: increase; red: decrease)
• Small latency (microsecond level)
• Up to 10⁶ FPS
• Works independently (asynchronous)
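The per-pixel behavior described above can be simulated from a conventional video: a pixel fires an event whenever its log intensity has moved by more than a contrast threshold since the last event it fired. A toy DVS-style sketch (the threshold and scene are illustrative):

```python
import numpy as np

def events_from_frames(frames, threshold=0.2, eps=1e-3):
    """Emit (t, x, y, polarity) whenever a pixel's log intensity has moved by
    more than `threshold` since the last event it fired (DVS-style model)."""
    ref = np.log(frames[0] + eps)        # log intensity at each pixel's last event
    events = []
    for t in range(1, len(frames)):
        logI = np.log(frames[t] + eps)
        diff = logI - ref
        fired = np.abs(diff) >= threshold
        for y, x in zip(*np.nonzero(fired)):
            events.append((t, x, y, 1 if diff[y, x] > 0 else -1))
        ref[fired] = logI[fired]
    return events

# toy scene: a bright square sweeping right over a dark background
frames = np.full((5, 16, 16), 0.05)
for t in range(5):
    frames[t, 6:10, 2 + 2 * t : 6 + 2 * t] = 1.0
events = events_from_frames(frames)   # +1 at the leading edge, -1 at the trailing edge
```

Only changing pixels produce output, which is why event streams are sparse and asynchronous rather than frame-based.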
We propose intensity frame + events for high frame-rate video synthesis
Our approach: fusion of intensity frame + events
DMR: Differentiable Model-based Reconstruction
Results (DMR)
Blurry images | Events during exposure | EDI [CVPR’19] | Ours
Video recovery
• Motion deblur case
• Given a blurry image + events in-exposure, recover intermediate sharp frames.
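For context, the EDI baseline compared against here models the blurry image as the temporal average of a latent sharp frame modulated by exponentiated event sums; inverting that relation recovers the sharp frame. A self-consistent sketch (the contrast constant c and the synthetic data are illustrative; the authors' DMR approach differs):

```python
import numpy as np

def edi_deblur(blurry, event_sums, c):
    """Recover the latent frame at exposure start from a blurry image and
    in-exposure events, under the EDI model:
        L(t) = L0 * exp(c * E(t)),   blurry = mean_t L(t)."""
    gain = np.exp(c * event_sums).mean(axis=0)
    return blurry / gain

# self-consistent synthetic check
rng = np.random.default_rng(2)
L0 = rng.random((8, 8)) + 0.1                                # latent sharp frame
E = np.cumsum(rng.integers(-1, 2, size=(10, 8, 8)), axis=0)  # cumulative event polarities
blurry = (L0 * np.exp(0.2 * E)).mean(axis=0)
sharp = edi_deblur(blurry, E, c=0.2)
```

Once L0 is known, the same exponential relation yields any intermediate sharp frame from the event sums up to that time.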
Visual Sensing using Machine Learning
Optical System → Machine Learning
Part I: Backend ML | Part II: Joint design with ML
OpticalSystem
Machine Learning
Part II: Joint design with ML
Optical System (data-driven) ↔ Machine Learning (data-driven) → Output (improved quality)
Joint design with data-driven techniques can bring out the best in both systems
PhaseCam3D - Learning Phase Masks For Passive Single-view Depth Estimation
Example 1:
Yicheng Wu, Vivek Boominathan, Huaijin Chen, Aswin Sankaranarayanan, and Ashok Veeraraghavan. “PhaseCam3D—Learning Phase Masks for Passive Single View Depth Estimation.” IEEE International Conference on Computational Photography (ICCP), 2019
Part II: Joint design with ML
Defocus of general lens
Defocused image
✗ Identical PSF at both sides of the focal plane.
✗ Impossible to tell the depth based on the blur size.
PSFs at different depths: Far | Focus | Near
Sensor
Lens
trentwoodsphoto.com
PhaseCam3D: an end-to-end learning approach
Optical System Digital network
PhaseCam3D sensor
Phase mask | Sensor
Lens
Depth map
Scene
• Differentiable optical model
• Digital network
• End-to-end learning
Optimal simulation results
Height map
PSFs at depths −10 … +10
Sharp image | Coded image
True disparity | Estimated disparity
Fabricate the learned phase mask
Photonic Professional GT, Nanoscribe GmbH
Two-photon lithography 3D printer Fabricated phase mask
2.835 mm
Accuracy evaluation: compare with Kinect
Coded Images
Estimated depth by PhaseCam3D
Estimated depth by Kinect
[Depth maps; color scale 0.6–1.4 m]
Error: 1.25 cm
Visual Sensing using Machine Learning
Optical System → Machine Learning
Machine Learning System: Optical Layer(s) + Electronic Layer(s)
Part I: Backend ML | Part II: Joint design with ML | Part III: ML with Optical System
Part III: ML with Optical System
Optical Layer(s) → Electronic Layer(s) → Vision (Inference / Classification)
Machine Learning System
Incorporating optical layer(s) into a machine learning system can decrease latency and power
ASP Vision: Optically Computing the First Layer of CNNs using Angle Sensitive Pixels
Example 1:
Huaijin G. Chen, Suren Jayasuriya, Jiyue Yang, Judy Stephen, Sriram Sivaramakrishnan, Ashok Veeraraghavan, and Alyosha Molnar. “ASP vision: Optically computing the first layer of convolutional neural networks using angle sensitive pixels.” Computer Vision and Pattern Recognition (CVPR), 2016.
Part III: ML with Optical System
ASP camera as first layer of DNN
ASP Vision: Sensor + Deep Learning Co-Design
Scene → ASP Camera (L1, optically computed) → Reduced CNN (L2, L3, ···, LN) → Output (“Elephant”)
The ASP camera has Gabor-like filters that show up as first-layer kernels in many CNNs, e.g. AlexNet
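A rough sketch of the idea: a fixed bank of Gabor filters stands in for the ASP's optical edge filters, so the trainable CNN can start at layer 2. The filter parameters and the naive convolution below are illustrative, not the ASP hardware response:

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma):
    """Real Gabor kernel: an oriented sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def conv2d_valid(img, k):
    """Naive valid-mode 2D correlation (the layer the optics computes for free)."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# 12 fixed orientations, matching the 12-filter prototype in the table below
bank = [gabor_kernel(7, t, 0.25, 1.5) for t in np.linspace(0, np.pi, 12, endpoint=False)]
img = np.zeros((28, 28))
img[:, 14:] = 1.0                     # vertical-edge test image (LeNet-sized)
features = np.stack([conv2d_valid(img, k) for k in bank])
```

Because these responses come off the sensor directly, the digital network never has to execute this layer, which is where the FLOP savings in the table come from.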
How many FLOPs can we save by skipping the first layer?
                          VGG-M                      NiN                        LeNet
# of Conv. Layers         8                          9                          4
Input Image Size          224 × 224 × 3              32 × 32 × 3                28 × 28 × 1
# of First Layer Filters  96 (orig.) / 12 (proto.)   192 (orig.) / 12 (proto.)  20 (orig.) / 12 (proto.)
First Layer Conv. Kernel  7×7×96 / 7×7×12            5×5×192 / 5×5×12           5×5×20 / 5×5×12
FLOPs of First Layer      708.0M / 88.5M             14.75M / 921.6K            392K / 235K
Total FLOPs               6.02G / 3.83G              200.3M / 157M              10.4M / 8.8M
First Layer FLOPs Saving  11.76% / 2.3%              7.36% / 0.6%               3.77% / 2.67%
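The table's FLOP counts are consistent with counting one FLOP per multiply-accumulate over an output map taken at the input size; a quick check:

```python
def first_layer_flops(h, w, k, c_in, c_out):
    """FLOPs = H * W * K^2 * C_in * C_out (one FLOP per multiply-accumulate)."""
    return h * w * k * k * c_in * c_out

vggm   = first_layer_flops(224, 224, 7, 3, 96)   # 708.0M (original)
vggm_p = first_layer_flops(224, 224, 7, 3, 12)   # 88.5M (12-filter prototype)
nin    = first_layer_flops(32, 32, 5, 3, 192)    # 14.75M
lenet  = first_layer_flops(28, 28, 5, 1, 20)     # 392K
saving_vggm = vggm / 6.02e9                      # ~11.76% of VGG-M's total FLOPs
```

The 12-filter prototype rows follow the same formula with c_out = 12, which is why the prototype's savings are smaller than the originals'.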
Privacy-preserving action recognition using coded aperture videos, CVPRW’19
Conventional action recognition: Motion features → classifier → action labels
Privacy-preserving action recognition: coded aperture video → classifier → action labels
1. Hopping
2. Staggering
3. Jumping up
4. Jumping jack
5. Squat
6. Standing up
7. Sitting down
8. Throw
9. Clapping
10. Handwaving
FlatCam
Coded Aperture Camera
THANK YOU AND
CONGRATULATIONS!