Topic 19: Machine Learning and Visual Sensing* CS 349: Machine Learning Oliver Cossairt *Some slides taken from “Machine Learning Visual Sensing and Machine Learning” lecture by Vivek Boominathan and Ashok Veeraraghavan, 2019 CVPR Workshop on Data-Driven Computational Imaging https://ciml.media.mit.edu/


Topic 19: Machine Learning and Visual Sensing*

CS 349: Machine Learning

Oliver Cossairt

*Some slides taken from “Machine Learning Visual Sensing and Machine Learning” lecture by Vivek Boominathan and Ashok Veeraraghavan, 2019 CVPR Workshop on Data-Driven Computational Imaging, https://ciml.media.mit.edu/


Computational Visual Sensing

Scene → Optical System → Computational Algorithm → Output

Pairing computational algorithms with optical systems has expanded visual sensing ability


Example 1: Imaging Through Scattering Media

Fog Snow Water

Moving Particle Scatterer (e.g. biological tissue)

Camera

Measurements

“Coherent Inverse Scattering via Transmission Matrices: Efficient Phase Retrieval Algorithms and a Public Dataset,” Chris Metzler*, Manoj Sharma*, Sudarshan Nagesh, Richard Baraniuk, Oliver Cossairt, Ashok Veeraraghavan
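The cited work frames imaging through a scatterer as phase retrieval with a transmission matrix: the camera records only magnitudes, y = |A x|. The sketch below is a generic alternating-projection (error-reduction) loop in NumPy, not the algorithm from the paper; the random matrix `A`, the problem sizes, and the function name `error_reduction` are all illustrative assumptions.

```python
import numpy as np

def error_reduction(A, y, iters=200, seed=0):
    """Alternating projections for y = |A x| (generic sketch, not the cited method)."""
    rng = np.random.default_rng(seed)
    A_pinv = np.linalg.pinv(A)
    n = A.shape[1]
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # random start
    residuals = []
    for _ in range(iters):
        z = A @ x
        # Mismatch between predicted and measured magnitudes.
        residuals.append(np.linalg.norm(np.abs(z) - y))
        # Keep the current phase, impose the measured magnitude.
        phase = np.exp(1j * np.angle(z))
        # Project back onto signals that A can actually produce.
        x = A_pinv @ (y * phase)
    return x, residuals

# Hypothetical transmission matrix and scene (assumptions for illustration).
rng = np.random.default_rng(1)
m, n = 256, 32
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
x_true = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = np.abs(A @ x_true)  # camera sees magnitudes only

x_hat, residuals = error_reduction(A, y)
print(f"magnitude residual: {residuals[0]:.3f} -> {residuals[-1]:.3f}")
```

The residual of this scheme never increases from one iteration to the next, but it can stall in a local minimum, which is one reason more careful algorithms and learned reconstructions are of interest.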


Learning to see through Scattering Media

Diffuser acts as a multiple-scattering material

[Figure: training set of input images and their measurements; measurement → reconstruction algorithm → denoised reconstruction, compared with ground truth]


Experimental Validation: Imaging through Scattering Media

[Figure: training set of input images and their measurements; two examples of measurement → reconstruction algorithm → reconstruction, each compared with ground truth]


Example 2: On-Chip Holographic Microscopy

CMOS sensor

LED

Biological sample

Large field of view, compact, cost-effective; 2.2 × 2.2 µm pixel resolution

Blepharisma hologram

Training dataset: Peranema, Spirostomum, Didinium, Euplotes, Paramecium, Blepharisma

Collect hologram dataset with high resolution

[Figure: hologram video frames from 0.00 s to 3.25 s → 3D reconstruction algorithm → 3D tracking result]

“Dictionary-based phase retrieval for space-time super resolution single lens-free on-chip holographic video,” Winston (Zihao) Wang et al., COSI 2017.


Summary: Visual Sensing using Machine Learning

( I ) Backend ML: Optical System → Machine Learning

( II ) Joint design with ML: Optical System + Machine Learning

( III ) ML with Optical System: Machine Learning System made of Optical Layer(s) + Electronic Layer(s)


Visual Sensing using Machine Learning

Part I: Backend ML

Optical System (fixed) → Machine Learning (data-driven) → Output (improved quality)

Data-driven “learned” models can improve the quality of visual sensing systems.


Thin Optical System

Drastically reducing camera thickness by replacing lens with thin mask

S. Asif et al., IEEE Transactions on Computational Imaging (2016)

FlatCam


Ill-posed inverse problem

Forward model: $Y = \Phi_L X \Phi_R^T$, where $Y$ is the capture and $X$ is the scene

Ill-posed: $\Phi_L$ and $\Phi_R$ are poorly conditioned
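A minimal NumPy sketch of the separable forward model may help make "poorly conditioned" concrete. The matrices below are random non-negative stand-ins (an assumption, not calibrated FlatCam matrices), and the sizes are arbitrary. The sketch also checks the equivalent vectorized form, vec(Y) = (Φ_R ⊗ Φ_L) vec(X), whose condition number is the product of the two individual condition numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scene, n_sensor = 16, 24  # illustrative sizes (assumption)

# Hypothetical non-negative mask transfer matrices; real ones come from calibration.
phi_l = rng.uniform(0.0, 1.0, size=(n_sensor, n_scene))
phi_r = rng.uniform(0.0, 1.0, size=(n_sensor, n_scene))

x = rng.uniform(0.0, 1.0, size=(n_scene, n_scene))  # scene
y = phi_l @ x @ phi_r.T                              # capture: Y = Phi_L X Phi_R^T

# Condition numbers measure how much inversion amplifies noise.
cond_l = np.linalg.cond(phi_l)
cond_r = np.linalg.cond(phi_r)

# Equivalent vectorized model (column-major vec): vec(Y) = (Phi_R kron Phi_L) vec(X).
big = np.kron(phi_r, phi_l)
y_vec = big @ x.flatten(order="F")
cond_big = np.linalg.cond(big)

print(f"cond(Phi_L) = {cond_l:.1f}, cond(Phi_R) = {cond_r:.1f}")
print(f"cond of full system = {cond_big:.1f}")
```

Because the two condition numbers multiply, even moderately ill-conditioned mask matrices produce a full system that is much harder to invert stably.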


Regularized reconstruction

[Equation: reconstruction (data-fit) term + regularization term]
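As one concrete instance of a regularized reconstruction, the sketch below uses Tikhonov regularization (the baseline named on the results slide) for a separable model Y = Φ_L X Φ_R^T. It minimizes ‖Y − Φ_L X Φ_R^T‖² + λ‖X‖² in closed form through the SVDs of the two matrices. The random matrices, the value of `lam`, and the function name are assumptions made for illustration.

```python
import numpy as np

def tikhonov_separable(y, phi_l, phi_r, lam):
    """Closed-form minimizer of ||Y - Phi_L X Phi_R^T||^2 + lam * ||X||^2."""
    u_l, s_l, vt_l = np.linalg.svd(phi_l, full_matrices=False)
    u_r, s_r, vt_r = np.linalg.svd(phi_r, full_matrices=False)
    y_hat = u_l.T @ y @ u_r          # measurement in the singular bases
    sig = np.outer(s_l, s_r)         # singular values of the full system
    x_hat = sig * y_hat / (sig**2 + lam)  # damped inversion, entry by entry
    return vt_l.T @ x_hat @ vt_r

rng = np.random.default_rng(0)
n_scene, n_sensor, lam = 8, 12, 1e-2   # illustrative values (assumption)
phi_l = rng.uniform(size=(n_sensor, n_scene))
phi_r = rng.uniform(size=(n_sensor, n_scene))
x_true = rng.uniform(size=(n_scene, n_scene))
y = phi_l @ x_true @ phi_r.T + 0.01 * rng.standard_normal((n_sensor, n_sensor))

x_rec = tikhonov_separable(y, phi_l, phi_r, lam)

# Cross-check against the brute-force vectorized solution.
big = np.kron(phi_r, phi_l)
x_vec = np.linalg.solve(big.T @ big + lam * np.eye(n_scene**2),
                        big.T @ y.flatten(order="F"))
print(np.allclose(x_rec.flatten(order="F"), x_vec))
```

The separable form only needs SVDs of the two small matrices, whereas the vectorized system grows with the square of the scene size.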


Data-driven reconstruction

[Figure: measurement; previous reconstruction; new deep-learning reconstruction]


End-to-end approach

Measurement → model inversion → perceptual enhancement → Output (fully trainable deep network)

Model inversion: $W_1 \times Y \times W_2^T$, learned, based on the forward model

Naïve approach
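The model-inversion stage W₁ Y W₂ᵀ can be sketched as two trainable matrices. In the sketch below they are initialized from regularized pseudo-inverses of hypothetical forward matrices; that initialization, the squared-error loss, and every name here are assumptions for illustration, not the published network. The gradients are the quantities a training loop would use to update W₁ and W₂.

```python
import numpy as np

def init_inversion(phi, lam):
    """Regularized pseudo-inverse, used here only as a starting point (assumption)."""
    n = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + lam * np.eye(n), phi.T)

def invert(y, w1, w2):
    """Model inversion stage: X_hat = W1 Y W2^T."""
    return w1 @ y @ w2.T

def loss_and_grads(y, x_true, w1, w2):
    """Squared-error loss and its gradients with respect to W1 and W2."""
    residual = invert(y, w1, w2) - x_true
    loss = 0.5 * np.sum(residual**2)
    grad_w1 = residual @ w2 @ y.T
    grad_w2 = residual.T @ w1 @ y
    return loss, grad_w1, grad_w2

rng = np.random.default_rng(0)
n_scene, n_sensor = 16, 24             # illustrative sizes (assumption)
phi_l = rng.standard_normal((n_sensor, n_scene))
phi_r = rng.standard_normal((n_sensor, n_scene))
x_true = rng.uniform(size=(n_scene, n_scene))
y = phi_l @ x_true @ phi_r.T + 0.01 * rng.standard_normal((n_sensor, n_sensor))

w1 = init_inversion(phi_l, lam=1e-2)
w2 = init_inversion(phi_r, lam=1e-2)
x0 = invert(y, w1, w2)
loss0, g1, g2 = loss_and_grads(y, x_true, w1, w2)
rel_err = np.linalg.norm(x0 - x_true) / np.linalg.norm(x_true)
print(f"relative error at initialization: {rel_err:.4f}")
```

In a trained system the output of this stage would then pass through the perceptual-enhancement network, and both stages would be optimized together.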


Results

Raw Captures

Tikhonov regularization

Data-driven End-to-End


Event-driven Video Frame Synthesis

Zihao Wang¹, Weixin Jiang¹, Kuan He¹, Boxin Shi², Aggelos Katsaggelos¹, Oliver Cossairt¹

¹ Northwestern University, ² Peking University

2nd Int’l Workshop on Physics Based Vision meets Deep Learning (PBDL) in conjunction with


What is an event camera? Another high-speed camera?

6/2/20, Wang et al., PBDL 2019, ICCV Workshop

Scenario: moving poster with shapes

Capture: 22 FPS
Display: 1.1 FPS

Data from DAVIS dataset

Each pixel:
• Compares brightness variations (blue: increase; red: decrease)
• Small latency (microsecond level)
• 10⁶ FPS! (at max)
• Works independently (asynchronous)
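The per-pixel behaviour listed above can be sketched as a simple simulator: each pixel remembers a reference brightness and emits a signed event when the change exceeds a threshold. The log-intensity comparison, the threshold value, and the restriction to at most one event per pixel per frame are simplifying assumptions, and `generate_events` is a hypothetical helper, not a DAVIS API.

```python
import numpy as np

def generate_events(frames, timestamps, threshold=0.2, eps=1e-3):
    """Emit (t, x, y, polarity) when a pixel's log-brightness changes by >= threshold."""
    log_ref = np.log(frames[0] + eps)   # per-pixel reference level
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_now = np.log(frame + eps)
        diff = log_now - log_ref
        fired = np.abs(diff) >= threshold
        ys, xs = np.nonzero(fired)
        for yy, xx in zip(ys, xs):
            polarity = 1 if diff[yy, xx] > 0 else -1   # +1 increase, -1 decrease
            events.append((float(t), int(xx), int(yy), polarity))
        # Each pixel resets its own reference independently (asynchronous behaviour).
        log_ref[fired] = log_now[fired]
    return events

# Tiny synthetic sequence: one pixel brightens, later another darkens.
frames = np.full((3, 4, 4), 0.5)
frames[1:, 1, 2] = 1.0   # pixel (row 1, col 2) brightens at t = 1 and stays bright
frames[2, 3, 0] = 0.1    # pixel (row 3, col 0) darkens at t = 2
events = generate_events(frames, timestamps=[0.0, 1.0, 2.0])
print(events)
```

Pixels whose brightness does not change produce no output at all, which is why the event stream is sparse and can be read out with very low latency.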


We propose intensity frame + events for high frame-rate video synthesis


Our approach: fusion of intensity frame + events


DMR: Differentiable Model-based Reconstruction


Results (DMR)

[Figure columns: blurry images; events during exposure; EDI [CVPR’19]; ours]

Video recovery

• Motion deblur case
• Given a blurry image + events in-exposure, recover intermediate sharp frames.
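To illustrate the motion-deblur setting, the sketch below uses the event-based double integral (EDI) style model that the results are compared against, not the DMR method itself. It assumes the blurry image is the average of the latent frames during exposure, and that each latent frame equals a reference frame scaled by exp(c · accumulated events). The contrast value `c` and the synthetic data are assumptions.

```python
import numpy as np

def edi_recover(blurry, event_sums, contrast):
    """Recover sharp frames from one blurry frame plus in-exposure events.

    event_sums[k] holds, per pixel, the signed event count accumulated
    between the reference time and sample time k.
    """
    gains = np.exp(contrast * event_sums)       # brightness ratio to the reference
    latent_ref = blurry / gains.mean(axis=0)    # undo the temporal averaging
    return latent_ref[None] * gains             # one sharp frame per sample time

# Synthetic check under the stated model (all values are illustrative).
rng = np.random.default_rng(0)
n_frames, h, w, c = 5, 6, 6, 0.3
reference = rng.uniform(0.2, 1.0, size=(h, w))
event_sums = rng.integers(-2, 3, size=(n_frames, h, w)).astype(float)
event_sums[0] = 0.0                             # frame 0 is the reference
sharp = reference[None] * np.exp(c * event_sums)
blurry = sharp.mean(axis=0)                     # exposure averages the sharp frames

recovered = edi_recover(blurry, event_sums, c)
print(np.allclose(recovered, sharp))
```

On noise-free synthetic data the recovery is exact; real events are noisy and the contrast threshold is not known precisely, which motivates an optimization-based reconstruction.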



Visual Sensing using Machine Learning

Part II: Joint design with ML

Optical System + Machine Learning (data-driven) → Output (improved quality)

Joint design with data-driven techniques can bring out the best in both systems


Example 1: PhaseCam3D - Learning Phase Masks for Passive Single-view Depth Estimation

Yicheng Wu, Vivek Boominathan, Huaijin Chen, Aswin Sankaranarayanan, and Ashok Veeraraghavan. “PhaseCam3D—Learning Phase Masks for Passive Single View Depth Estimation.” IEEE International Conference on Computational Photography (ICCP), 2019

Part II: Joint design with ML


Defocus of general lens

Defocused image

× Identical PSF at both sides of the focal plane.
× Impossible to tell the depth based on the blur size.

[Figure: lens and sensor; PSFs at different depths (far, focus, near). Image credit: trentwoodsphoto.com]
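The depth ambiguity of an ordinary lens follows from thin-lens geometry, which the sketch below works through. It computes the blur-circle diameter for a point at a given depth and shows that one depth in front of the focal plane and one behind it give the same blur size. The lens parameters are illustrative assumptions.

```python
import math

def image_distance(depth, focal_length):
    """Thin-lens equation: 1/f = 1/depth + 1/image_distance."""
    return focal_length * depth / (depth - focal_length)

def blur_diameter(depth, focus_depth, focal_length, aperture):
    """Blur-circle diameter on the sensor, from similar triangles."""
    sensor_dist = image_distance(focus_depth, focal_length)  # sensor sits here
    img_dist = image_distance(depth, focal_length)            # point focuses here
    return aperture * abs(img_dist - sensor_dist) / img_dist

# Illustrative lens: 50 mm focal length, 25 mm aperture, focused at 1 m (assumptions).
f, aperture, focus = 0.050, 0.025, 1.0

# Two depths placed symmetrically in inverse depth around the focal plane.
near = 1.0 / (1.0 / focus + 0.25)   # 0.8 m
far = 1.0 / (1.0 / focus - 0.25)    # about 1.33 m

blur_near = blur_diameter(near, focus, f, aperture)
blur_far = blur_diameter(far, focus, f, aperture)
print(f"near {near:.2f} m -> {blur_near * 1e3:.3f} mm blur")
print(f"far  {far:.2f} m -> {blur_far * 1e3:.3f} mm blur")
```

Since blur size alone maps to two candidate depths, a PSF whose shape changes with depth is needed to remove the ambiguity.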


PhaseCam3D: an end-to-end learning approach

[Figure: scene → PhaseCam3D sensor (optical system: phase mask, lens, sensor) → digital network → depth map]

• Differentiable optical model
• Digital network
• End-to-end learning
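An optical model of this kind can be sketched with Fourier optics: the PSF is the squared magnitude of the Fourier transform of the pupil function, whose phase combines the mask height map with a depth-dependent defocus term. The NumPy version below is a forward-only sketch under assumed values (wavelength, refractive index, grid size); it is not the PhaseCam3D implementation, and making it trainable would require an autodiff framework.

```python
import numpy as np

def psf_from_height_map(height, defocus_waves, wavelength=550e-9, refractive_index=1.5):
    """PSF = |FFT(pupil)|^2 for a mask height map (meters) and a defocus level (waves)."""
    n = height.shape[0]
    coords = np.fft.fftfreq(n) * 4.0              # pupil coordinates in [-2, 2)
    xx, yy = np.meshgrid(coords, coords, indexing="ij")
    r2 = xx**2 + yy**2
    aperture = (r2 <= 1.0).astype(float)          # unit-radius circular aperture
    mask_phase = 2 * np.pi * (refractive_index - 1.0) * height / wavelength
    defocus_phase = 2 * np.pi * defocus_waves * r2
    pupil = aperture * np.exp(1j * (mask_phase + defocus_phase))
    psf = np.abs(np.fft.fft2(pupil)) ** 2
    return psf / psf.sum()                        # normalize to unit energy

n = 64
flat = np.zeros((n, n))                           # no mask: ordinary lens
rng = np.random.default_rng(0)
coded = rng.uniform(0.0, 1e-6, size=(n, n))       # hypothetical (random) height map

flat_plus = psf_from_height_map(flat, +3.0)
flat_minus = psf_from_height_map(flat, -3.0)
coded_plus = psf_from_height_map(coded, +3.0)
coded_minus = psf_from_height_map(coded, -3.0)

print("plain lens, same PSF on both sides:", np.allclose(flat_plus, flat_minus))
print("coded mask, same PSF on both sides:", np.allclose(coded_plus, coded_minus))
```

Without a mask the PSFs on the two sides of focus coincide; with a mask they differ, and end-to-end learning searches for the height map that makes those differences most useful to the depth network.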


Optimal simulation results

[Figure: height map; PSFs labeled −10 to 10; sharp image; coded image; estimated disparity; true disparity]


Fabricate the learned phase mask

Two-photon lithography 3D printer (Photonic Professional GT, Nanoscribe GmbH)

Fabricated phase mask (scale label: 2.835 mm)


Accuracy evaluation: compare with Kinect

[Figure: coded images; estimated depth by PhaseCam3D; estimated depth by Kinect; depth color scale from 0.6 m to 1.4 m]

Error: 1.25 cm


Visual Sensing using Machine Learning

Part III: ML with Optical System

Machine Learning System: Optical Layer(s) → Electronic Layer(s) → Vision (Inference/Classification)

Incorporating optical layer(s) into the machine learning system can decrease latency and power


Example 1: ASP Vision: Optically Computing the First Layer of CNNs using Angle Sensitive Pixels

Huaijin G. Chen, Suren Jayasuriya, Jiyue Yang, Judy Stephen, Sriram Sivaramakrishnan, Ashok Veeraraghavan, and Alyosha Molnar. “ASP vision: Optically computing the first layer of convolutional neural networks using angle sensitive pixels.” Computer Vision and Pattern Recognition (CVPR), 2016.

Part III: ML with Optical System


ASP camera as first layer of DNN

ASP Vision: Sensor + Deep Learning Co-Design

[Figure: scene → ASP camera (L1, optically computed) → reduced CNN (L2, L3, ···, LN) → output “Elephant”]

The ASP camera has Gabor-like filters, which show up as kernels in many CNNs, e.g. AlexNet
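To make "Gabor-like filters" concrete, the sketch below builds a small bank of Gabor kernels (a Gaussian envelope times an oriented sinusoid) and applies it to an image the way a first convolutional layer would. The bank of 12 kernels mirrors the prototype filter count, but the kernel size, wavelength, orientations, and phases are illustrative assumptions, not the measured ASP responses.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gabor_kernel(size, wavelength, theta, sigma, phase=0.0):
    """Gaussian envelope times a sinusoid oriented at angle theta."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    y_rot = -xs * np.sin(theta) + ys * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + y_rot**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_rot / wavelength + phase)
    return envelope * carrier

def make_bank(size=7, wavelength=4.0, sigma=2.0, n_orientations=6):
    """6 orientations x 2 phases = 12 kernels (hypothetical parameter choice)."""
    kernels = []
    for k in range(n_orientations):
        theta = np.pi * k / n_orientations
        for phase in (0.0, np.pi / 2):
            kernels.append(gabor_kernel(size, wavelength, theta, sigma, phase))
    return np.stack(kernels)

def apply_bank(image, bank):
    """Valid cross-correlation of one grayscale image with every kernel."""
    k = bank.shape[-1]
    windows = sliding_window_view(image, (k, k))       # (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,fkl->fij", windows, bank)   # (filters, H-k+1, W-k+1)

# Test image: a grating whose brightness varies along x (columns).
cols = np.arange(32)
image = np.tile(np.cos(2 * np.pi * cols / 4.0), (32, 1))

bank = make_bank()
responses = apply_bank(image, bank)
energy = (responses**2).sum(axis=(1, 2))
print("filter energies:", np.round(energy, 1))
```

The kernel aligned with the grating (index 0) responds far more strongly than the perpendicular one (index 6), which is the orientation selectivity that learned first-layer CNN kernels also tend to show.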


How many FLOPs can we save by skipping the first layer?

VGG-M: 8 conv. layers, input image size 224 × 224 × 3
NiN: 9 conv. layers, input image size 32 × 32 × 3
LeNet: 4 conv. layers, input image size 28 × 28 × 1

                          VGG-M         VGG-M         NiN           NiN           LeNet         LeNet
                          (Original)    (Prototype)   (Original)    (Prototype)   (Original)    (Prototype)
# of First Layer Filters  96            12            192           12            20            12
First Layer Conv. Kernel  7 × 7 × 96    7 × 7 × 12    5 × 5 × 192   5 × 5 × 12    5 × 5 × 20    5 × 5 × 12
FLOPs of First Layer      708.0 M       88.5 M        14.75 M       921.6 K       392 K         235 K
Total FLOPs               6.02 G        3.83 G        200.3 M       157 M         10.4 M        8.8 M
First Layer FLOPs Saving  11.76%        2.3%          7.36%         0.6%          3.77%         2.67%
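The table's numbers can be reproduced with a short calculation. The first-layer counts match kernel² × input channels × filters × image height × width, i.e. one operation per multiply-accumulate at every input pixel position; that counting convention is inferred from the figures, not stated on the slide. The saving is then the first-layer count divided by the total.

```python
def first_layer_flops(kernel, in_channels, n_filters, height, width):
    """One operation per multiply-accumulate, evaluated at every input pixel."""
    return kernel * kernel * in_channels * n_filters * height * width

# (kernel, input channels, filters, height, width, total FLOPs from the table)
networks = {
    "VGG-M original":  (7, 3, 96, 224, 224, 6.02e9),
    "VGG-M prototype": (7, 3, 12, 224, 224, 3.83e9),
    "NiN original":    (5, 3, 192, 32, 32, 200.3e6),
    "NiN prototype":   (5, 3, 12, 32, 32, 157e6),
    "LeNet original":  (5, 1, 20, 28, 28, 10.4e6),
    "LeNet prototype": (5, 1, 12, 28, 28, 8.8e6),
}

savings = {}
for name, (k, c, n, h, w, total) in networks.items():
    flops = first_layer_flops(k, c, n, h, w)
    savings[name] = 100.0 * flops / total   # share of work moved into optics
    print(f"{name:16s} first layer {flops / 1e6:8.3f} M  saving {savings[name]:5.2f}%")
```

The saving is largest for VGG-M with its original 96 filters, because that first layer runs at full 224 × 224 resolution and dominates a larger share of the total.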


Privacy-preserving action recognition using coded aperture videos, CVPRW’19

Conventional action recognition: classifier → action labels

Privacy-preserving action recognition: motion features → classifier → action labels

Action classes: 1. Hopping, 2. Staggering, 3. Jumping up, 4. Jumping jack, 5. Squat, 6. Standing up, 7. Sitting down, 8. Throw, 9. Clapping, 10. Handwaving

Coded aperture camera: FlatCam, a thin optical system that drastically reduces camera thickness by replacing the lens with a thin mask (S. Asif et al., IEEE Transactions on Computational Imaging, 2016)


THANK YOU AND CONGRATULATIONS!