Topic 19: Machine Learning and Visual Sensing*
CS 349: Machine Learning
Oliver Cossairt
*Some slides taken from the “Visual Sensing and Machine Learning” lecture by Vivek Boominathan and Ashok Veeraraghavan,
2019 CVPR Workshop on Data-Driven Computational Imaging, https://ciml.media.mit.edu/
Computational Visual Sensing
Scene → Optical System → Computational Algorithm → Output
Pairing computational algorithms with optical systems has expanded visual sensing ability
Example 1: Imaging Through Scattering Media
Fog Snow Water
Moving Particle Scatterer (e.g. biological tissue)
Camera
Measurements
“Coherent Inverse Scattering via Transmission Matrices: Efficient Phase Retrieval Algorithms and a Public Dataset,” Chris Metzler*, Manoj Sharma*, Sudarshan Nagesh, Richard Baraniuk, Oliver Cossairt, Ashok Veeraraghavan
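Recovering the scene from intensity-only measurements through a known transmission matrix is a phase retrieval problem. Below is a minimal Gerchberg–Saxton-style sketch, not the paper's algorithm: the random complex matrix stands in for a calibrated transmission matrix, and the sizes and oversampling factor are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 256                        # hidden scene size; oversampled measurements
# random complex matrix standing in for a calibrated transmission matrix
A = (rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))) / np.sqrt(2 * m)
x_true = rng.normal(size=n) + 1j * rng.normal(size=n)
y = np.abs(A @ x_true)                # the sensor records magnitudes only

A_pinv = np.linalg.pinv(A)
x = A_pinv @ (y * np.exp(1j * rng.uniform(0, 2 * np.pi, size=m)))  # random phases
res0 = np.linalg.norm(np.abs(A @ x) - y)
for _ in range(500):
    z = A @ x
    x = A_pinv @ (y * z / np.maximum(np.abs(z), 1e-12))  # keep phase, impose |.|=y
res = np.linalg.norm(np.abs(A @ x) - y)  # alternating projections: residual is non-increasing
```

Each iteration projects onto the set of vectors with the measured magnitudes, then back onto the range of A; the recovered scene is determined only up to a global phase.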
Learning to see through Scattering Media
[Figure: a diffuser acts as a multiple-scattering material; a training set of input images and their measurements is used to learn a reconstruction algorithm that maps a measurement to a denoised reconstruction, compared against ground truth.]
Experimental Validation: Imaging through Scattering Media
[Figure: measurements passed through the reconstruction algorithm; reconstructions compared against ground truth, using a training set of input images and their measurements.]
Example 2: On-Chip Holographic Microscopy
CMOS sensor
LED
Biological sample
Large field-of-view, compact, cost-effective; 2.2 × 2.2 µm pixel resolution
Blepharisma hologram
Training Dataset: Peranema, Spirostomum, Didinium, Euplotes, Paramecium, Blepharisma
Collect hologram dataset with high resolution
3D Reconstruction Algorithm
3D Tracking Result
“Dictionary-based phase retrieval for space-time super resolution single lens-free on-chip holographic video,” Winston (Zihao) Wang et al., COSI 2017.
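The dictionary-based super-resolution in the paper is more involved, but the core refocusing step in lens-free on-chip holography is free-space back-propagation. A sketch of the standard angular spectrum method; the wavelength, the 2.2 µm pixel pitch, and the propagation distance are illustrative values (and a real hologram records intensity only, so twin-image artifacts remain after naive back-propagation):

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a complex field a distance z (meters) by the angular spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)      # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

# toy sample field; 2.2 um pixels as on the sensor above, 0.5 um illumination
n, dx, wl, z = 128, 2.2e-6, 0.5e-6, 1e-3
x = (np.arange(n) - n // 2) * dx
X, Y = np.meshgrid(x, x)
field = np.exp(-(X**2 + Y**2) / (2 * (20e-6) ** 2)).astype(complex)
hologram_field = angular_spectrum(field, wl, dx, z)       # field at the sensor
refocused = angular_spectrum(hologram_field, wl, dx, -z)  # back-propagate to sample
```

Propagating with −z inverts the +z propagation exactly over the non-evanescent band, which is why refocusing a hologram stack at several depths supports 3D reconstruction and tracking.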
Summary: Visual Sensing using Machine Learning
Optical System → Machine Learning
( I ) Backend ML | ( II ) Joint design with ML | ( III ) ML with Optical System
Machine Learning System: Optical Layer(s) + Electronic Layer(s)
Visual Sensing using Machine Learning
Optical System → Machine Learning
Part I: Backend ML
Optical System (fixed) → Machine Learning (data-driven) → Output (improved quality)
Data-driven “learned” models can improve the quality of visual sensing systems.
Thin Optical System
Drastically reducing camera thickness by replacing the lens with a thin mask
S. Asif et al., IEEE Transactions on Computational Imaging (2016)
FlatCam
Ill-posed inverse problem
Forward Model: Y = Φ_L X Φ_Rᵀ  (scene X → capture Y)
Ill-posed: Φ_L and Φ_R are poorly conditioned
Regularized reconstruction: X̂ = argmin_X ‖Y − Φ_L X Φ_Rᵀ‖² + λ‖X‖²  (reconstruction term + regularization term)
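The separable model makes the Tikhonov solve cheap: in the SVD bases of the two mask matrices, each coefficient is shrunk independently. A sketch with toy sizes; the random binary masks stand in for a real FlatCam calibration:

```python
import numpy as np

def flatcam_tikhonov(Y, PhiL, PhiR, lam):
    """Tikhonov-regularized inversion of the separable model Y = PhiL @ X @ PhiR.T."""
    UL, sL, VLt = np.linalg.svd(PhiL, full_matrices=False)
    UR, sR, VRt = np.linalg.svd(PhiR, full_matrices=False)
    Yt = UL.T @ Y @ UR                    # measurement in the SVD bases
    S = np.outer(sL, sR)                  # singular values of the separable operator
    Xt = S * Yt / (S**2 + lam)            # shrink poorly conditioned components
    return VLt.T @ Xt @ VRt

rng = np.random.default_rng(0)
n, m = 32, 48                               # toy sizes: 32x32 scene, 48x48 sensor
PhiL = rng.choice([0.0, 1.0], size=(m, n))  # stand-in binary separable masks
PhiR = rng.choice([0.0, 1.0], size=(m, n))
X = rng.random((n, n))
Y = PhiL @ X @ PhiR.T + 0.01 * rng.normal(size=(m, m))
X_hat = flatcam_tikhonov(Y, PhiL, PhiR, lam=1e-2)
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
```

The λ term bounds noise amplification on the smallest singular values, which is exactly where the poor conditioning of Φ_L and Φ_R hurts.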
Data-driven reconstruction
Measurement → Deep learning → Reconstruction (previous vs. new)
End-to-end approach
Measurement → Fully trainable deep network → Output
Stage 1, model inversion: X̂ = W₁ Y W₂ᵀ — naïve approach: fixed, based on the forward model; end-to-end: learned
Stage 2: perceptual enhancement
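One way to read this slide: the inversion layer W₁ Y W₂ᵀ mirrors the separable forward model, so W₁ and W₂ can be initialized from its pseudo-inverses and then refined on data. A toy sketch; the sizes, noise level, and learning rate are illustrative, and the perceptual-enhancement stage is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 16, 24                                  # toy scene / sensor sizes
PhiL = rng.choice([0.0, 1.0], size=(m, n))
PhiR = rng.choice([0.0, 1.0], size=(m, n))

# training batch: scenes and their noisy separable measurements
Xs = rng.normal(size=(50, n, n))
Ys = np.einsum('mi,bij,nj->bmn', PhiL, Xs, PhiR) + 0.05 * rng.normal(size=(50, m, m))

W1, W2 = np.linalg.pinv(PhiL), np.linalg.pinv(PhiR)   # init from the forward model

def loss(W1, W2):
    E = np.einsum('im,bmn,jn->bij', W1, Ys, W2) - Xs   # W1 @ Y @ W2.T - X
    return np.mean(E ** 2)

loss0 = loss(W1, W2)
lr = 1e-3
for _ in range(200):                                   # refine by gradient descent
    E = np.einsum('im,bmn,jn->bij', W1, Ys, W2) - Xs
    gW1 = 2 * np.einsum('bij,jn,bmn->im', E, W2, Ys) / E.size
    gW2 = 2 * np.einsum('bij,im,bmn->jn', E, W1, Ys) / E.size
    W1, W2 = W1 - lr * gW1, W2 - lr * gW2
loss1 = loss(W1, W2)
```

Because the initialization already inverts the noise-free model, training only has to adapt the layer to the noise statistics of the data.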
Results
Raw Captures
Tikhonov regularization
Data-driven End-to-End
Event-driven Video Frame Synthesis
Zihao Wang1, Weixin Jiang1, Kuan He1, Boxin Shi2, Aggelos Katsaggelos1, Oliver Cossairt1
1 Northwestern University 2 Peking University
2nd Int’l Workshop on Physics Based Vision meets Deep Learning (PBDL), in conjunction with ICCV 2019
What’s an event camera? Another high-speed camera?
6/2/20 Wang et al. PBDL2019, ICCV Workshop 16
Scenario: moving poster with shapes
Capture: 22 FPS; Display: 1.1 FPS
Data from DAVIS dataset. Each pixel:
• Compares brightness variations (blue: increase; red: decrease)
• Small latency (microsecond level)
• Up to 10⁶ FPS
• Works independently (asynchronous)
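The per-pixel behavior described above can be simulated from a conventional video: a pixel fires an event whenever its log intensity has moved by more than a contrast threshold since the last event it fired. A toy DVS-style sketch (the threshold and scene are illustrative):

```python
import numpy as np

def events_from_frames(frames, threshold=0.2, eps=1e-3):
    """Emit (t, x, y, polarity) whenever a pixel's log intensity has moved by
    more than `threshold` since the last event it fired (DVS-style model)."""
    ref = np.log(frames[0] + eps)        # log intensity at each pixel's last event
    events = []
    for t in range(1, len(frames)):
        logI = np.log(frames[t] + eps)
        diff = logI - ref
        fired = np.abs(diff) >= threshold
        for y, x in zip(*np.nonzero(fired)):
            events.append((t, x, y, 1 if diff[y, x] > 0 else -1))
        ref[fired] = logI[fired]
    return events

# toy scene: a bright square sweeping right over a dark background
frames = np.full((5, 16, 16), 0.05)
for t in range(5):
    frames[t, 6:10, 2 + 2 * t : 6 + 2 * t] = 1.0
events = events_from_frames(frames)   # +1 at the leading edge, -1 at the trailing edge
```

Only changing pixels produce output, which is why event streams are sparse and asynchronous rather than frame-based.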
We propose intensity frame + events for high frame-rate video synthesis
Our approach: fusion of intensity frame + events
DMR: Differentiable Model-based Reconstruction
Results (DMR)
Blurry images | Events during exposure | EDI [CVPR’19] | Ours
Video recovery
• Motion deblur case
• Given a blurry image + events in-exposure, recover intermediate sharp frames.
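For context, the EDI baseline compared against here models the blurry image as the temporal average of a latent sharp frame modulated by exponentiated event sums; inverting that relation recovers the sharp frame. A self-consistent sketch (the contrast constant c and the synthetic data are illustrative; the authors' DMR approach differs):

```python
import numpy as np

def edi_deblur(blurry, event_sums, c):
    """Recover the latent frame at exposure start from a blurry image and
    in-exposure events, under the EDI model:
        L(t) = L0 * exp(c * E(t)),   blurry = mean_t L(t)."""
    gain = np.exp(c * event_sums).mean(axis=0)
    return blurry / gain

# self-consistent synthetic check
rng = np.random.default_rng(2)
L0 = rng.random((8, 8)) + 0.1                                # latent sharp frame
E = np.cumsum(rng.integers(-1, 2, size=(10, 8, 8)), axis=0)  # cumulative event polarities
blurry = (L0 * np.exp(0.2 * E)).mean(axis=0)
sharp = edi_deblur(blurry, E, c=0.2)
```

Once L0 is known, the same exponential relation yields any intermediate sharp frame from the event sums up to that time.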
Visual Sensing using Machine Learning
Optical System → Machine Learning
Part I: Backend ML | Part II: Joint design with ML
OpticalSystem
Machine Learning
Part II: Joint design with ML
Optical System (data-driven) ↔ Machine Learning (data-driven) → Output (improved quality)
Joint design with data-driven techniques can bring out the best in both systems
PhaseCam3D - Learning Phase Masks For Passive Single-view Depth Estimation
Example 1:
Yicheng Wu, Vivek Boominathan, Huaijin Chen, Aswin Sankaranarayanan, and Ashok Veeraraghavan. “PhaseCam3D—Learning Phase Masks for Passive Single View Depth Estimation.” IEEE International Conference on Computational Photography (ICCP), 2019
Part II: Joint design with ML
Defocus of general lens
Defocused image
✗ Identical PSF at both sides of the focal plane.
✗ Impossible to tell the depth based on the blur size.
PSFs at different depths: Far | Focus | Near
Sensor
Lens
trentwoodsphoto.com
PhaseCam3D: an end-to-end learning approach
Optical System Digital network
PhaseCam3D sensor
Phase mask | Sensor
Lens
Depth map
Scene
• Differentiable optical model
• Digital network
• End-to-end learning
Optimal simulation results
Height map
PSFs at depths −10 … +10
Sharp image | Coded image
True disparity | Estimated disparity
Fabricate the learned phase mask
Photonic Professional GT, Nanoscribe GmbH
Two-photon lithography 3D printer Fabricated phase mask
2.835 mm
Accuracy evaluation: compare with Kinect
Coded Images
Estimated depth by PhaseCam3D
Estimated depth by Kinect
[Depth maps; color scale 0.6–1.4 m]
Error: 1.25 cm
Visual Sensing using Machine Learning
Optical System → Machine Learning
Machine Learning System: Optical Layer(s) + Electronic Layer(s)
Part I: Backend ML | Part II: Joint design with ML | Part III: ML with Optical System
Part III: ML with Optical System
Optical Layer(s) → Electronic Layer(s) → Vision (Inference / Classification)
Machine Learning System
Incorporating optical layer(s) into a machine learning system can decrease latency and power
ASP Vision: Optically Computing the First Layer of CNNs using Angle Sensitive Pixels
Example 1:
Huaijin G. Chen, Suren Jayasuriya, Jiyue Yang, Judy Stephen, Sriram Sivaramakrishnan, Ashok Veeraraghavan, and Alyosha Molnar. “ASP vision: Optically computing the first layer of convolutional neural networks using angle sensitive pixels.” Computer Vision and Pattern Recognition (CVPR), 2016.
Part III: ML with Optical System
ASP camera as first layer of DNN
ASP Vision: Sensor + Deep Learning Co-Design
Scene → ASP Camera (L1, optically computed) → Reduced CNN (L2, L3, ···, LN) → Output (“Elephant”)
The ASP camera has Gabor-like filters that show up as first-layer kernels in many CNNs, e.g. AlexNet
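A rough sketch of the idea: a fixed bank of Gabor filters stands in for the ASP's optical edge filters, so the trainable CNN can start at layer 2. The filter parameters and the naive convolution below are illustrative, not the ASP hardware response:

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma):
    """Real Gabor kernel: an oriented sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def conv2d_valid(img, k):
    """Naive valid-mode 2D correlation (the layer the optics computes for free)."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

# 12 fixed orientations, matching the 12-filter prototype in the table below
bank = [gabor_kernel(7, t, 0.25, 1.5) for t in np.linspace(0, np.pi, 12, endpoint=False)]
img = np.zeros((28, 28))
img[:, 14:] = 1.0                     # vertical-edge test image (LeNet-sized)
features = np.stack([conv2d_valid(img, k) for k in bank])
```

Because these responses come off the sensor directly, the digital network never has to execute this layer, which is where the FLOP savings in the table come from.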
How many FLOPs can we save by skipping the first layer?
                          VGG-M                      NiN                        LeNet
# of Conv. Layers         8                          9                          4
Input Image Size          224 × 224 × 3              32 × 32 × 3                28 × 28 × 1
# of First Layer Filters  96 (orig.) / 12 (proto.)   192 (orig.) / 12 (proto.)  20 (orig.) / 12 (proto.)
First Layer Conv. Kernel  7×7×96 / 7×7×12            5×5×192 / 5×5×12           5×5×20 / 5×5×12
FLOPs of First Layer      708.0M / 88.5M             14.75M / 921.6K            392K / 235K
Total FLOPs               6.02G / 3.83G              200.3M / 157M              10.4M / 8.8M
First Layer FLOPs Saving  11.76% / 2.3%              7.36% / 0.6%               3.77% / 2.67%
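The table's FLOP counts are consistent with counting one FLOP per multiply-accumulate over an output map taken at the input size; a quick check:

```python
def first_layer_flops(h, w, k, c_in, c_out):
    """FLOPs = H * W * K^2 * C_in * C_out (one FLOP per multiply-accumulate)."""
    return h * w * k * k * c_in * c_out

vggm   = first_layer_flops(224, 224, 7, 3, 96)   # 708.0M (original)
vggm_p = first_layer_flops(224, 224, 7, 3, 12)   # 88.5M (12-filter prototype)
nin    = first_layer_flops(32, 32, 5, 3, 192)    # 14.75M
lenet  = first_layer_flops(28, 28, 5, 1, 20)     # 392K
saving_vggm = vggm / 6.02e9                      # ~11.76% of VGG-M's total FLOPs
```

The 12-filter prototype rows follow the same formula with c_out = 12, which is why the prototype's savings are smaller than the originals'.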
Privacy-preserving action recognition using coded aperture videos, CVPRW’19
Conventional action recognition: Motion features → classifier → action labels
Privacy-preserving action recognition: coded aperture video → classifier → action labels
1. Hopping
2. Staggering
3. Jumping up
4. Jumping jack
5. Squat
6. Standing up
7. Sitting down
8. Throw
9. Clapping
10. Handwaving
FlatCam
Coded Aperture Camera
THANK YOU AND
CONGRATULATIONS!