Zhao Chen
Machine Learning Intern, NVIDIA
JOINT DETECTION AND SEGMENTATION WITH DEEP HIERARCHICAL NETWORKS
ABOUT ME
• 5th-year PhD student in physics @ Stanford by day, deep learning computer vision scientist by night.
• Intern with Deep Learning Applied Research (Autonomous Vehicles) @ NVIDIA, Oct-Dec 2016.
Zhao Chen, Joint Detection and Segmentation with Deep Hierarchical Networks, GTC 2017.
TALK OVERVIEW
(1) Problem statement and summary.
(2) Dataset and preliminaries.
(3) Model motivation.
(4) Results and visualizations.
FROM SINGLE TO MULTITASK LEARNING
Putting deep learning to work in the real world
Detection Model → Object Bounding Boxes
Segmentation Model → Segmentation Mask
Poor scalability + inefficient use of information!
FROM SINGLE TO MULTITASK LEARNING
Putting deep learning to work in the real world
How do we use one model to perform multiple tasks faster and better?
Shared Model → Object Bounding Boxes, Segmentation Mask
+ edge detection, + surface normals, + distance estimation…
How do you relate various tasks to each other in a multi-task neural network?
WHAT WE WILL SHOW
• By ordering tasks based on receptive field and information density, we improve segmentation and detection accuracy by ~2% and ~8%, respectively, over single-task networks.
• The joint network is robust and easy to tune compared to non-hierarchical baselines.
CITYSCAPES DATASET
• 2975 training images @ resolution 1024 x 2048.
• 20 classes for semantic segmentation, including 8 object classes. Of these 8, 4 are much better represented (car, bicycle, person, rider): the “easy classes.”
• Segmentation, bounding box, and edge ground truth can all be generated.
[Figure panels: Raw Image, Edge Detection, Semantic Seg., Bounding Box]
HOW TO TRAIN A SEGMENTATION NETWORK
• Standard FCN (Shelhamer 2015) architecture: convolutions followed by a deconvolution to retrieve a pixel-dense prediction mask.
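As a rough illustration of the shape arithmetic behind "convolutions followed by a deconvolution", the sketch below tracks the spatial size of a 1024 x 2048 input through a hypothetical stack of five stride-2 convolutions and one transposed convolution. The layer count, kernel sizes, strides, and paddings are assumptions chosen for illustration; they are not taken from the talk.

```python
def conv_out(n, kernel, stride=1, padding=0, dilation=1):
    """Output length of a standard convolution along one spatial axis."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out(n, kernel, stride=1, padding=0):
    """Output length of a transposed convolution ("deconvolution") along one axis."""
    return (n - 1) * stride - 2 * padding + kernel


def fcn_shapes(height, width, num_down=5):
    """Track (H, W) through an assumed encoder and a single upsampling step."""
    h, w = height, width
    for _ in range(num_down):
        # Assumed encoder layer: 3x3 convolution, stride 2, padding 1.
        h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    low_res = (h, w)
    # Assumed decoder: one transposed conv with kernel 64, stride 32, padding 16.
    full_res = (deconv_out(h, 64, 32, 16), deconv_out(w, 64, 32, 16))
    return low_res, full_res


low_res, full_res = fcn_shapes(1024, 2048)
print(low_res, full_res)
```

With these assumed layers, the low-resolution prediction grid is 32 x 64 and the transposed convolution brings it back to the 1024 x 2048 input size.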
HOW TO TRAIN A DETECTION NETWORK
• Network outputs confidence that a pixel lies near the center of an object.
• Points of high confidence produce bounding box coordinates.
• Confidences are rougher than full segmentation but robust to occlusion.
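The bullets above can be sketched as a small decoding step: grid cells whose confidence exceeds a threshold emit a box built from that cell's predicted coordinates. The box encoding (pixel offsets from the cell centre), the stride of 16, and the 0.5 threshold are all assumptions made for this example, not details given in the talk.

```python
def decode_boxes(confidence, coords, threshold=0.5, stride=16):
    """Turn a coarse confidence grid into a list of bounding boxes.

    confidence: 2D list with confidence[r][c] in [0, 1].
    coords: 2D list of (left, top, right, bottom) pixel offsets measured
        from the cell centre (an assumed encoding).
    stride: assumed number of input pixels covered by one grid cell.
    """
    boxes = []
    for r, row in enumerate(confidence):
        for c, score in enumerate(row):
            if score < threshold:
                continue  # low-confidence cells produce no box
            cx = (c + 0.5) * stride
            cy = (r + 0.5) * stride
            left, top, right, bottom = coords[r][c]
            boxes.append((cx - left, cy - top, cx + right, cy + bottom, score))
    # Highest-confidence boxes first.
    return sorted(boxes, key=lambda b: b[4], reverse=True)


confidence = [[0.1, 0.9],
              [0.2, 0.6]]
coords = [[(0, 0, 0, 0), (10, 8, 10, 8)],
          [(0, 0, 0, 0), (4, 4, 4, 4)]]
boxes = decode_boxes(confidence, coords)
print(boxes)
```

A real pipeline would typically merge overlapping candidates afterwards (for example with non-maximum suppression); that step is left out here.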
[Diagram: Input (1024 x 2048) → base CNN → Shared Feature Map, which feeds three outputs: Low-Res Seg Predictions (W x H x 20) with a Deconv, Obj. Confidence Positions, and Bbox Coordinate Positions.]
L = α L_seg + (1 − α) L_det
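A minimal sketch of the weighted loss above, assuming the two task losses are already available as scalars. The function name and the check that α lies in [0, 1] are additions for illustration.

```python
def joint_loss(seg_loss, det_loss, alpha):
    """Weighted sum L = alpha * L_seg + (1 - alpha) * L_det."""
    if not 0.0 <= alpha <= 1.0:
        # Assumed constraint: alpha is treated as a mixing weight.
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * seg_loss + (1.0 - alpha) * det_loss


# Sweep alpha for fixed, made-up loss values to see the trade-off.
for alpha in (0.0, 0.25, 0.5, 1.0):
    print(alpha, joint_loss(seg_loss=2.0, det_loss=4.0, alpha=alpha))
```

At α = 0 only the detection loss contributes, and at α = 1 only the segmentation loss does.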
OUR BASELINE MODEL PERFORMANCE
[Plot: baseline performance as α is varied, where α sets the Seg. Weight relative to the Det. Weight]
(α controls how much attention we pay to segmentation vs. detection during training)
A LABEL HIERARCHY ALONG TWO AXES
[Diagram: tasks placed along two axes, Density of Information and Required Receptive Field. Tasks shown: Object Bounding Boxes, Object Confidence, Semantic Segmentation, and (plus) Edge Detection.]
Deep Hierarchical Network (DHM)
[Diagram, built up over several slides. Input (1024 x 2048) → base CNN → Shared Feature Map. Each task has its own feature block and output:
• Edge Features → Low-Res Edge Predictions (W x H x 3), with Deconv
• Segmentation Features → Low-Res Seg Predictions (W x H x 20), with Deconv
• Obj. Confidence Features → Obj. Confidence Positions
• Obj. BBox Features → Dilated Bbox Coordinate Positions (Dilated Convs)
Arrow labels: "Decreasing information density" and "Increasing receptive field". The X shown on intermediate builds is replaced by Dilated Convs in the final build.]
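To make the link between dilated convolutions and receptive field concrete, the sketch below computes the receptive field of a stack of convolution layers along one axis. The layer stacks and dilation rates are assumptions chosen for illustration; the talk does not state which values the network uses.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, one axis) of a stack of conv layers.

    layers: list of (kernel, stride, dilation) tuples, first layer first.
    """
    rf = 1    # a single input pixel before any layer
    jump = 1  # input-pixel distance between adjacent outputs of the current layer
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]    # three ordinary 3x3 convs
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]  # same depth, dilation 1, 2, 4
print(receptive_field(plain), receptive_field(dilated))
```

With the same number of layers and parameters per layer, the dilated stack covers 15 input pixels per axis instead of 7.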
RESULTS: HIGH ROBUSTNESS
[Figure panels: Raw Image, Edge Predictions, Segmentation Predictions, Bounding Box Predictions]
SUMMARY
• The two hierarchies within our model allow the network to reason about inter-task relationships:
  • Information density: (Seg +) Edge > Seg > Object Conf > Bbox
  • Receptive field: (Seg +) Edge = Bbox >> Object Conf > Seg
• With these relationships wired in, our network is:
  • More accurate
  • Robust to tuning
  • Simultaneously better at fine detail and more instance aware
  • Efficient and scalable (3 tasks, 1 network!)
REFERENCES
• J. Yao, S. Fidler, and R. Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In CVPR, 2012.
• S. Gidaris and N. Komodakis. Object detection via a multi-region and semantic segmentation-aware CNN model. In ICCV, 2015.
• B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In ECCV, 2014.
• S. Liu, X. Qi, J. Shi, H. Zhang, and J. Jia. Multi-scale patch aggregation (MPA) for simultaneous detection and segmentation. In CVPR, 2016.
• E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
• B. Hariharan, P. Arbelaez, R. Girshick, and J. Malik. Hypercolumns for object segmentation and fine-grained localization. In CVPR, 2015.
• J. Dai, K. He, and J. Sun. Instance-aware semantic segmentation via multi-task network cascades. https://arxiv.org/pdf/1512.04412.pdf, 2015.
THANK YOU!
Special thanks to:
My internship mentor: Jian Yao
My managers: John Zedlewski and Andrew Tao
All the wonderful people in DLAR/DLAV.
Additional questions/comments: [email protected]