Gang Yu, Megvii Research: Context For Semantic Segmentation

Context For Semantic Segmentation, valser.org/webinar/slide/slides/20190529/2019.05.29 俞刚.pdf, May 29, 2019


Gang Yu

Megvii Research

Context For Semantic Segmentation


Collaborators

Chao Peng, Jingbo Wang, Changqian Yu, Changxin Gao, Xiangyu Zhang, Gang Yu, Jian Sun, Nong Sang


Outline

• Revisit Semantic Segmentation
• Context for Semantic Segmentation
  • Backbone
  • Head
  • Loss
• Conclusion



What is Semantic Segmentation?

• Classification + Localization
• Visual Recognition
  • Classification
  • Semantic Segmentation
  • Instance Segmentation
  • Panoptic Segmentation
  • Detection
  • Keypoint Detection


Pipeline

• Backbone: VGG16, ResNet, ResNeXt
• Head: U-Shape, 4/8-Sampling + Dilation
• Loss: Softmax, L2
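
The softmax loss named in the pipeline is usually applied per pixel. A minimal sketch follows, assuming the head outputs an H x W x C grid of class scores and the ground truth is an H x W grid of class indices. The function name and the nested-list layout are illustrative choices, not taken from the slides.

```python
import math

def pixel_softmax_cross_entropy(logits, labels):
    """Mean per-pixel softmax cross-entropy.

    logits: H x W x C nested lists of class scores (hypothetical layout).
    labels: H x W nested lists of integer class indices.
    """
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for scores, target in zip(row_logits, row_labels):
            # Log-sum-exp with the max subtracted for numerical stability.
            m = max(scores)
            log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
            # -log softmax(scores)[target]
            total += log_sum - scores[target]
            count += 1
    return total / count

# Two pixels with uniform scores over two classes: the loss is ln(2).
print(pixel_softmax_cross_entropy([[[0.0, 0.0], [0.0, 0.0]]], [[0, 1]]))
```

Every pixel contributes equally here; class weighting or ignored labels would be added on top of this sketch.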


Challenges in Semantic Segmentation?

• Speed
• Performance
  • Per-pixel Accuracy
  • Boundary


What is Context?

• According to the dictionary:
  • the parts of a discourse that surround a word or passage and can throw light on its meaning

Figure labels: Sports ball, Grass, Play Fields, Person


Outline

• Revisit Semantic Segmentation
• Context for Semantic Segmentation
  • Backbone
  • Head
  • Loss
• Conclusion


Context in Backbone

• Motivation
  • Traditional Backbone is designed for Classification
    • Large Receptive field by compromising spatial resolution
  • Segmentation requires both Classification & Localization
    • Maintain both Receptive Field (context) & Spatial resolution
    • Computational cost?
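
The trade-off between receptive field and spatial resolution can be checked with the standard receptive-field recurrence for a stack of convolutions. The sketch below is illustrative only: the layer lists are made-up examples, not the configuration of any backbone in this talk.

```python
def receptive_field(layers):
    """Return (receptive_field, output_stride) for a stack of conv layers.

    layers: list of (kernel, stride, dilation) tuples, applied in order.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf, jump

# Three 3x3 convs with stride 2: large receptive field, 1/8 resolution.
print(receptive_field([(3, 2, 1)] * 3))                    # (15, 8)
# Three 3x3 convs with dilation 1, 2, 4: same receptive field, full resolution.
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))  # (15, 1)
```

Both stacks see 15 input pixels per output, but the dilated stack keeps every feature map at full resolution, so each layer runs on far more positions. That is the computational cost the last bullet asks about.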


Context in Backbone - BiSeNet

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018

• BiSeNet: Bilateral Segmentation Network


Context in Backbone - BiSeNet

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018

• Pipeline


Context in Backbone - BiSeNet

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018

• Results


Context in Backbone - BiSeNet

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018

• Ablation Results


Context in Backbone - BiSeNet

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018

• Speed


Context in Backbone - BiSeNet

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018

• Summary
  • Two paths in backbone: Spatial path + Context path
  • Context is implicitly encoded in receptive field
  • Efficient speed
  • Code: https://github.com/ycszen/TorchSeg
• Context:
  • A branch encodes semantic meaning with large receptive field?
• Related work:
  • ICNet for Real-Time Semantic Segmentation on High-Resolution Images, Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia, ECCV2018
  • Stacked Hourglass Networks for Human Pose Estimation, Alejandro Newell, Kaiyu Yang, Jia Deng, ECCV2016
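
The two-path idea can be sketched at the level of feature-map shapes. The sketch assumes that the context path produces a coarser map than the spatial path, and that the two are merged by nearest-neighbour upsampling plus channel concatenation. That merge is a deliberate simplification rather than the paper's actual fusion module, and all function names are hypothetical.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a C x H x W nested list."""
    return [
        [[v for v in row for _ in range(factor)]   # repeat each column
         for row in channel for _ in range(factor)]  # repeat each row
        for channel in fmap
    ]

def fuse_paths(spatial, context, factor):
    """Upsample coarse context features and concatenate along channels."""
    up = upsample_nearest(context, factor)
    if len(up[0]) != len(spatial[0]) or len(up[0][0]) != len(spatial[0][0]):
        raise ValueError("spatial and upsampled context sizes differ")
    return spatial + up  # list concatenation = channel concatenation

spatial = [[[0] * 4 for _ in range(4)]]   # 1 channel, 4 x 4 (fine detail)
context = [[[1, 2], [3, 4]]]              # 1 channel, 2 x 2 (coarse context)
fused = fuse_paths(spatial, context, 2)
print(len(fused), fused[1][0])            # 2 [1, 1, 2, 2]
```

The fused tensor keeps the spatial path's resolution while every position also carries the coarse, large-receptive-field feature that covers it.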


Context in Head

• Motivation
  • Large Receptive field without compromising boundary results
  • Why work on the Head?
    • Efficient speed
    • Obvious gain from increasing the receptive field
    • Simple to implement


Context in Head – Large Kernel

• Receptive Field vs Valid Receptive Field

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Large Kernel Matters

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Large Kernel Matters
  • Why Boundary Refinement?
    • Large receptive field will blur the object boundary

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Large Kernel Matters
  • Ablation: Why Boundary Refinement?
    • Large receptive field will blur the object boundary

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Large Kernel Matters
  • Ablation: Different kernel size?

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Large Kernel Matters
  • Ablation: Are more parameters helpful?

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Large Kernel Matters
  • Ablation: GCN vs. Stack of small convolutions

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Large Kernel Matters
  • Ablation: GCN in Backbone

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Large Kernel Matters: illustrative examples

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017


Context in Head – Large Kernel

• Summary
  • Global Convolutional Network (GCN) to increase the receptive field
  • Large separable convolution is an efficient implementation
• Context
  • Large receptive field?
• Related work
  • PSPNet: Pyramid Scene Parsing Network, Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia, CVPR2017
  • DeeplabV3: Rethinking Atrous Convolution for Semantic Image Segmentation, Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam

Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017
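
A rough parameter count shows why the separable form is cheaper than a dense large kernel. The sketch assumes the two-branch layout described in the cited paper: one branch applies k x 1 then 1 x k, the other 1 x k then k x 1, and the results are summed. Biases are ignored, and the channel counts are arbitrary illustration values.

```python
def dense_conv_params(k, c_in, c_out):
    """Weights in a single dense k x k convolution."""
    return k * k * c_in * c_out

def gcn_params(k, c_in, c_out):
    """Weights in the assumed two-branch separable large-kernel block."""
    # Each branch: a 1-D conv from c_in to c_out, then one from c_out to c_out.
    per_branch = k * c_in * c_out + k * c_out * c_out
    return 2 * per_branch

k, c = 15, 21  # hypothetical kernel size and channel count
print(dense_conv_params(k, c, c))  # 99225
print(gcn_params(k, c, c))         # 26460
```

With equal input and output channels the dense kernel costs k^2 * c^2 weights and the separable block 4 * k * c^2, so the saving grows linearly with k (a factor of k / 4).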


Context in Head – DFN

• Motivation:
  • Large kernel (GCN) is computationally intensive
    • Global pooling is efficient to compute and can obtain the global context
  • Large receptive field does not equal good context
    • Attention strategy to adaptively aggregate the features

Learning a Discriminative Feature Network for Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, CVPR, 2018
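
Both motivations can be illustrated in a few lines: global average pooling summarises each channel over the whole image, and a gate derived from that summary reweights the channel. This is a simplified sketch, not the block used in the paper. A real attention block would place learned layers between the pooling and the gate, which are omitted here, and the function names are hypothetical.

```python
import math

def global_average_pool(fmap):
    """fmap: C x H x W nested lists -> list of C per-channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]

def channel_attention(fmap):
    """Rescale each channel by a sigmoid gate of its global average."""
    gates = [1.0 / (1.0 + math.exp(-g)) for g in global_average_pool(fmap)]
    return [[[v * gate for v in row] for row in ch]
            for ch, gate in zip(fmap, gates)]

fmap = [[[0.0, 0.0], [0.0, 0.0]],   # channel 0: mean 0.0, gate 0.5
        [[1.0, 3.0], [2.0, 2.0]]]   # channel 1: mean 2.0, gate sigmoid(2)
print(global_average_pool(fmap))    # [0.0, 2.0]
print(channel_attention(fmap)[1][0])
```

Pooling costs one pass over the feature map regardless of image size, which is the efficiency argument against a large kernel. The gate is what makes the aggregation adaptive rather than a fixed average.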


Context in Head – DFN

• DFN: Pipeline

Learning a Discriminative Feature Network for Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, CVPR, 2018


Context in Head – DFN

• DFN: Ablation

Learning a Discriminative Feature Network for Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, CVPR, 2018


Context in Head – DFN

• DFN: Results

Learning a Discriminative Feature Network for Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, CVPR, 2018


Context in Head – DFN

• Summary
  • Global pooling is efficient and effective to capture the long-range context
  • Attention for adaptively adjusting feature weights
  • Code: https://github.com/ycszen/TorchSeg/
• Context
  • Receptive field & feature aggregation?
• Related work
  • Non-local Neural Networks, Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He, CVPR2018
  • CCNet: Criss-Cross Attention for Semantic Segmentation, Zilong Huang, Xinggang Wang, Lichao Huang, Chang Huang, Yunchao Wei, Wenyu Liu
  • PSANet: Point-wise Spatial Attention Network for Scene Parsing, Hengshuang Zhao*, Yi Zhang*, Shu Liu, Jianping Shi, Chen Change Loy, Dahua Lin, Jiaya Jia, ECCV2018
  • OCNet: Object Context Network for Scene Parsing, Yuhui Yuan, Jingdong Wang
  • ParseNet: Looking Wider to See Better, Wei Liu, Andrew Rabinovich, Alexander C. Berg

Learning a Discriminative Feature Network for Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, CVPR, 2018


Context in Loss

• Motivation
  • “Thing” may be important for stuff prediction

COCO2018 Panoptic Segmentation Challenge, http://presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Megvii.pdf

Figure labels: Sports ball, Grass, Play Fields, Person


Context in Loss

• Motivation
  • “Thing” may be important for stuff prediction

COCO2018 Panoptic Segmentation Challenge, http://presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Megvii.pdf


Context in Loss

• Pipeline

Figure: pipeline diagram. Legend text: Encoder, Train/Inference, Train Supervision, Inference Merge, Res-Block. Diagram labels: Multi Types Context, Objects, Semantic, Stuff, Stuff.

COCO2018 Panoptic Segmentation Challenge, http://presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Megvii.pdf


Context in Loss

• COCO2018 Panoptic Segmentation Challenge

Figure: Results of Stuff Regions on the COCO2018 Panoptic Segmentation Validation Dataset. Metric: Mean IoU (%). Bar labels: Res50, +Encoder, +Extra Res Blocks, +Multi Context, +Huge Backbone, +Multi-Scale Flip Test. Values as extracted (their pairing with the labels is not recoverable): 49.3, 49.6, 54.1, 54.5, 50.8.

Finally, we ensembled three models and achieved 55.9% mIoU on this dataset.

COCO2018 Panoptic Segmentation Challenge, http://presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Megvii.pdf
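
For reference, the mean IoU metric reported on this slide can be computed as below. This is a generic sketch rather than the challenge's evaluation code. It works on flattened label lists and skips classes that appear in neither the prediction nor the ground truth, which is one common convention and an assumption here.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred, target: flat lists of integer class indices, one per pixel.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)

# Class 0: IoU 1/2, class 1: IoU 2/3.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Because the average is taken over classes rather than pixels, a small class counts as much as a large one, which makes the metric more demanding than per-pixel accuracy.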


Context in Loss

• COCO2018 Panoptic Segmentation Challenge

COCO2018 Panoptic Segmentation Challenge, http://presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Megvii.pdf


Context in Loss

• Summary
  • “Thing” and “stuff” are complementary
  • Loss is a good approach to encode the context
    • Better feature representation
• Context
  • A loss to encode the semantic meaning?
• Related work
  • Context Encoding for Semantic Segmentation, Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal, CVPR2018

COCO2018 Panoptic Segmentation Challenge, http://presentations.cocodataset.org/ECCV18/COCO18-Panoptic-Megvii.pdf


Outline

• Revisit Semantic Segmentation
• Context for Semantic Segmentation
  • Backbone
  • Head
  • Loss
• Conclusion


Conclusion

• Context in different parts
  • Backbone, Head, Loss
• What is Context?
  • Large receptive field?
  • A semantic branch?
  • Spatial/feature aggregation?
• Future work
  • Explicitly show what is a context
  • Panoptic seg: Stuff vs Thing


Reference

• Pyramid Scene Parsing Network, Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia, CVPR2017

• ICNet for Real-Time Semantic Segmentation on High-Resolution Images, Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia, ECCV2018

• Context Encoding for Semantic Segmentation, Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal, CVPR2018

• Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam, ECCV2018

• Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network, Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun, CVPR, 2017

• Learning a Discriminative Feature Network for Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, CVPR, 2018

• BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation, Changqian Yu, Jingbo Wang, Chao Peng, Changxin Gao, Gang Yu, Nong Sang, ECCV, 2018


Q&A

• Megvii Detection Zhihu column

• Webpage: http://www.skicyyu.org/

• Email: [email protected]
