Deep Learning for image segmentation

Michael Jamroz & Matthew Opala

#6 PyData Warsaw: Deep learning for image segmentation

AGENDA

Segmentation in Computer Vision

Deep Learning methods for image segmentation

Case study - clothing parsing

1. Segmentation in Computer Vision

Computer Vision tasks

[Figure: the same photo annotated for Classification, Detection and Segmentation, with labels DRESS, HEELS and BAG]

Semantic Segmentation

◦ Annotate each pixel with a class label

◦ Does not distinguish between instances of the same class

◦ Classic computer vision task
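Here is a minimal sketch of what "annotate each pixel" means in practice. It assumes a model that outputs one score per class for every pixel; the three-class label set is hypothetical. The semantic label map is the per-pixel argmax over those scores. Every dress pixel gets label 1, whether the pixels belong to one dress or to two.

```python
import numpy as np

# Hypothetical label set: 0 = background, 1 = dress, 2 = bag.
CLASSES = ["background", "dress", "bag"]


def scores_to_label_map(scores: np.ndarray) -> np.ndarray:
    """Convert per-pixel class scores of shape (C, H, W) into an (H, W) label map."""
    return scores.argmax(axis=0)


# Toy scores for a 2 x 3 image: one (2, 3) score plane per class.
scores = np.array([
    [[5, 0, 0], [0, 0, 5]],  # background
    [[0, 9, 9], [0, 9, 0]],  # dress
    [[0, 0, 0], [9, 0, 0]],  # bag
], dtype=float)

labels = scores_to_label_map(scores)
print(labels)  # each entry is an index into CLASSES
```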

Instance Aware Segmentation

◦ Detect instances

◦ Annotate each pixel

◦ Simultaneous detection and segmentation

◦ Recent challenge in MS-COCO
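One way to see the difference is sketched below. It assumes instances are stored as (class id, boolean mask) pairs, a representation chosen here only for illustration. Collapsing two bag instances into a semantic map gives both the same label, so the map alone can no longer tell them apart. A separate instance-id map keeps them distinct.

```python
import numpy as np


def instances_to_semantic(instances, height, width):
    """Collapse (class_id, mask) instances into one (H, W) semantic label map.

    0 is background; where masks overlap, later instances overwrite earlier ones.
    """
    semantic = np.zeros((height, width), dtype=int)
    for class_id, mask in instances:
        semantic[mask] = class_id
    return semantic


def instances_to_ids(instances, height, width):
    """Give every instance its own id (1, 2, ...) regardless of its class."""
    ids = np.zeros((height, width), dtype=int)
    for instance_id, (_, mask) in enumerate(instances, start=1):
        ids[mask] = instance_id
    return ids


# Two hypothetical bags (class id 2) side by side in a 1 x 5 image.
bag_a = np.array([[True, True, False, False, False]])
bag_b = np.array([[False, False, True, True, False]])
instances = [(2, bag_a), (2, bag_b)]

print(instances_to_semantic(instances, 1, 5))  # both bags share label 2
print(instances_to_ids(instances, 1, 5))       # the two bags stay separate
```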

Traditional methods

Kota Yamaguchi, M Hadi Kiapour, Tamara L Berg, "Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items", ICCV 2013

● Multi-stage pipeline with hand-engineered image features (HOG, MR8, etc.)

● Segmentation -> classification of every pixel with logistic regression

2. Deep Learning methods for image segmentation

Convolutional neural networks

● First used successfully for image classification

● Three basic operations: convolution, pooling and a nonlinearity
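The three operations can be written in a few lines of NumPy. This is a single-channel sketch for illustration only; real layers have many channels, learned filters and a bias term. As in most deep learning frameworks, "convolution" here is cross-correlation, meaning the filter is not flipped.

```python
import numpy as np


def conv2d(image, kernel):
    """'Valid' single-channel convolution (cross-correlation, no kernel flip)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Nonlinearity: keep positive responses, set negative ones to zero."""
    return np.maximum(x, 0.0)


def max_pool_2x2(x):
    """2 x 2 max pooling with stride 2 (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


image = np.arange(36, dtype=float).reshape(6, 6)       # toy 6 x 6 input
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)               # hypothetical horizontal-gradient filter
features = max_pool_2x2(relu(conv2d(image, kernel)))    # 6x6 -> 4x4 -> 4x4 -> 2x2
```

Every 3 x 3 window of this toy image has the same horizontal gradient, so all four pooled features are equal (6.0).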

Semantic segmentation with CNN

Input -> extract a patch -> CNN classifies the center pixel (e.g. DRESS)

Repeat for each pixel
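The patch-based approach can be sketched as follows. `classify_patch` stands in for a trained CNN, and the toy rule below is purely hypothetical. The point is the loop structure: one classifier call per pixel, on heavily overlapping patches. That repeated work is what makes this approach slow.

```python
import numpy as np


def segment_by_patches(image, classify_patch, patch_size=5):
    """Label every pixel by classifying the patch centred on it.

    The image is zero-padded so that border pixels also get full patches.
    """
    r = patch_size // 2
    padded = np.pad(image, r, mode="constant")
    h, w = image.shape
    labels = np.zeros((h, w), dtype=int)
    for i in range(h):          # H * W classifier calls in total
        for j in range(w):
            patch = padded[i:i + patch_size, j:j + patch_size]
            labels[i, j] = classify_patch(patch)
    return labels


def toy_classifier(patch):
    """Hypothetical stand-in for a CNN: class 1 if the centre is brighter than the patch mean."""
    c = patch.shape[0] // 2
    return int(patch[c, c] > patch.mean())


image = np.zeros((6, 6))
image[2:4, 2:4] = 1.0           # a bright 2 x 2 "object"
labels = segment_by_patches(image, toy_classifier)
```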

Semantic segmentation with CNN

Whole image through the CNN: smaller output due to pooling
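The sketch below shows how much smaller the output gets. It applies the usual output-size formula through a VGG-16-style stack, assumed here to be 3 x 3 convolutions with padding 1 (which keep the size) and five 2 x 2, stride-2 max-pooling layers. Frameworks differ in how they round (Caffe, for example, rounds pooling output sizes up), so treat the result as approximate.

```python
def output_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def vgg16_feature_size(size, num_pools=5):
    """Trace one spatial dimension through conv blocks, each followed by a 2 x 2 pool."""
    for _ in range(num_pools):
        # 3 x 3 convs with pad 1 keep the size, so one per block is enough here.
        size = output_size(size, kernel=3, stride=1, pad=1)
        size = output_size(size, kernel=2, stride=2)  # 2 x 2 max pool, stride 2: halved
    return size


print(vgg16_feature_size(224))  # 7: each side is 32 times smaller than the input
```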

Fully Convolutional Neural Networks

Long, Shelhamer and Darrell, “Fully Convolutional Networks For Semantic Segmentation”, CVPR 2015
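The core FCN idea is that fully connected layers can be rewritten as convolutions. The network then accepts inputs of any size and outputs a coarse map of class scores instead of a single vector. In VGG-16 the first fully connected layer becomes a 7 x 7 convolution and the later ones become 1 x 1 convolutions. Below is a minimal NumPy check of the 1 x 1 case, with made-up channel and class counts.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K = 8, 3                        # hypothetical: 8 feature channels, 3 classes
W = rng.normal(size=(K, C))        # weights of a fully connected classifier
b = rng.normal(size=K)


def fully_connected(feature_vector):
    """Classifier applied to one C-dimensional feature vector."""
    return W @ feature_vector + b


def conv_1x1(feature_map):
    """The same weights used as a 1 x 1 convolution over a (C, H, W) feature map."""
    return np.einsum("kc,chw->khw", W, feature_map) + b[:, None, None]


feature_map = rng.normal(size=(C, 4, 5))   # any spatial size is accepted
score_map = conv_1x1(feature_map)          # (K, 4, 5): class scores at every location
```

At every location, the score map equals the fully connected classifier applied to that location's feature vector.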

Learnable upsampling: deconvolution

Typical 3 x 3 convolution, stride 1 pad 1

Input: 4 x 4 Output: 4 x 4

Dot product between filter and input

Typical 3 x 3 convolution, stride 2 pad 1

Input: 4 x 4 Output: 2 x 2

Dot product between filter and input
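Here is the convolution side of these slides as a small NumPy sketch. It assumes a single channel, square inputs, and cross-correlation as used in deep learning frameworks. Each output is the dot product between the filter and one input window. With stride 2 the window moves two pixels at a time, which is the same as keeping every second output of the stride-1 convolution. The output size is floor((n + 2p - k) / s) + 1, which gives 4 for stride 1 and 2 for stride 2 on a 4 x 4 input.

```python
import numpy as np


def conv2d(x, w, stride=1, pad=1):
    """Single-channel 2-D convolution (cross-correlation) with zero padding and stride."""
    x = np.pad(x, pad, mode="constant")
    k = w.shape[0]
    n = (x.shape[0] - k) // stride + 1   # = floor((n_in + 2 * pad - k) / stride) + 1
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            r, c = i * stride, j * stride
            # Dot product between the filter and one input window.
            out[i, j] = np.sum(x[r:r + k, c:c + k] * w)
    return out


x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
same = conv2d(x, w, stride=1, pad=1)   # 4 x 4 -> 4 x 4
down = conv2d(x, w, stride=2, pad=1)   # 4 x 4 -> 2 x 2
```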

3 x 3 transposed convolution (“deconvolution”), stride 2 pad 1

Input: 2 x 2 Output: 4 x 4

Input gives weight for filter

Sum where output overlaps
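A minimal single-channel sketch of this upsampling follows. Each input value scales a copy of the filter, the copies are placed two pixels apart, and they are summed where they overlap. Strictly, a 3 x 3 filter with stride 2 and pad 1 turns a 2 x 2 input into a 3 x 3 output. The 4 x 4 shown on the slide needs one extra row and column of output padding (the `output_padding` argument of PyTorch's `ConvTranspose2d`), and the sketch assumes that padding. The operation is the transpose (adjoint) of the strided convolution, hence the name "transposed convolution".

```python
import numpy as np


def conv_transpose2d(x, w, stride=2, pad=1, output_padding=1):
    """Single-channel transposed convolution ("deconvolution").

    Each input value scales a copy of the filter; copies are placed `stride`
    pixels apart and summed where they overlap. The result is then cropped
    by `pad` on every side, with `output_padding` extra rows/columns kept at
    the bottom/right.
    """
    n, k = x.shape[0], w.shape[0]
    full = np.zeros(((n - 1) * stride + k + output_padding,) * 2)
    for i in range(n):
        for j in range(n):
            r, c = i * stride, j * stride
            full[r:r + k, c:c + k] += x[i, j] * w   # input value weights the filter
    out_size = (n - 1) * stride - 2 * pad + k + output_padding
    return full[pad:pad + out_size, pad:pad + out_size]


y = np.array([[1.0, 2.0], [3.0, 4.0]])
up = conv_transpose2d(y, np.ones((3, 3)))   # 2 x 2 -> 4 x 4
print(up)
```

With an all-ones filter, the output cell where all four filter copies overlap holds 1 + 2 + 3 + 4 = 10.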

Deconvolution Network for Semantic Segmentation

[Figure: a normal VGG followed by an “upside-down” VGG]

Noh, Hong and Han, “Learning Deconvolution Network for Semantic Segmentation”, arXiv 2015

Deconvolution Network: Pooling

[Figure: input, pooled map and switch variables]

Deconvolution Network: Unpooling

[Figure: unpooling the pooled map back to the input size using the switch variables]
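Pooling with switch variables and the matching unpooling can be sketched in NumPy as below. The sketch simplifies to a single channel, 2 x 2 windows with stride 2, and an even input size. The switches record where each maximum came from. Unpooling writes each value back to that position and leaves zeros elsewhere. The result is a sparse map, which DeconvNet then densifies with deconvolution layers.

```python
import numpy as np


def max_pool_with_switches(x):
    """2 x 2 / stride 2 max pooling that also returns the argmax ("switch") per window."""
    h, w = x.shape
    windows = (x.reshape(h // 2, 2, w // 2, 2)
                .transpose(0, 2, 1, 3)
                .reshape(h // 2, w // 2, 4))   # the 4 values of each 2 x 2 window
    switches = windows.argmax(axis=2)          # 0..3: position inside the window
    pooled = windows.max(axis=2)
    return pooled, switches


def max_unpool(pooled, switches):
    """Place each pooled value where its maximum came from; other positions stay 0."""
    ph, pw = pooled.shape
    out = np.zeros((ph * 2, pw * 2))
    for i in range(ph):
        for j in range(pw):
            di, dj = divmod(int(switches[i, j]), 2)
            out[2 * i + di, 2 * j + dj] = pooled[i, j]
    return out


x = np.array([[1, 5, 0, 2],
              [3, 2, 8, 1],
              [0, 0, 1, 1],
              [4, 0, 0, 9]], dtype=float)
pooled, switches = max_pool_with_switches(x)
restored = max_unpool(pooled, switches)   # sparse 4 x 4 map
```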

DeconvNet vs. FCN

[Figure: input and ground truth compared with the outputs of FCN, DeconvNet, EDeconvNet and EDeconvNet + CRF]

DeepLab: Atrous Convolution and Fully Connected CRFs

Chen, Papandreou, Kokkinos, Murphy and Yuille, “Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs”, ICLR 2015

● Conditional random field used as a post-processing step

Conditional Random Field
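DeepLab uses the fully connected CRF of Krähenbühl and Koltun. Its energy combines the network's per-pixel unary term with pairwise Gaussian kernels over pixel position and colour. The sketch below is a naive mean-field version of that model. It assumes a grayscale image, a Potts label compatibility and made-up kernel parameters. It builds a full N x N kernel, so it only works on tiny images; the real implementation uses a permutohedral lattice for fast filtering.

```python
import numpy as np


def dense_crf_mean_field(unary_probs, image, w_app=1.0, theta_alpha=3.0,
                         theta_beta=0.1, w_smooth=1.0, theta_gamma=1.0,
                         iterations=5):
    """Naive mean-field inference for a fully connected CRF with a Potts model.

    unary_probs: (L, H, W) per-pixel class probabilities from the network.
    image:       (H, W) grayscale intensities for the appearance kernel.
    """
    num_labels, h, w = unary_probs.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    intensity = image.ravel().astype(float)

    # Pairwise kernel: appearance term (position + intensity) plus smoothness term.
    d_pos = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=-1)
    d_int = (intensity[:, None] - intensity[None, :]) ** 2
    kernel = (w_app * np.exp(-d_pos / (2 * theta_alpha ** 2) - d_int / (2 * theta_beta ** 2))
              + w_smooth * np.exp(-d_pos / (2 * theta_gamma ** 2)))
    np.fill_diagonal(kernel, 0.0)  # a pixel sends no message to itself

    unary = -np.log(unary_probs.reshape(num_labels, -1).T + 1e-8)  # (N, L) energies
    q = unary_probs.reshape(num_labels, -1).T.copy()               # start from the network output
    for _ in range(iterations):
        # Potts model: a label gets cheaper the more similar pixels currently take it.
        logits = -unary + kernel @ q
        logits -= logits.max(axis=1, keepdims=True)                 # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)
    return q.T.reshape(num_labels, h, w)


# Toy example: two flat regions and one noisy unary prediction at (3, 1).
image = np.zeros((8, 8))
image[:, 4:] = 1.0
truth = (image > 0.5).astype(int)
p1 = np.where(truth == 1, 0.7, 0.3)     # network probability of label 1
p1[3, 1] = 0.6                          # wrongly favours label 1
unary_probs = np.stack([1.0 - p1, p1])
refined = dense_crf_mean_field(unary_probs, image)
```

In this toy case, the noisy pixel is pulled back to its region's label by its many similar-looking neighbours.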

Atrous convolution

● Convolution “with holes”

● Enlarges the receptive field of the convolution without adding parameters or computation and without reducing resolution

● Performing convolution on a downsampled input, then upsampling the result to the original resolution

● Performing convolution with holes on the original-size input
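Here is atrous convolution in one dimension as a minimal sketch; 2-D works the same way along each axis, and the filter values are arbitrary. With rate r, the filter taps are r samples apart, so a 3-tap filter with rate 2 covers 5 input samples while still having only 3 weights. This is equivalent to an ordinary convolution with r - 1 zeros ("holes") inserted between the taps. Rate 1 is ordinary convolution.

```python
import numpy as np


def atrous_conv1d(x, w, rate):
    """1-D atrous (dilated) convolution, 'valid' mode: filter taps are `rate` samples apart."""
    k = len(w)
    span = (k - 1) * rate + 1            # input samples covered by one output
    return np.array([sum(w[t] * x[i + t * rate] for t in range(k))
                     for i in range(len(x) - span + 1)])


def holed_filter(w, rate):
    """Equivalent ordinary filter with rate - 1 zeros ("holes") between the taps."""
    out = np.zeros((len(w) - 1) * rate + 1)
    out[::rate] = w
    return out


x = np.arange(10, dtype=float)
w = np.array([1.0, 0.0, -1.0])           # arbitrary 3-tap filter
y = atrous_conv1d(x, w, rate=2)          # each output sees 5 inputs with 3 weights
dense = np.correlate(x, holed_filter(w, 2), mode="valid")  # same result
```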

3. Case study - clothing parsing

Clothing parsing

◦ Goal: detect and segment basic clothing categories (dresses, bags, shoes, trousers, etc.) on people

◦ We need precise clothing masks for further processing (image search, color detection)

◦ The biggest publicly available dataset contains 7.7k images

ATR Dataset

◦ Images with ground-truth labels, 7.7k examples

◦ 18 clothing categories

◦ https://github.com/lemondan/HumanParsing-Dataset

Clothing parsing with general segmentation

◦ DeepLab model based on the VGG-16 architecture

◦ Both variants: with and without CRF post-processing

◦ Fine-tuned from VGG-16 trained on the ImageNet classification challenge

◦ Images resized to 513 x 513 resolution

◦ Training details

▫ Batch size: 8

▫ 20k iterations - 10 epochs

▫ Dataset split into train/test sets with ratio 0.9 (90% train, 10% test)

Clothing parsing with general segmentation: results

[Figure: input, DeepLab, DeepLab + CRF and ground truth]

Clothing parsing with general segmentation: metrics

Bags:

model            accuracy   precision   recall   f1-score   IoU
DeepLab          0.9903     0.64        0.51     0.54       0.45
DeepLab + CRF    0.9908     0.664       0.525    0.553      0.48

Dresses:

model            accuracy   precision   recall   f1-score   IoU
DeepLab          0.9586     0.481       0.39     0.399      0.349
DeepLab + CRF    0.9558     0.506       0.436    0.438      0.397
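For reference, the metrics in these tables can be computed from a predicted and a ground-truth mask for one class, as in the sketch below. The slides do not say whether scores were averaged per image or pooled over the test set, so that choice is left open. Accuracy also counts the many correctly rejected background pixels. That is likely why accuracy stays above 0.95 while IoU stays below 0.5.

```python
import numpy as np


def mask_metrics(pred, truth):
    """Pixel-level metrics for one class from boolean predicted/ground-truth masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)     # item pixels found
    fp = np.sum(pred & ~truth)    # background pixels labelled as the item
    fn = np.sum(~pred & truth)    # item pixels missed
    tn = np.sum(~pred & ~truth)   # background pixels correctly left out
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"accuracy": (tp + tn) / pred.size, "precision": precision,
            "recall": recall, "f1-score": f1, "IoU": iou}


truth = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 1, 0]])
m = mask_metrics(pred, truth)   # tp = fp = fn = tn = 1
```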

Clothing parsing with detection and segmentation

● Detect the clothing category with an object detector such as R-CNN, SSD or YOLO

● Segment the object inside the bounding box with a model such as DeepLab or DeepCut

● Motivation: bounding-box annotations are much faster to gather than pixel-wise annotations

● Hypothesis: given a correct bounding box, it is easier to segment a clothing item inside the box than in the whole image
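Here is a minimal sketch of the two-stage pipeline. It assumes boxes arrive as (class id, x0, y0, x1, y1) in pixel coordinates. It also assumes that `segment_crop`, a hypothetical stand-in for a model such as DeepLab, returns a boolean mask the size of the crop. Each mask is written back into a full-size label map. Later boxes overwrite earlier ones where their masks overlap, which a real system would need to resolve more carefully.

```python
import numpy as np


def detect_and_segment(image, boxes, segment_crop):
    """Segment inside each detected box, then paste the masks into one label map.

    boxes:        list of (class_id, x0, y0, x1, y1), x1/y1 exclusive (assumed format).
    segment_crop: hypothetical segmentation model returning a boolean mask per crop.
    """
    h, w = image.shape[:2]
    labels = np.zeros((h, w), dtype=int)      # 0 = background
    for class_id, x0, y0, x1, y1 in boxes:
        x0, y0 = max(x0, 0), max(y0, 0)       # clip the box to the image
        x1, y1 = min(x1, w), min(y1, h)
        crop = image[y0:y1, x0:x1]
        mask = segment_crop(crop)
        region = labels[y0:y1, x0:x1]         # view into the full-size label map
        region[mask] = class_id
    return labels


image = np.zeros((6, 6))
image[1:3, 1:4] = 1.0          # a bright item inside the box
image[5, 5] = 1.0              # a bright pixel outside every box is ignored
boxes = [(2, 0, 0, 5, 4)]      # hypothetical detector output
labels = detect_and_segment(image, boxes, lambda crop: crop > 0.5)
```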

Single Shot Multibox Detector (SSD)

Wei Liu et al., "SSD: Single Shot MultiBox Detector", ECCV 2016

Bags train/test size: 4135/360

Dresses train/test size: 11740/3990

Bags mAP: 0.93

Dresses mAP: 0.7
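The slides report mAP without naming the exact protocol. As an assumption, the sketch below uses a common PASCAL VOC-style definition:

- Detections are sorted by score.
- A detection counts as a true positive if it overlaps a not-yet-matched ground-truth box with IoU >= 0.5.
- AP is the area under the precision-recall curve after precision is made non-increasing.

mAP is the mean of AP over classes.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class.

    detections:   list of (image_id, score, box)
    ground_truth: dict image_id -> list of boxes
    """
    num_gt = sum(len(boxes) for boxes in ground_truth.values())
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    hits = []
    for img, _, box in sorted(detections, key=lambda d: -d[1]):
        ious = [box_iou(box, g) for g in ground_truth.get(img, [])]
        best = max(range(len(ious)), key=ious.__getitem__, default=None)
        if best is not None and ious[best] >= iou_threshold and not used[img][best]:
            used[img][best] = True
            hits.append(1)
        else:
            hits.append(0)     # low IoU or a duplicate detection: false positive
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Make precision non-increasing, then sum precision over each recall step.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


ground_truth = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
detections = [
    ("img1", 0.9, (50, 50, 60, 60)),   # false positive ranked first
    ("img1", 0.8, (0, 0, 10, 10)),     # exact match
    ("img2", 0.7, (1, 1, 10, 10)),     # IoU 0.81
]
print(average_precision(detections, ground_truth))  # ≈ 0.667
```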

Clothing parsing with detection and segmentation: bags metrics

model            accuracy   precision   recall   f1-score   IoU
DeepLab          0.9903     0.64        0.51     0.54       0.45
DeepLab + CRF    0.9908     0.664       0.525    0.553      0.48
D&S              0.993      0.765       0.709    0.731      0.64

Clothing parsing with detection and segmentation: dresses metrics

model            accuracy   precision   recall   f1-score   IoU
DeepLab          0.9586     0.481       0.39     0.399      0.349
DeepLab + CRF    0.9558     0.506       0.436    0.438      0.397
D&S              0.931      0.416       0.409    0.407      0.378

Visualisations of Detection & Segmentation approach

What have we used?

◦ Caffe & Python

◦ https://github.com/weiliu89/caffe/tree/ssd

◦ https://bitbucket.org/aquariusjay/deeplab-public-ver2

Thanks!

Q&A

You can contact us at:

[email protected]

[email protected]