#6 PyData Warsaw: Deep learning for image segmentation

Deep Learning for image segmentation

Michael Jamroz & Matthew Opala

AGENDA

Deep Learning methods for image segmentation

Case study - clothing parsing

Segmentation in Computer Vision

Segmentation in Computer Vision

Computer Vision tasks

[Figure: the same image annotated for classification, detection, and segmentation; example labels: DRESS, HEELS]

Semantic Segmentation

◦ Annotate each pixel

◦ Doesn’t differentiate instances

◦ Classic computer vision task

Instance Aware Segmentation

◦ Detect instances

◦ Annotate each pixel

◦ Simultaneous detection and segmentation

◦ Recent challenge in MS-COCO

Traditional methods

Kota Yamaguchi, M Hadi Kiapour, Tamara L Berg, "Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items", ICCV 2013

● Multi-stage pipeline with hand-engineered image features (HOG, MR8, etc.)

● Segmentation -> classification of every pixel with logistic regression

Deep Learning methods for image segmentation

Convolutional neural networks

● First used successfully for the classification task

● Three basic operations: convolution, pooling, and a nonlinearity function
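
A minimal NumPy sketch of the three operations chained together. The 6 x 6 input and the vertical-edge filter are made-up illustration values, and ReLU is assumed as the nonlinearity; the slides do not name one.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product between the filter and the window under it.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def relu(x):
    """Nonlinearity: keep positive responses, zero out the rest."""
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """2 x 2 max pooling with stride 2 (odd trailing row/column is dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical input: a 6 x 6 intensity ramp, and a simple edge filter.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = max_pool2x2(relu(conv2d(image, kernel)))
print(features.shape)  # 6 x 6 -> 4 x 4 after convolution -> 2 x 2 after pooling
```

Stacking several such blocks is what makes the feature map progressively smaller than the input.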

Semantic segmentation with CNN

Input -> Extract patch -> CNN -> Classify center pixel (e.g. DRESS)

Repeat for each pixel
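
The patch-wise scheme can be sketched as a sliding window. `classify_patch` is a hypothetical stand-in for a trained CNN (here just a brightness threshold), and the patch size of 5 is an arbitrary assumption.

```python
import numpy as np

def classify_patch(patch):
    """Hypothetical stand-in for a CNN: label 1 if the patch is mostly bright."""
    return int(patch.mean() > 0.5)

def segment_by_patches(image, patch_size=5):
    """Label every pixel by classifying the patch centered on it."""
    pad = patch_size // 2
    # Reflect-pad so that border pixels also get a full patch.
    padded = np.pad(image, pad, mode="reflect")
    labels = np.zeros(image.shape, dtype=int)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            patch = padded[i:i + patch_size, j:j + patch_size]
            labels[i, j] = classify_patch(patch)
    return labels

# Toy image: dark left half, bright right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
labels = segment_by_patches(image)
```

The classifier runs once per pixel, so an H x W image needs H * W forward passes over heavily overlapping patches; that redundancy is the main cost of this approach.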

Semantic segmentation with CNN

CNN Smaller output due to pooling
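
A quick check of how much the output shrinks, assuming a VGG-16-style network in which only the five 2 x 2, stride-2 max-pooling layers change the spatial size (the 3 x 3 convolutions are padded). The helper name and the 224-pixel input are illustrative.

```python
def output_size(input_size, num_pools, pool_stride=2):
    """Spatial size left after repeated stride-2 pooling."""
    size = input_size
    for _ in range(num_pools):
        size //= pool_stride
    return size

# VGG-16 has five max-pooling layers: 224 -> 112 -> 56 -> 28 -> 14 -> 7.
vgg_out = output_size(224, num_pools=5)
print(vgg_out)
```

The prediction map is therefore 32 times smaller than the input along each side and has to be upsampled to label every pixel.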

Fully Convolutional Neural Networks

Long, Shelhamer and Darrell, “Fully Convolutional Networks For Semantic Segmentation”, CVPR 2015

Fully Convolutional Neural Networks

Learnable upsampling: deconvolution

Typical 3 x 3 convolution, stride 1 pad 1

Input: 4 x 4 Output: 4 x 4

Dot product between filter and input

3 x 3 “deconvolution”, stride 2 pad 1

Input gives weight for filter

Sum where output overlaps
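
A minimal NumPy sketch of the "deconvolution" (transposed convolution) described on these slides: each input value scales a copy of the filter, copies are placed `stride` pixels apart, and overlapping contributions are summed. The function name and the all-ones toy values are assumptions for illustration; in a real network the filter weights are learned.

```python
import numpy as np

def conv_transpose2d(x, k, stride=2, pad=1):
    """Transposed convolution of a 2-D input with a 2-D filter."""
    kh, kw = k.shape
    h, w = x.shape
    full = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # The input value gives the weight for the filter copy;
            # += sums the copies where they overlap in the output.
            full[i * stride:i * stride + kh,
                 j * stride:j * stride + kw] += x[i, j] * k
    # Padding in a transposed convolution crops the border of the output.
    return full[pad:full.shape[0] - pad, pad:full.shape[1] - pad]

x = np.ones((2, 2))
k = np.ones((3, 3))
out = conv_transpose2d(x, k, stride=2, pad=1)
print(out)
```

With this definition the output side length is (input - 1) * stride - 2 * pad + kernel, so the 2 x 2 toy input becomes 3 x 3; the center value is 4 because all four filter copies overlap there.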

Deconvolution Network for Semantic Segmentation

Normal VGG “Upside down” VGG

Noh, Hong and Han, “Learning Deconvolution Network for Semantic Segmentation”, arXiv 2015

Deconvolution Network: Pooling

Pooled map

Switch Variables

Deconvolution Network: Unpooling

Pooled map

Switch Variables
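
A small sketch of pooling with switch variables and the matching unpooling step, assuming non-overlapping 2 x 2 windows. Function names and the toy input are illustrative and not taken from the DeconvNet code.

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """Max pooling that also records where each maximum came from."""
    h, w = x.shape[0] // size, x.shape[1] // size
    pooled = np.zeros((h, w))
    switches = np.zeros((h, w), dtype=int)
    for i in range(h):
        for j in range(w):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            switches[i, j] = np.argmax(window)  # flat index inside the window
            pooled[i, j] = window.flat[switches[i, j]]
    return pooled, switches

def unpool(pooled, switches, size=2):
    """Put each pooled value back at its recorded position; zeros elsewhere."""
    h, w = pooled.shape
    out = np.zeros((h * size, w * size))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], size)
            out[i * size + di, j * size + dj] = pooled[i, j]
    return out

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 9],
              [5, 0, 1, 1],
              [0, 0, 1, 7]], dtype=float)
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches)
```

The unpooled map is sparse: it has the original resolution but only one nonzero value per window, which the following deconvolution layers then densify.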

DeconvNet vs. FCN

[Figure: input, ground truth, and outputs of FCN, DeconvNet, EDeconvNet, and EDeconvNet + CRF]

DeepLab: Atrous Convolution and Fully Connected CRFs

Chen, Papandreou, Kokkinos, Murphy, Yuille, “Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs”, ICLR 2015

● Conditional random field used as a post-processing step

Conditional Random Field

Atrous convolution

● Convolution “with holes”

● Enlarges the receptive field without increasing the number of parameters or the amount of computation

Atrous convolution

● Performing convolution on downsampled input, later upsampling the result to original resolution

● Performing convolution with holes on originally-sized input
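
A minimal NumPy sketch of convolution with holes on a full-resolution input. The filter taps are spaced `rate` pixels apart, so a 3 x 3 filter with rate 2 covers a 5 x 5 area while still using only nine weights. The function name and toy values are assumptions.

```python
import numpy as np

def atrous_conv2d(x, k, rate=2):
    """Valid 2-D atrous (dilated) cross-correlation."""
    kh, kw = k.shape
    # Area covered by the filter once holes are inserted between taps.
    span_h, span_w = (kh - 1) * rate + 1, (kw - 1) * rate + 1
    out_h, out_w = x.shape[0] - span_h + 1, x.shape[1] - span_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Sample the input every `rate` pixels under the filter.
            window = x[i:i + span_h:rate, j:j + span_w:rate]
            out[i, j] = np.sum(window * k)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
dense = atrous_conv2d(x, k, rate=1)   # ordinary 3 x 3 convolution
holes = atrous_conv2d(x, k, rate=2)   # same weights, 5 x 5 field of view
```

With `rate=1` this reduces to the ordinary convolution, which makes the two variants easy to compare on the same input.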

Case study - clothing parsing

Clothing parsing

◦ Goal: detect and segment some basic clothing categories (dresses, bags, shoes, trousers, etc.) on humans

◦ We need precise clothing masks for further processing (image search, color detection)

◦ The biggest publicly available dataset contains 7.7k images

ATR Dataset

◦ Images with ground-truth labels, 7.7k examples

◦ 18 clothing categories

◦ https://github.com/lemondan/HumanParsing-Dataset

ATR Dataset

Clothing parsing with general segmentation

◦ DeepLab model based on the VGG-16 architecture

◦ Both variants: with and without CRF post-processing

◦ Fine-tuned from VGG-16 trained on the ImageNet classification challenge

◦ Images resized to 513 x 513 resolution

◦ Training details

▫ Batch size: 8

▫ 20k iterations - 10 epochs

▫ Dataset split into train/test with ratio 0.9

Clothing parsing with general segmentation: results

[Figure: DeepLab, DeepLab + CRF, and ground truth]

Clothing parsing with general segmentation: results

[Figure: input, DeepLab, DeepLab + CRF, and ground truth]

Clothing parsing with general segmentation: metrics

Dresses:

model           accuracy  precision  recall  f1-score  IoU
DeepLab         0.9903    0.64       0.51    0.54      0.45
DeepLab + CRF   0.9908    0.664      0.525   0.553     0.48

model           accuracy  precision  recall  f1-score  IoU
DeepLab         0.9586    0.481      0.39    0.399     0.349
DeepLab + CRF   0.9558    0.506      0.436   0.438     0.397
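
For reference, a sketch of how these five metrics can be computed for one class from a predicted and a ground-truth binary mask. This uses the standard pixel-level definitions; the slides do not say whether the reported numbers were pooled over all pixels or averaged per image, so treat the function as an assumption about the evaluation, not a reproduction of it.

```python
import numpy as np

def binary_mask_metrics(pred, truth):
    """Pixel-level metrics for one class, given two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.sum(pred & truth))    # class pixels found
    fp = int(np.sum(pred & ~truth))   # background labeled as class
    fn = int(np.sum(~pred & truth))   # class pixels missed
    tn = int(np.sum(~pred & ~truth))  # background left as background
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    accuracy = (tp + tn) / pred.size
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "iou": iou}

# Toy masks: a 2 x 2 object, predicted one column too wide.
truth = np.zeros((4, 4), dtype=int)
truth[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=int)
pred[1:3, 1:4] = 1
metrics = binary_mask_metrics(pred, truth)
```

Accuracy counts true negatives, so it stays close to 1 whenever the class covers a small part of the image; IoU ignores them, which is why it separates the models much more clearly.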

Clothing parsing with detection and segmentation

● Detect the category with an object detector such as R-CNN, SSD, or YOLO

● Segment the object inside the bounding box with a model such as DeepLab or DeepCut

● Motivation: it’s much faster to gather bounding-box annotations than pixel-wise annotations

● Hypothesis: given a correct bounding box, it’s easier to segment a clothing item inside the box than in the whole image
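
The two-stage idea can be sketched as: crop each detected box, segment the crop, and paste the result back into a full-size mask. `segment_crop` is a hypothetical stand-in for a segmentation model (here a simple threshold), and the boxes are assumed to come from a detector as (x_min, y_min, x_max, y_max) pixel coordinates.

```python
import numpy as np

def segment_crop(crop):
    """Hypothetical stand-in for a segmentation model run on one crop."""
    return crop > crop.mean()

def detect_and_segment(image, boxes):
    """Segment only inside detected boxes and merge into one mask."""
    mask = np.zeros(image.shape[:2], dtype=bool)
    for x0, y0, x1, y1 in boxes:
        crop = image[y0:y1, x0:x1]
        # Paste the crop-level mask back at the box position.
        mask[y0:y1, x0:x1] |= segment_crop(crop)
    return mask

# Toy grayscale image with one bright 3 x 3 object.
image = np.zeros((10, 10))
image[2:5, 3:6] = 1.0
boxes = [(2, 1, 8, 7)]  # assumed detector output around the object
mask = detect_and_segment(image, boxes)
```

Everything outside the boxes stays background, so a missed or badly placed detection directly becomes a segmentation error.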

Single Shot Multibox Detector (SSD)

Wei Liu et al., "SSD: Single Shot MultiBox Detector", 2016

Bags train/test size: 4135/360

Dresses train/test size: 11740/3990

Bags mAP: 0.93

Dresses mAP: 0.7

model           accuracy  precision  recall  f1-score  IoU
DeepLab         0.9903    0.64       0.51    0.54      0.45
DeepLab + CRF   0.9908    0.664      0.525   0.553     0.48
D&S             0.993     0.765      0.709   0.731     0.64

Clothing parsing with detection and segmentation: bags metrics

model           accuracy  precision  recall  f1-score  IoU
DeepLab         0.9586    0.481      0.39    0.399     0.349
DeepLab + CRF   0.9558    0.506      0.436   0.438     0.397
D&S             0.931     0.416      0.409   0.407     0.378

Clothing parsing with detection and segmentation: dresses metrics

Visualisations of Detection & Segmentation approach

What have we used?

◦ Caffe & Python

◦ https://github.com/weiliu89/caffe/tree/ssd

◦ https://bitbucket.org/aquariusjay/deeplab-public-ver2

Thanks!

Q&A

You can contact us at:

michaljamroz@craftinity.com

mateuszopala@craftinity.com
