


Adversarial Scene Editing: Automatic Object Removal from Weak Supervision
Rakshith Shetty 1,3   Mario Fritz 2,3   Bernt Schiele 1,3

1 Max Planck Institute for Informatics   2 CISPA Helmholtz Center i.G.   3 Saarland Informatics Campus
Contact:

[email protected]

bit.ly/asecode

Summary

Task: Learning to remove a desired object class from input images with only weak image-level labels
Why: Test-case generation, data augmentation and privacy filters

[Architecture figure] Two-stage editor network to remove objects from an input image: given the target class, the Mask Generator produces a mask and the Image In-painter fills the masked region (together forming the Editor). An Object Classifier ("Is there a person?"), a Real/Fake Classifier, and a Mask Discriminator ("Prior/Generated?") provide the training signals.

• The two-stage architecture with jointly trained mask generator and in-painter helps avoid adversarial perturbations.

• The mask generator learns by fooling the object classifier and the mask discriminator.

• The in-painter learns by fooling the real-fake classifier.

• An adversarial shape prior incorporates knowledge of object shapes.

Learns to remove objects using supervision from image labels and unpaired shape priors. Achieves a removal success rate on par with fully supervised Mask-RCNN.

Qualitative results of removal

[Figure] Qualitative removal results on the COCO dataset (person, dog, tv, horse) and the FlickrLogo dataset (mini, pepsi, nbc, citroen). Each column shows the input image and the image edited by our model.

Effect of adversarial shape prior

[Figure] Removal results on person, person, dog and bird images. Rows (top to bottom): input image, no prior, box prior, Pascal mask prior. Using more accurate priors (top to bottom) leads to improved mask generation which better adheres to the object shape.

Model

Training the mask generator

[Diagram] The Mask Generator takes the image and class label and produces a mask that is passed to the In-Painter; the Mask Discriminator compares generated masks against samples from the mask prior ("Prior/Generated?").

• The mask generator learns with a combination of three losses:

    m = G_M(x, c_t)
    y = (1 − m) · x + m · G_I((1 − m) · x)

    L(G_M) = −E_x[log(1 − D_cls(y, c_t))]   (fool the object classifier)
             − E_x[D_M(m, c_t)]             (adversarial prior)
             + exp(Σ_ij m_ij)               (size regularization)

• The adversarial shape prior offers flexibility: we experiment with randomly generated rectangles and unpaired masks.
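The editing operation and mask-generator loss above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function names are made up, the classifier and discriminator outputs are plain scalars, and a trivial stand-in replaces the in-painter network.

```python
import numpy as np

def edit_image(x, m, inpaint_fn):
    """Two-stage edit: y = (1 - m) * x + m * G_I((1 - m) * x).
    x: (H, W, C) image, m: (H, W, 1) soft mask in [0, 1],
    inpaint_fn: stand-in for the in-painter G_I."""
    masked = (1.0 - m) * x                    # object pixels removed
    return masked + m * inpaint_fn(masked)    # hole filled by the in-painter

def mask_generator_loss(d_cls_y, d_m, m, eps=1e-8):
    """L(G_M) as in the formula above.
    d_cls_y: object-classifier score D_cls(y, c_t) in (0, 1),
    d_m: mask-discriminator score D_M(m, c_t), m: soft mask."""
    fool_cls = -np.log(1.0 - d_cls_y + eps)   # push classifier to "no object"
    adv_prior = -d_m                          # make masks resemble prior samples
    size_reg = np.exp(m.sum())                # penalize large masks
    return fool_cls + adv_prior + size_reg
```

In training, `d_cls_y` and `d_m` would be network outputs and the loss would be backpropagated through y and m; here they are scalars so the arithmetic can be checked directly.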

Training the in-painter

• The in-painter learns with a combination of reconstruction loss, adversarial real-fake loss and style loss [1].

• The local real-fake classifier is trained with the mask as the label.

    L(D_rf) = Σ_ij (1 − m_ij)(D_rf(y)_ij − 1)² / Σ_ij (1 − m_ij)   (real pixels)
            + Σ_ij m_ij (D_rf(y)_ij + 1)² / Σ_ij m_ij              (edited pixels)

• The local real-fake loss focuses the discriminator on the edited region.
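The local real-fake loss above can be sketched as follows in NumPy; the per-pixel discriminator output and binary mask are toy stand-ins for network tensors.

```python
import numpy as np

def local_real_fake_loss(d_out, m, eps=1e-8):
    """L(D_rf): least-squares targets +1 on real (unedited) pixels and
    -1 on edited pixels, each term normalized by its own pixel count.
    d_out: (H, W) per-pixel output D_rf(y)_ij, m: (H, W) binary mask."""
    real = ((1.0 - m) * (d_out - 1.0) ** 2).sum() / ((1.0 - m).sum() + eps)
    fake = (m * (d_out + 1.0) ** 2).sum() / (m.sum() + eps)
    return real + fake
```

Normalizing the two terms separately keeps the small edited region from being drowned out by the much larger unedited region, which is what focuses the discriminator on the edit.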

Quantitative results and ablations

Model                 Supervision                           Removal success ↑   False        Perceptual   pSNR ↑   ssim ↑
                                                            all      person     removal ↓    loss ↓
GT masks              -                                     66       72         5            0.04         27.43    0.930
Mask RCNN             segmentation masks & bounding boxes   68       73         6            0.05         25.59    0.900
Mask RCNN (dil. 7x7)  segmentation masks & bounding boxes   75       77         10           0.07         24.13    0.882
Ours-pascal           image labels & unpaired masks         73       81         16           0.08         22.64    0.803

Quantitative comparison to supervised segmentation-based removal. Our weakly supervised model trained with the Pascal mask prior achieves a removal success rate on par with fully supervised Mask-RCNN based object removal.

Prior         Removal success ↑   False       Perceptual   Mask accuracy
              all      person     removal ↓   loss ↓       (mIoU ↑)
None          94       96         36          0.13         0.15
boxes         83       88         23          0.11         0.18
pascal (10)   67       59         17          0.07         0.23
pascal (100)  70       75         16          0.07         0.22
pascal (all)  73       81         16          0.08         0.22

Better shape priors improve the mask accuracy and reduce false removal. Moving down the table from the no-prior case to the box priors and then to the class-specific shape priors from the Pascal dataset makes the masks smaller, improves the mIoU and also reduces the false removal rate.

[Figure] Input image, result with global loss, result with local loss.

Qualitative comparison of global vs. local loss. The local real-fake loss improves the in-painting results, producing sharper, texture-rich images, compared to the smooth, blurry results obtained by the global loss.

References
[1] G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, and B. Catanzaro, "Image inpainting for irregular holes using partial convolutions," arXiv, 2018.
[2] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," ICCV, 2017.