The Achilles’ heel of deep learning

Seyed-Mohsen Moosavi-Dezfooli, Tehran Institute for Advanced Studies, August 2019



Page 1

Seyed-Mohsen Moosavi-Dezfooli

Tehran Institute for Advanced Studies

August 2019

The Achilles’ heel of deep learning

Page 2


Achilles’ heel

Page 3


Esfandiyar’s eyes

Page 4

Convolutional neural networks

Mountain?

Page 5

Rachel Jones—Biomedical Computation Review

David Paul Morris—Bloomberg/Getty Images

Google Research

The success of deep learning

Page 6

[Excerpt from Szegedy et al., “Intriguing properties of neural networks”, shown on the slide: Figure 5, adversarial examples generated for AlexNet (original image, perturbation magnified 10x, and adversarial example, all predicted as “ostrich, Struthio camelus”); Figure 6, adversarial examples for a binary car classifier trained on QuocNet features; plus surrounding text on MNIST experiments with adversarial training.]

[Slide annotation: image x (“School Bus”) + perturbation r = adversarial example classified as “Ostrich” by k̂(·).]

▪ Intriguing properties of neural networks, Szegedy et al., ICLR 2014.

Adversarial vulnerability

Adversarial perturbations: carefully crafted perturbations of the input data.

Page 7

Why this problem matters

Invariance · Security · Understanding

Page 8

Invariance

[Diagram: Original image → Transformed image → Classifier → “Mountain”.]

Invariance to transformations

Page 9

Safety/Security

Deployment in hostile environments

[Diagram: Original image → Adversarial image → Classifier → “Ice-cream”.]

Page 10

Understanding

Interpretability-related issues

Page 11

Analysis

Evaluation (Attack) · Defense

Research areas

Page 12


Evaluating the robustness properties

Page 13

Adversarial attacks

Training a neural network: the classifier k̂(x; W), built from layers such as Wx, is obtained by solving

W* = argmin_W Σᵢ J(xᵢ, yᵢ; W)

Page 14

Adversarial attacks (cont’d)

Szegedy’s method (BFGS):

r* = argmin_r  J(x + r, y_t; W*) + C‖r‖

Fast Gradient Sign method (FGS):

r* = ε · sign(∇_x J(x, y; W*))

DeepFool:

r* = argmin_r ‖r‖  s.t.  k̂(x + r; W*) ≠ k̂(x; W*)
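To make the one-step attack concrete, here is a minimal Fast Gradient Sign sketch in PyTorch; the model, loss function, ε, and the [0, 1] image range are illustrative assumptions rather than settings from the talk.

    import torch

    def fgsm_attack(model, loss_fn, x, y, eps=0.03):
        """One-step Fast Gradient Sign attack: x_adv = x + eps * sign(grad_x J(x, y))."""
        x_adv = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        loss.backward()
        # Step in the direction that increases the loss, then keep the image in a valid range.
        x_adv = x_adv + eps * x_adv.grad.sign()
        return x_adv.clamp(0.0, 1.0).detach()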

Page 15

Universal (adversarial) perturbations

▪ Universal adversarial perturbations, Moosavi et al., CVPR 2017.

[Figure: one image-agnostic perturbation changes the predictions of many images; example labels include Joystick, Balloon, Flag pole, Face powder, Labrador, and Chihuahua.]
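A universal perturbation of this kind is built greedily over a set of training images: whenever the current perturbation fails to fool an image, a minimal extra perturbation is computed for that image and the aggregate is projected back onto a norm ball. The sketch below captures that loop under simplifying assumptions (an ℓ∞ projection and a placeholder minimal_perturbation routine, e.g. a DeepFool step, stand in for the exact procedure of the paper).

    import torch

    def universal_perturbation(model, images, minimal_perturbation, xi=10/255, epochs=5):
        """Greedy accumulation of an image-agnostic perturbation v that fools most images."""
        v = torch.zeros_like(images[0])
        for _ in range(epochs):
            for x in images:
                pred_clean = model(x.unsqueeze(0)).argmax(dim=1)
                pred_pert = model((x + v).unsqueeze(0)).argmax(dim=1)
                if bool(pred_pert == pred_clean):            # v does not fool this image yet
                    dv = minimal_perturbation(model, x + v)  # minimal extra perturbation for this image
                    v = (v + dv).clamp(-xi, xi)              # project back onto the l_inf ball of radius xi
        return v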

Page 16

Sparse adversarial perturbations

[Figure: perturbations that change only a few pixels flip the predictions; example labels include Bathtub, Bubble, Truck, Bird, and the digits “6” and “2”.]

Page 17


Non-additive adversarial manipulations

[Figure: geometric and spatial transformations flip predictions, e.g. Bear to Fox and digit “0” to “2”.]

▪ Geometric robustness of deep networks, Kanbak, Moosavi, Frossard, CVPR 2018.

▪ Spatially transformed adversarial examples, Xiao et al., ICLR 2018.

Page 18

Adversarial patch

[Figure: a printed adversarial patch turns a banana into a “toaster” prediction.]

▪ Adversarial patch, Brown et al., NIPSW 2017.

Page 19

Adversarial patch — detection

▪ Fooling automated surveillance cameras, Thys et al., CVPR 2019.

Page 20

A fancy example

Page 21

Adversarial attack for semantic segmentation

▪ Houdini: Fooling Deep Structured Prediction Models, Cisse et al., arXiv 2017.

Page 22


Improving the robustness properties

Page 23

Defense against adversarial perturbations

Projection methods

Detection methods

Regularization methods

Page 24

Regularization methods — implicit

Robust optimisation (a.k.a. adversarial training)

[Diagram: each image batch x is augmented with adversarial perturbations x + r, which are then used for training.]
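A minimal sketch of one such robust-optimisation step, assuming a PyTorch model, an attack such as the FGSM helper sketched earlier, and ordinary mini-batch training; details such as the attack strength or mixing clean and adversarial batches vary between implementations.

    import torch

    def adversarial_training_step(model, loss_fn, optimizer, x, y, attack, eps=0.03):
        """One training step on adversarially perturbed inputs (robust optimisation)."""
        model.eval()
        x_adv = attack(model, loss_fn, x, y, eps)   # craft x + r against the current model
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(x_adv), y)             # minimise the loss on the perturbed batch
        loss.backward()
        optimizer.step()
        return loss.item()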

Page 25

Regularization methods — explicit

Maximum margin classification

[Diagram: the decision boundary separates the regions f(x) > 0 and f(x) < 0; the distance of x to the boundary is approximately]

Δ ≈ |f(x)| / ‖∇f(x)‖₂
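This linearized distance to the boundary is easy to estimate with automatic differentiation; a small sketch, assuming f returns a scalar decision score (for instance a logit difference):

    import torch

    def linearized_margin(f, x):
        """Estimate the distance to the decision boundary as |f(x)| / ||grad_x f(x)||_2."""
        x = x.clone().detach().requires_grad_(True)
        score = f(x)                                 # scalar decision score
        grad, = torch.autograd.grad(score, x)
        return (score.abs() / grad.norm(p=2)).item()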

Page 26

Detection-based methods

Adversarial examples are out-of-distribution samples.

[Diagram: a detector screens the classifier’s input and flags the adversarial “Ice-cream” prediction.]

Page 27

Projection-based methods

Adversarial examples are out-of-distribution samples.

[Diagram: the adversarial image (classified “Ice-cream”) is projected back onto the data manifold, after which it is classified “Mountain”.]

Projection:

argmin_{x′ : ∃ z, x′ = g(z)}  d(x, x′)
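One common way to realise such a projection is gradient descent in the latent space of a pretrained generator g, as in DefenseGAN-style defenses; the sketch below only illustrates that idea, with the generator, latent dimension, distance, and optimisation settings as placeholder assumptions.

    import torch

    def project_onto_generator(g, x, latent_dim=128, steps=200, lr=0.05):
        """Find x' = g(z) minimising d(x, x') = ||x - g(z)||^2, the closest point in the generator's range."""
        z = torch.zeros(1, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            dist = ((g(z) - x) ** 2).sum()           # squared l2 distance to the generated image
            dist.backward()
            opt.step()
        return g(z).detach()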

Page 28


Analysis of adversarial vulnerability

Page 29

Two hypotheses

Adversarial examples are “blind spots”.

▪ Intriguing properties of neural networks, Szegedy et al., ICLR 2014.

Deep classifiers are “too linear”.

▪ Explaining and harnessing adversarial examples, Goodfellow et al., ICLR 2015.

Page 30

Robustness vs accuracy

[Plot: risk R(f) versus adversarial robustness ρ_adv(f); the high-robustness, low-risk region is not achievable by linear classifiers.]

There is a trade-off between robustness and accuracy for linear classifiers.

▪ Analysis of classifiers' robustness to adversarial perturbations, Fawzi et al., Machine Learning 2018.

Page 31

Robustness vs accuracy (cont’d)

There seems to be a trade-off between robustness and accuracy for deep nets.

▪ Robustness May Be at Odds with Accuracy, Tsipras et al., ICLR 2019.

Page 32

Robustness vs accuracy (cont’d)

[Figure from “Adversarial spheres”: two-dimensional cross-sections of the learned decision boundary along random and adversarial directions, showing the max-margin boundary, the data manifold, and the region classified as the inner sphere.]

Adversarial vulnerability is linked to test error.

▪ Adversarial spheres, Gilmer et al., ICLR 2018.

Page 33

Impossibility results

▪ Adversarial vulnerability for any classifier, Fawzi et al., NeurIPS 2018.

There exist fundamental (classifier-independent) limits on achievable robustness.

Page 34

Adversarial perturbations are features

▪ With Friends Like These, Who Needs Adversaries?, Jetley et al., NeurIPS 2018.

Adversarial perturbations can be attributed to discriminative features in data.

▪ Adversarial Examples Are Not Bugs, They Are Features, Ilyas et al., arXiv 2019.

Page 35


A geometric perspective on the robustness of deep networks

Page 36

Omar Fawzi ENS-Lyon

Stefano Soatto UCLA

Pascal Frossard EPFL

Alhussein Fawzi Google DeepMind

Jonathan Uesato Google DeepMind

Can Kanbak Bilkent

Apostolos Modas EPFL

Collaborators

Page 37


“Geometry is not true, it is advantageous.”

Henri Poincaré

Page 38

Geometry of …

Adversarial perturbations: How large is the “space” of adversarial examples?

Universal perturbations: What causes the vulnerability of deep networks to universal perturbations?

Adversarial training: What geometric features contribute to better robustness properties?

Page 39


Geometry of adversarial perturbations

Page 40

Geometric interpretation of adversarial perturbations

r* = argmin_r ‖r‖₂  s.t.  k̂(x + r) ≠ k̂(x)

[Diagram: a datapoint x ∈ ℝ^d is moved to the closest point x + r* on the other side of the decision boundary.]

Page 41

DeepFool

▪ DeepFool, Moosavi et al., CVPR 2016.

A simple and fast method to reach the boundary.

[Diagram: a first linearized step r^(1) moves x to x^(1), near the decision boundary.]

Page 42

DeepFool

▪ DeepFool, Moosavi et al., CVPR 2016.

A simple and fast method to reach the boundary.

[Diagram: successive linearized steps r^(1), r^(2), … move x through x^(1), x^(2), … until the decision boundary is crossed.]
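A rough sketch of the DeepFool iteration for a binary classifier, where f(x) is a scalar score whose sign gives the decision; the multi-class algorithm in the paper additionally picks the nearest class boundary at every step, so the version below only shows the core geometric step.

    import torch

    def deepfool_binary(f, x, max_iter=50, overshoot=0.02):
        """Iteratively step onto the linearised decision boundary of a binary classifier f."""
        x_adv = x.clone().detach()
        original_sign = torch.sign(f(x_adv))
        r_total = torch.zeros_like(x)
        for _ in range(max_iter):
            x_adv.requires_grad_(True)
            score = f(x_adv)
            if torch.sign(score) != original_sign:   # boundary crossed: done
                break
            grad, = torch.autograd.grad(score, x_adv)
            # Minimal step onto the zero level set of the local linearisation of f.
            r = -score.detach() * grad / (grad.norm() ** 2 + 1e-12)
            r_total = r_total + r
            x_adv = (x + (1 + overshoot) * r_total).detach()
        return x_adv, r_total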

Page 43

DeepFool (cont’d)

DeepFool’s performance in independent benchmarks:

▪ Robust Vision Benchmark, Bethge’s Lab.

Page 44

▪ Robustness of classifiers: from adversarial to random noise, Fawzi, Moosavi, Frossard, NIPS 2016.

Normal cross-sections of decision boundary

[Diagram: a normal cross-section U of the decision boundary B at a datapoint x, with a direction v and the perturbation r.]

Page 45

▪ Robustness of classifiers: from adversarial to random noise, Fawzi, Moosavi, Frossard, NIPS 2016.

Curvature of decision boundary of deep nets

[Plot: normal cross-sections of the decision boundary around a datapoint x; the boundary stays within a band of roughly ±2 over a span of about ±100 along the cross-section.]

Decision boundary of CNNs is almost flat along random directions.

Page 46

Space of adversarial perturbations

Adversarial perturbations constrained to a random subspace S of dimension m:

r_S(x) = argmin_{r ∈ S} ‖r‖  s.t.  k̂(x + r) ≠ k̂(x)

For low-curvature classifiers, with high probability,

‖r_S(x)‖ = Θ(√(d/m) · ‖r(x)‖)

[Diagram: the unconstrained perturbation r* and the subspace-constrained perturbation r*_S, both reaching the boundary from the datapoint x.]

Page 47

Structured additive perturbations

[Figure: Flowerpot + structured perturbation = “Pineapple”.]

▪ Robustness of classifiers: from adversarial to random noise, Fawzi, Moosavi, Frossard, NIPS 2016.

The “space” of adversarial examples is quite vast.

Page 48

Sparse Perturbations

Page 49

SparseFool

▪ SparseFool: a few pixels make a big difference, Modas, Moosavi, Frossard, CVPR 2019.

Finding a “valid” sparse adversarial perturbation:

argmin_r ‖r‖₀  s.t.  k̂(x + r) ≠ k̂(x),  l ≤ x + r ≤ u

Page 50

SparseFool

▪ SparseFool: a few pixels make a big difference, Modas, Moosavi, Frossard, CVPR 2019.

Approximating the decision boundary with a hyperplane.

Page 51

SparseFool

▪ SparseFool: a few pixels make a big difference, Modas, Moosavi, Frossard, CVPR 2019.

Solving for the approximated classifier:

argmin_r ‖r‖₁  s.t.  wᵀ(x + r − x_B) = 0,  l ≤ x + r ≤ u

Page 52

SparseFool

▪ SparseFool: a few pixels make a big difference, Modas, Moosavi, Frossard, CVPR 2019.

Iterate!
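One way to read the linearised problem above: the ℓ1-minimal perturbation concentrates the change on the coordinates where the hyperplane normal w is largest, moving each as far as the box [l, u] allows until the hyperplane is reached, and the procedure then re-linearises and repeats. The sketch below shows a single such step under those simplifying assumptions; the hyperplane parameters w and x_B are taken as given (e.g. from a DeepFool-style linearisation), and this is not the paper's exact algorithm.

    import numpy as np

    def sparse_step(x, w, x_b, lower, upper):
        """Reach the hyperplane w.T (x' - x_b) = 0 by changing as few coordinates as possible,
        while respecting the box constraints lower <= x' <= upper."""
        x_new = x.astype(float).copy()
        residual = float(w @ (x_b - x_new))          # signed amount of w.x still to cover
        for j in np.argsort(-np.abs(w)):             # most influential coordinates first
            if abs(residual) < 1e-9 or w[j] == 0:
                break
            step = residual / w[j]
            target = np.clip(x_new[j] + step, lower[j], upper[j])   # obey the box constraint
            residual -= w[j] * (target - x_new[j])
            x_new[j] = target
        return x_new

Repeating this step from the new point, with a fresh linearisation of the boundary, yields perturbations of the kind shown in the visual results on the next slide.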

Page 53

SparseFool — visual results

▪ SparseFool: a few pixels make a big difference, Modas, Moosavi, Frossard, CVPR 2019.

[Figure: sparse perturbations and the resulting label changes, e.g. Bathtub to Bubble, Cockroach to Palace, Sandal to Bottle.]

Page 54

Black-box attacks

Adversarial examples without direct access to the classifier’s weights.

[Diagram: Original image → Adversarial image → Classifier → “Ice-cream”.]

Page 55

Black-box attacks

Adversarial examples without direct access to the classifier’s weights.

[Diagram: Original image → Adversarial image → Classifier → “Ice-cream”.]

Page 56

QFool

▪ A geometry-inspired decision-based attack, Liu, Moosavi, Frossard, ICCV 2019.

Page 57

QFool

▪ A geometry-inspired decision-based attack, Liu, Moosavi, Frossard, ICCV 2019.

[Diagram: random query perturbations η₁, …, ηₙ around a boundary point are used to estimate a direction ξ.]

Page 58

QFool

▪ A geometry-inspired decision-based attack, Liu, Moosavi, Frossard, ICCV 2019.

[Diagram: moving along the estimated direction ξ yields the adversarial example x_adv.]
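The geometric observation behind such decision-based attacks is that the boundary is locally almost flat, so its normal at a boundary point can be estimated from top-1 decisions alone: random probes that flip the label tend to point across the boundary. The sketch below illustrates only that estimation step; the query interface, sampling scale, and aggregation rule are illustrative assumptions, not the exact QFool procedure.

    import torch

    def estimate_boundary_normal(predict, x_boundary, clean_label, n_queries=100, sigma=0.01):
        """Estimate the decision-boundary normal at a boundary point using only top-1 decisions."""
        normal = torch.zeros_like(x_boundary)
        for _ in range(n_queries):
            eta = sigma * torch.randn_like(x_boundary)            # random probe eta_i
            flipped = predict(x_boundary + eta) != clean_label    # one black-box query
            normal = normal + eta if flipped else normal - eta
        return normal / (normal.norm() + 1e-12)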

Page 59

Summary

Geometry of adversarial examples: the decision boundary is “locally” almost flat, and datapoints lie close to it.

Flatness can be used to construct a diverse set of perturbations and to design efficient attacks.

Page 60


Geometry of universal perturbations

Page 61

Universal adversarial perturbations (UAP)

▪ Universal adversarial perturbations, Moosavi et al., CVPR 2017.

[Figure: a single perturbation fools the network on most images (about 85% of them); example labels include Joystick, Balloon, Flag pole, Face powder, Labrador, and Chihuahua.]

Page 62

Diversity of UAPs

[Figure: universal perturbations computed for different architectures (CaffeNet, VGG-F, VGG-16, VGG-19, GoogLeNet, ResNet-152) are visibly different from one another.]

Page 63


Why do universal perturbations exist?

Flat model

Curved model

Page 64

Flat model

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

Page 65

Flat model (cont’d)

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

[Plot of singular values (indices 1 to 50,000): the singular values of a matrix of decision-boundary normals decay much faster than those of random vectors.]

Normals to the decision boundary are “globally” correlated.
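This correlation can be checked with a small experiment: stack unit-norm boundary normals for many datapoints, for example normalised DeepFool perturbations, into a matrix and compare its singular values with those of random unit vectors. A sketch of that comparison, with the boundary_normals array assumed to be precomputed:

    import numpy as np

    def singular_value_spectra(boundary_normals):
        """Compare singular values of stacked boundary normals (shape (n, d), unit rows)
        with those of random unit vectors of the same shape."""
        n, d = boundary_normals.shape
        random_vectors = np.random.randn(n, d)
        random_vectors /= np.linalg.norm(random_vectors, axis=1, keepdims=True)
        s_normals = np.linalg.svd(boundary_normals, compute_uv=False)
        s_random = np.linalg.svd(random_vectors, compute_uv=False)
        return s_normals, s_random   # fast decay of s_normals indicates a shared low-dimensional subspace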

Page 66


Flat model (cont’d)

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

The flat model only partially explains the universality.

[Bar chart of fooling rates: Random 13%, flat model 38%, UAP (greedy algorithm) 85%.]

Page 67

Curved model

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

The principal curvatures of the decision boundary:

[Plot: the sorted principal curvatures over roughly 3000 directions; most are close to zero, with a small number of clearly non-zero ones.]
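Curvature profiles like this can be probed numerically: the curvature of the boundary along a direction v is reflected in how fast the normalised gradient (the boundary normal) of the decision function changes when x is moved along v. The finite-difference sketch below is only a proxy for the principal-curvature computation in the paper; the scalar decision function f, the step size, and the probing directions are assumptions for illustration.

    import torch

    def curvature_along(f, x, v, h=1e-2):
        """Finite-difference proxy for the boundary curvature at x along direction v:
        the rate of change of the normalised gradient of the decision function f."""

        def unit_grad(point):
            point = point.clone().detach().requires_grad_(True)
            g, = torch.autograd.grad(f(point), point)
            return g / (g.norm() + 1e-12)

        v = v / (v.norm() + 1e-12)
        return ((unit_grad(x + h * v) - unit_grad(x)) / h).norm().item()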

Page 68


Curved model (cont’d)

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

The principal curvatures of the decision boundary:

[Plot and diagram: the curvature profile, together with a datapoint x, the boundary normal n, and a direction v.]

Page 69


Curved model (cont’d)

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

The principal curvatures of the decision boundary:

[Plot and diagram: the boundary bending along the direction v around the datapoint x with normal n.]

Page 70


Curved model (cont’d)

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

The principal curvatures of the decision boundary:

[Plot and diagram: the curvature profile with the datapoint x, the normal n, and the curved direction v.]

Page 71


Curved directions are shared

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

Normal sections of the decision boundary (for different datapoints) along a single direction:

[Figure: along the UAP direction the sections are consistently curved across datapoints; along a random direction they are nearly flat.]

Page 72


Curved directions are shared (cont’d)

▪ Robustness of classifiers to universal perturbations, Moosavi et al., ICLR 2018.

The curved model better explains the existence of universal perturbations.

[Bar chart of fooling rates: Random 13%, flat model 38%, curved model 67%, UAP 85%.]

Page 73

Summary

Universality of perturbations: shared curved directions explain this vulnerability.

A possible solution: regularizing the geometry to combat universal perturbations.

Why are deep nets curved?

▪ With friends like these, who needs adversaries?, Jetley et al., NeurIPS 2018.

Page 74


Geometry of adversarial training

Page 75

In a nutshell

[Diagram: the adversarial-training pipeline (image batch, adversarial perturbations x + r, training) alongside its geometric counterpart, curvature regularization.]

Page 76

Adversarial training

[Diagram: each image batch x is augmented with adversarial perturbations x + r before the training step.]

One of the most effective methods to improve adversarial robustness…

▪ Obfuscated gradients give a false sense of security, Athalye et al., ICML 2018. (Best paper)

Page 77

Gradient masking

Most defense methods give a false sense of security.

∇f(x) ≡ 0

[Diagram: masked gradients around x hide the nearby adversarial example x_adv from gradient-based attacks.]

Page 78

Geometry of adversarial training

Curvature profiles of normally and adversarially trained networks:

[Plot: sorted principal curvatures of the decision boundary for a normally trained network and an adversarially trained one; adversarial training yields a visibly flatter, lower-curvature profile.]

▪ Robustness via curvature regularisation, and vice versa, Moosavi et al., CVPR 2019.

Page 79

Curvature Regularization (CURE)

▪ Robustness via curvature regularisation, and vice versa, Moosavi et al., CVPR 2019.

                        Normal training   CURE     Adversarial training
Clean                   94.9%             81.2%    79.4%
PGD with ‖r*‖∞ = 8      0.0%              36.3%    43.7%
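CURE penalises a finite-difference proxy for curvature during training: the change of the input gradient of the loss over a small step along a high-curvature direction z (roughly the sign of the gradient in the paper). The sketch below illustrates such a penalty for one batch; the step size, the exact choice of z, and the weight given to the penalty are illustrative assumptions rather than the paper's precise recipe.

    import torch

    def curvature_penalty(model, loss_fn, x, y, h=1.5):
        """Finite-difference curvature proxy: ||grad_x L(x + h*z) - grad_x L(x)||^2 with z ~ sign(grad_x L(x))."""
        x = x.clone().detach().requires_grad_(True)
        grad_x, = torch.autograd.grad(loss_fn(model(x), y), x, create_graph=True)
        z = grad_x.detach().sign()
        z = z / (z.norm() + 1e-12)                   # unit-norm probing direction
        x_h = (x + h * z).detach().requires_grad_(True)
        grad_xh, = torch.autograd.grad(loss_fn(model(x_h), y), x_h, create_graph=True)
        return ((grad_xh - grad_x) ** 2).sum()       # add this, suitably weighted, to the training loss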

Page 80

AT vs CURE

AT                        CURE
Implicit regularization   Explicit regularization
Time consuming            3x to 5x faster
SOTA robustness           On par with SOTA

▪ Robustness via curvature regularisation, and vice versa, Moosavi et al., CVPR 2019.

Page 81

Validity of CURE

[Scatter plot: adversarial loss found by SPSA versus adversarial loss found by PGD (both axes roughly −6 to 6), used to check that CURE’s robustness is not an artifact of gradient masking.]

▪ Adversarial risk and the dangers of evaluating against weak attacks, Uesato et al., ICML 2018.

Page 82

Summary

Inherently more robust classifiers: curvature regularization can significantly improve robustness.

Counter-intuitive observation: due to its more linear nature, an adversarially trained net is “easier” to fool.

A better trade-off?

▪ Adversarial Robustness through Local Linearization, Qin et al., arXiv.

Page 83


Future challenges

Page 84

Disentangling different factors

Architectures: batch-norm, dropout, depth, width, etc.

Data: number of modes, convexity, distinguishability, etc.

Training: batch size, solver, learning rate, etc.

Page 85


Beyond additive perturbations

[Figure: geometric and spatial transformations flip predictions, e.g. Bear to Fox and digit “0” to “2”.]

▪ Geometric robustness of deep networks, Kanbak, Moosavi, Frossard, CVPR 2018.

▪ Spatially transformed adversarial examples, Xiao et al., ICLR 2018.

Page 86


“Interpretability” and robustness

[Figure: loss-gradient visualizations of an original image for a network with standard training versus adversarial training.]

▪ Robustness may be at odds with accuracy, Tsipras et al., ICLR 2019.

Page 87


“Interpretability” and robustness

[Figure: loss-gradient visualizations under standard versus adversarial training for example images labeled Airplane, Dog, Deer, and Bird.]

▪ Robustness may be at odds with accuracy, Tsipras et al., ICLR 2019.

Page 88


Interested in my research?

[email protected]

smoosavi.me