
Page 1:

A Two-Pronged Defense against Adversarial Examples

Dongyu Meng (ShanghaiTech University, China) and Hao Chen (University of California, Davis, USA)

MagNet

Page 2:

Neural networks in real-life applications

Examples: user authentication, autonomous vehicles

Page 3:

Neural networks as classifiers

The classifier maps an input to an output distribution over classes, e.g. Panda 0.62, Tiger 0.03, Gibbon 0.11.

Page 4:

Adversarial examples

Examples carefully crafted to
- look like normal examples
- cause misclassification

A clean image x is classified as a panda with p(x is panda) = 0.58; after a small crafted perturbation is added, the same image is classified as a gibbon with p(x is gibbon) = 0.99.

[ICLR 15] Goodfellow, Shlens, and Szegedy. Explaining and Harnessing Adversarial Examples

Page 5:

Attacks

- Fast gradient sign method (FGSM) [Goodfellow, 2015]
- Iterative gradient sign [Kurakin, 2016]
- DeepFool [Moosavi-Dezfooli, 2015]
- Carlini's attack, with an adjustable confidence parameter [Carlini, 2017]
- ……
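For reference, FGSM, the simplest attack above, perturbs the input by one step of size epsilon in the direction of the sign of the loss gradient. A minimal TensorFlow sketch (the model, the [0, 1] input range, and the epsilon value are illustrative assumptions, not code from the talk):

```python
import tensorflow as tf

def fgsm(model, x, y_true, epsilon=0.01):
    """Fast gradient sign method [Goodfellow et al., ICLR'15] -- illustrative sketch.

    x: batch of inputs in [0, 1]; y_true: one-hot labels; epsilon: per-pixel step size.
    """
    loss_fn = tf.keras.losses.CategoricalCrossentropy()
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y_true, model(x))
    grad = tape.gradient(loss, x)
    # Step in the direction that increases the loss, then clip back to the valid pixel range.
    x_adv = x + epsilon * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)
```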

Page 6:

Defenses

- Adversarial training [Goodfellow, 2015]
- Defensive distillation [Papernot, 2016]
- Detecting specific attacks [Metzen, 2017]
- ……

The slide's table marks each of these as targeting a specific attack, modifying the classifier, or both.

Page 7:

Desirable properties


Does not modify target classifier.

- Can be deployed more easily as an add-on.

Does not rely on attack-specific properties.

- Generalizes to unknown attacks.

Page 8:

Manifold hypothesis


Possible inputs take up a dense, high-dimensional sample space.

But the inputs we care about lie on a low-dimensional manifold.

Page 9:

Our hypothesis for adversarial examples

Some adversarial examples are far away from the manifold. Classifiers are not trained to work on these inputs.

Page 10:

Our hypothesis for adversarial examples

Other adversarial examples are close to the manifold boundary, where the classifier generalizes poorly.

Page 11:

Sanitize your inputs.


Page 12:

Our solution

Detector: Decides if the example is far from the manifold.

Page 13:

Our solution

Reformer: Draws the example towards the manifold.

Page 14:

Workflow

If the detector flags the input as adversarial, MagNet rejects the input. Otherwise, the reformer reforms the input, the classifier labels it, and MagNet returns y.

Page 15:

Autoencoder

- Neural nets.
- Learn to copy input to output.
- Trained with constraints.

Reconstruction error: ||x - x'||2, where x' is the autoencoder's reconstruction of x.

Page 16:

Autoencoder

Autoencoders
- learn to map inputs towards the manifold.
- approximate input-manifold distance with reconstruction error.

Train autoencoders on normal examples only as building blocks.
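As a concrete illustration, such a building block could be a small convolutional autoencoder trained only on normal examples, as in the sketch below; the architecture, optimizer, and MNIST-shaped input are assumptions of this sketch, not the paper's exact models.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_autoencoder(input_shape=(28, 28, 1)):
    """Small convolutional autoencoder (illustrative, not the paper's exact architecture)."""
    x_in = layers.Input(shape=input_shape)
    h = layers.Conv2D(8, 3, activation="relu", padding="same")(x_in)
    h = layers.AveragePooling2D(2, padding="same")(h)
    h = layers.Conv2D(8, 3, activation="relu", padding="same")(h)
    h = layers.UpSampling2D(2)(h)
    x_out = layers.Conv2D(input_shape[-1], 3, activation="sigmoid", padding="same")(h)
    return Model(x_in, x_out)

# Train on normal (clean) examples only, minimizing reconstruction error.
ae = build_autoencoder()
ae.compile(optimizer="adam", loss="mse")
# ae.fit(x_train, x_train, epochs=10, batch_size=128)   # x_train: normal training examples
```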

Page 17:

Detector


-- based on reconstruction error

Pass the input x through the autoencoder to get its reconstruction x'.

||x - x'||2 < threshold?
- yes: Input is normal. MagNet accepts the input.
- no: Input is adversarial. MagNet rejects the input.
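A minimal NumPy sketch of this criterion, assuming ae is a trained autoencoder as above and the threshold is chosen from reconstruction errors on held-out normal examples (all names are illustrative):

```python
import numpy as np

def reconstruction_error_detector(ae, x, threshold):
    """Return True for inputs whose reconstruction error exceeds the threshold (reject)."""
    x_rec = ae.predict(x)                                          # x' = autoencoder(x)
    err = np.linalg.norm((x - x_rec).reshape(len(x), -1), axis=1)  # ||x - x'||_2 per example
    return err >= threshold

# One way to pick the threshold: allow a small false-positive rate on normal validation data.
# errs_val = np.linalg.norm((x_val - ae.predict(x_val)).reshape(len(x_val), -1), axis=1)
# threshold = np.quantile(errs_val, 0.999)
```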

Page 18:

Detector


-- based on probability divergence

Pass both x and its reconstruction x' = autoencoder(x) through the classifier, giving two output distributions: P = classifier(x) and Q = classifier(x') (e.g. Panda 0.62, Tiger 0.03, Gibbon 0.11, ...).

DKL(P||Q) < threshold?
- yes: Input is normal. MagNet accepts the input.
- no: Input is adversarial. MagNet rejects the input.
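A matching sketch of the divergence-based criterion, assuming the classifier outputs softmax probabilities (function and variable names are illustrative):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Row-wise D_KL(P || Q) for batches of probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=1)

def divergence_detector(classifier, ae, x, threshold):
    """Return True (reject) when classifier outputs on x and on autoencoder(x) diverge too much."""
    p = classifier.predict(x)              # P = classifier(x)
    q = classifier.predict(ae.predict(x))  # Q = classifier(autoencoder(x))
    return kl_divergence(p, q) >= threshold
```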

Page 19:

Reformer

Pass the input x through the autoencoder to get the reformed example x', then pass x' to the classifier to get the output distribution Q (Panda ..., Tiger ..., Gibbon ...).

MagNet returns Q as the final classification result.
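Putting detector and reformer together, the workflow from the earlier slide could be wired up as below, where detectors is a list of per-batch detector functions like the two sketched above and ae doubles as the reformer (an illustrative sketch, not the released implementation):

```python
import numpy as np

def magnet_predict(classifier, ae, detectors, x):
    """Reject inputs flagged by any detector; classify the reformed version of the rest.

    Returns a list with None for rejected inputs and the class distribution Q otherwise.
    """
    rejected = np.zeros(len(x), dtype=bool)
    for detect in detectors:
        rejected |= detect(x)
    x_reformed = ae.predict(x)           # reformer: draw the input towards the manifold
    q = classifier.predict(x_reformed)   # Q = classifier(autoencoder(x))
    return [None if r else qi for r, qi in zip(rejected, q)]

# Example wiring (thresholds t1, t2 chosen on validation data):
# detectors = [lambda b: reconstruction_error_detector(ae, b, t1),
#              lambda b: divergence_detector(classifier, ae, b, t2)]
```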

Page 20:

Threat model

What the attacker knows the parameters of:
- Blackbox defense: the target classifier, but not the defense.
- Whitebox defense: both the target classifier and the defense.

Page 21:

Blackbox defense on MNIST dataset

(Figure: classification accuracy on adversarial examples.)

Page 22:

Blackbox defense on CIFAR-10 dataset

(Figure: classification accuracy on adversarial examples.)

Page 23:

Detector vs. reformer

Detector and reformer complement each other.

(Figure: accuracy vs. attack confidence (distortion), comparing no defense, reformer only, detector only, and complete MagNet = detector + reformer. Large distortion makes examples more transferable; small distortion makes them less noticeable.)

Page 24:

Whitebox defense is not practical


To defeat a whitebox attacker, the defender has to either
- make it impossible for the attacker to find adversarial examples,
- or create a perfect classification network.

Page 25:

Graybox model

- Attacker knows possible defenses.
- Exact defense is only known at run time.

What the attacker knows the parameters of:
- Blackbox defense: the classifier only.
- Graybox defense: the classifier and the pool of candidate defenses (A, B, C, D), but not which one is deployed.
- Whitebox defense: the classifier and the exact deployed defense.

Defense strategy
- Train diverse defenses.
- Randomly pick one for each session.

Page 26:

Train diverse defenses


With MagNet, this means training diverse autoencoders.

Our Method: Train n autoencoders at the same time.

Minimize reconstruction error while encouraging autoencoder diversity: each autoencoder's reconstruction should stay far from the average reconstructed image.
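One possible reading of that objective in code, with the trade-off weight alpha as an explicit assumption of this sketch (the paper may weight or normalize the terms differently):

```python
import tensorflow as tf

def diverse_autoencoder_loss(x, reconstructions, alpha=0.1):
    """Joint loss for n autoencoders trained together (illustrative formulation).

    reconstructions: list of the n tensors AE_i(x). The first term is the usual
    reconstruction error; the second rewards reconstructions that stay far from
    the average reconstructed image, pushing the autoencoders apart.
    """
    recon = tf.add_n([tf.reduce_mean(tf.square(r - x)) for r in reconstructions])
    avg = tf.add_n(reconstructions) / float(len(reconstructions))
    diversity = tf.add_n([tf.reduce_mean(tf.square(r - avg)) for r in reconstructions])
    return recon - alpha * diversity
```

At run time, one of the n autoencoders trained this way would be picked at random for each session, matching the graybox strategy on the previous slide.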

Page 27:

Train diverse defenses

Idea: Penalize the resemblance of autoencoders.

(Table: graybox classification accuracy for each pairing of the autoencoder the attack was generated on and the autoencoder used to defend.)

Page 28:

Limitations

The effectiveness of MagNet depends on the assumptions that
- detector and reformer functions exist.
- we can approximate them with autoencoders.

We show empirically that these assumptions are likely correct.

Page 29:

Conclusion

We propose the MagNet framework:
● Detector: detects examples far from the manifold.
● Reformer: moves examples closer to the manifold.

We demonstrated an effective defense against adversarial examples in the blackbox scenario with MagNet.

Instead of the whitebox model, we advocate the graybox model, where security rests on model diversity.

Page 30:

Thanks & Questions?


Find more about MagNet:
● Paper: https://arxiv.org/abs/1705.09064
● Demo code: https://github.com/Trevillie/MagNet
● Author homepage: mengdy.me