Semidefinite relaxations for certifying robustness to adversarial examples
Aditi Raghunathan
Jacob Steinhardt, Percy Liang
ML: Powerful But Fragile
• ML is successful on several tasks: object recognition, game playing, face recognition
• ML systems fail catastrophically in the presence of adversaries
• Different kinds of adversarial manipulation: data poisoning, manipulation of test inputs, model theft, membership inference, etc.
• Focus on adversarial examples: manipulation of test inputs
Adversarial Examples
Glasses → Impersonation [Sharif et al. 2016]
Banana + patch → Toaster [Brown et al. 2017]
Stop sign + sticker → Yield [Evtimov et al. 2017]
Adversarial Examples
3D Turtle → Rifle [Athalye et al. 2017]
Noise → “Ok Google” [Carlini et al. 2017]
Malware → Benign [Grosse et al. 2017]
What is an adversarial example?
Panda → Gibbon [Szegedy et al. 2014]
The definition of the attack model is usually application specific and complex.
We consider the well studied ℓ∞ attack model:
|x_adv − x|_i ≤ ε for i = 1, 2, …, d, i.e. x_adv ∈ B_ε(x)
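In code, the ℓ∞ attack model is just a coordinate-wise box around x. A minimal numpy sketch (the helper names `in_linf_ball` and `project_linf` are ours, for illustration only):

```python
import numpy as np

def in_linf_ball(x_adv, x, eps):
    """Check the attack-model constraint |x_adv - x|_i <= eps for all i."""
    return bool(np.all(np.abs(x_adv - x) <= eps + 1e-12))

def project_linf(x_adv, x, eps):
    """Project an arbitrary point back into B_eps(x) (coordinate-wise clip)."""
    return np.clip(x_adv, x - eps, x + eps)

x = np.array([0.2, 0.5, 0.9])
eps = 0.1
x_adv = project_linf(x + np.array([0.3, -0.05, 0.0]), x, eps)
print(in_linf_ball(x_adv, x, eps))  # True
```

Iterative attacks such as PGD alternate a gradient step with exactly this projection.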
History
• [Szegedy+ 2014]: First to discover adversarial examples
• [Goodfellow+ 2015]: Adversarial training (AT) against FGSM
• [Papernot+ 2015]: Defensive distillation
• [Carlini & Wagner 2016]: Distillation is not secure
• [Papernot+ 2017]: Better distillation
• [Carlini & Wagner 2017]: Ten detection strategies fail
• [Madry+ 2017]: AT against PGD, informal argument about optimality
• [Lu+, July 12 2017]: “NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles”
• [Athalye & Sutskever, July 17 2017]: Break the above defense
• [Athalye, Carlini & Wagner 2018]: Break 6 out of 7 ICLR defenses

Hard to defend, even in this well defined model…
Provable robustness
Can we get robustness to all attacks?
Let f(x) be the scoring function; the adversary wants to maximize f(x).
Attacks generate points in B_ε(x):
A_fgsm(x) = x + ε · sign(∇f(x))
A_opt(x) = argmax_{x̃ ∈ B_ε(x)} f(x̃)
The network is provably robust if the optimal attack fails:
f⋆ ≡ f(A_opt(x)) < 0
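The two attacks are easy to sketch for a toy linear score f(x) = w · x, where the FGSM step is in fact exactly optimal over B_ε(x) (for general networks it is only a heuristic; the names below are illustrative):

```python
import numpy as np

def f(x, w):
    # Toy differentiable scoring function: f(x) = w . x
    return float(w @ x)

def grad_f(x, w):
    return w  # gradient of a linear score

def fgsm(x, w, eps):
    """One-step FGSM attack: move eps in the sign of the gradient.
    Exactly optimal for linear f; a heuristic for general networks."""
    return x + eps * np.sign(grad_f(x, w))

x = np.zeros(4)
w = np.array([1.0, -2.0, 0.5, 0.0])
x_adv = fgsm(x, w, 0.1)
# For linear f, A_fgsm attains the optimum: f increases by eps * ||w||_1 = 0.35
print(round(f(x_adv, w) - f(x, w), 6))
```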
Provable robustness
The network is provably robust if f⋆ ≡ f(A_opt(x)) < 0, but computing f⋆ is intractable in general.
• Combinatorial approaches to compute f⋆:
  • SMT-based Reluplex [Katz+ 2018]
  • MILP-based with specialized preprocessing [Tjeng+ 2018]
• Convex relaxations to compute an upper bound f_upper on f⋆:
  • Upper bound negative ⟹ optimal attack fails
  • Computationally efficient
Three outcomes:
  f⋆ ≤ f_upper < 0: robust and certified
  f⋆ < 0 ≤ f_upper: robust but not certified
  0 ≤ f⋆ ≤ f_upper: not robust
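As one deliberately crude example of an efficient upper bound, interval arithmetic propagates the box B_ε(x) through a one-hidden-layer ReLU network. This is a stand-in to illustrate the certification logic, not a method from the talk, and it is generally looser than the LP or SDP certificates discussed here:

```python
import numpy as np

def interval_bound(W1, W2, x, eps):
    """Cheap upper bound on f* = max_{x~ in B_eps(x)} W2 . ReLU(W1 x~)
    via interval arithmetic (looser than LP/SDP relaxations)."""
    lo, hi = x - eps, x + eps
    # Propagate the box through the linear layer: center +/- radius.
    mid, rad = (lo + hi) / 2, (hi - lo) / 2
    c, r = W1 @ mid, np.abs(W1) @ rad
    z_lo, z_hi = np.maximum(c - r, 0), np.maximum(c + r, 0)  # ReLU is monotone
    # Pick the favorable end of each interval for the output weights.
    return float(np.maximum(W2, 0) @ z_hi + np.minimum(W2, 0) @ z_lo)

W1 = np.array([[1.0, -1.0], [2.0, 1.0]])
W2 = np.array([1.0, -1.0])
x = np.array([0.5, 0.5])
upper = interval_bound(W1, W2, x, 0.1)
# Any concrete attack point gives a lower bound, so f* <= upper must hold:
x_adv = np.array([0.6, 0.4])
f_adv = W2 @ np.maximum(W1 @ x_adv, 0)
print(f_adv <= upper)  # True
```

Here `upper` is negative, so this tiny network would be certified robust at x; when `upper` is positive the certificate is simply inconclusive.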
Two layer networks
Key idea: uniformly bound the gradient:
f(x̃) ≤ f(x) + ε · max_{x̃ ∈ B_ε(x)} ‖∇f(x̃)‖₁
Bounding the gradient requires optimizing over the ReLU activations and over the signs of the perturbation.
Final step: an SDP relaxation (similar to MAXCUT) leads to Grad-cert.
Relaxation Training
Training a neural network against the certificate.
Objective: the relaxation bound is differentiable, but its gradients are expensive to compute.
Duality to the rescue! Regularizer:
D · λ⁺_max(M(v, W) − diag(c)) + 1ᵀ max(c, 0)
Just one maximum-eigenvalue computation per gradient step.
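That one maximum-eigenvalue computation can be done with plain power iteration. A generic sketch (not the authors' implementation; λ⁺_max in the regularizer denotes max(λ_max, 0)):

```python
import numpy as np

def lambda_max(M, iters=500):
    """Largest eigenvalue of a symmetric matrix via power iteration.
    Shifting by a bound on the spectral radius makes all eigenvalues
    positive, so the iteration converges to the top eigenvector of M."""
    n = M.shape[0]
    shift = np.abs(M).sum()          # crude bound on the spectral radius
    A = M + shift * np.eye(n)
    v = np.ones(n) / np.sqrt(n)
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return float(v @ M @ v)          # Rayleigh quotient at convergence

M = np.array([[2.0, 1.0], [1.0, -3.0]])
print(round(lambda_max(M), 4))  # (-1 + sqrt(29)) / 2 = 2.1926
```

Because only a matrix-vector product is needed per step, this costs far less than solving the full SDP at every training iteration.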
Results on MNIST
Attack: the Projected Gradient Descent (PGD) attack of Madry et al. 2018, which gives a lower bound on f⋆.
Adversarial training minimizes this lower bound on the training set; on such networks the gradient based upper bound (Grad-cert) is quite loose.
Training with Grad-cert instead minimizes the gradient based upper bound, and the bound becomes much tighter.
Results on MNIST
Training a network to minimize the gradient upper bound finds networks where the bound is tight.
Comparison with Wong and Kolter 2018 (LP-cert):

Network  | PGD-attack | LP-cert | Grad-cert
Grad-NN  | 15%        | 97%     | 35%
LP-NN    | 22%        | 26%     | 93%

Bounds are tight when you train against them, but tight only when you train against them.
Some networks are empirically robust but not certified (e.g. adversarial training of Madry et al. 2018).
Can we certify such “foreign” networks?
Summary so far…
• Certified robustness: relaxed optimization to bound the worst-case attack
• Grad-cert: upper bound on the worst-case attack using a uniform bound on the gradient
• Training against the bound makes it tight
• LP-cert and Grad-cert are tight only on networks trained against them
• Goal: efficiently certify foreign multi-layer networks
New SDP-cert relaxation
Network: x → x¹ → x² → x³ ≡ x^L, with weights W₀, W₁, W₂.
Attack model constraints: |x̃ − x|_i ≤ ε for i = 1, 2, …, d
Neural net constraints: xⁱ = ReLU(W_{i−1} x^{i−1}) for i = 1, 2, …, L
Objective: f⋆ = max_x̃ (c_ȳ − c_y)ᵀ x^L, the margin of an incorrect class ȳ over the true class y.
The source of non-convexity is the ReLU constraints.
Handling ReLU constraints
Consider a single ReLU constraint z = max(0, x).
Key insight: it can be replaced by linear + quadratic constraints:
  z ≥ x (linear)
  z ≥ 0 (linear)
  → z is greater than both x and 0
  z(z − x) = 0 (quadratic)
  → z is equal to one of x, 0
We can relax the quadratic constraints to get a semidefinite program.
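A quick numerical check that the three constraints exactly characterize the ReLU graph (the helper name is ours):

```python
import numpy as np

def satisfies_relu_constraints(x, z, tol=1e-9):
    """The linear + quadratic system replacing z = max(0, x):
    z >= x and z >= 0 (z at least both), z*(z - x) = 0 (z equals one)."""
    return z >= x - tol and z >= -tol and abs(z * (z - x)) <= tol

rng = np.random.default_rng(0)
for x in rng.normal(size=100):
    z = max(0.0, x)
    assert satisfies_relu_constraints(x, z)            # the ReLU point passes
    assert not satisfies_relu_constraints(x, z + 1.0)  # a perturbed z fails
print("equivalence holds on samples")
```

The converse also holds: z(z − x) = 0 forces z ∈ {0, x}, and the two linear inequalities then pick out z = max(0, x).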
SDP relaxation
Single ReLU constraint z = max(0, x) ≡ linear + quadratic constraints.
Collect the monomials into the matrix

M = [ 1   x    z  ]
    [ x   x²   xz ]
    [ z   xz   z² ]

The ReLU constraints z ≥ x, z ≥ 0, and z(z − x) = 0 (i.e. z² = xz) become linear constraints on the entries of M.
Constraint on M:
  M = vvᵀ: exact but a non-convex set
  M ⪰ 0 (equivalently M = VVᵀ): relaxed, convex set
Generalizes to multiple layers: one large matrix M with all activations.
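The moment matrix and its constraints, made concrete for one feasible ReLU point (an illustrative sketch; an actual certificate would optimize over all M ⪰ 0 with an SDP solver, which is not shown here):

```python
import numpy as np

def moment_matrix(x, z):
    """Rank-one moment matrix M = v v^T for v = (1, x, z).
    The ReLU constraints become *linear* constraints on entries of M:
    M[2,0] >= M[1,0]  (z >= x),  M[2,0] >= 0  (z >= 0),
    M[2,2] == M[2,1]  (z^2 = xz, i.e. z(z - x) = 0)."""
    v = np.array([1.0, x, z])
    return np.outer(v, v)

x = -0.7
z = max(0.0, x)
M = moment_matrix(x, z)
linear_ok = M[2, 0] >= M[1, 0] and M[2, 0] >= 0 and abs(M[2, 2] - M[2, 1]) < 1e-9
psd_ok = np.all(np.linalg.eigvalsh(M) >= -1e-9)  # the relaxed constraint M >= 0
print(bool(linear_ok and psd_ok))  # True
```

Every true network execution yields such a rank-one, PSD moment matrix, which is why the relaxation M ⪰ 0 can only enlarge the feasible set (giving an upper bound on f⋆).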
SDP relaxation: interaction between different hidden units
Example: x₁, x₂ ∈ [−ε, ε], with
  z₁ = ReLU(x₁ + x₂)
  z₂ = ReLU(x₁ − x₂)
(e.g. the unrelaxed value at x₁ = x₂ = 0.5ε is z₁ = ε, z₂ = 0)
The LP treats units independently; the SDP reasons about them jointly.
Theorem: For a random two-layer network with m hidden nodes and input dimension d, opt(LP) = Θ(md) and opt(SDP) = Θ(m√d + d√m).
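A tiny numeric illustration of the gap (ε = 1 for concreteness; grid search stands in for the exact maximum of z₁ + z₂):

```python
import numpy as np

eps = 1.0
z = lambda x1, x2: max(x1 + x2, 0.0) + max(x1 - x2, 0.0)

# True joint maximum of z1 + z2 over the box, by grid search:
grid = np.linspace(-eps, eps, 201)
true_max = max(z(a, b) for a in grid for b in grid)

# A bound in the style of a per-unit LP: each unit alone can reach 2*eps
# (at different corners of the box), so treating them independently
# bounds the sum by 4*eps, even though no single point achieves it.
independent_bound = 2 * eps + 2 * eps

print(round(true_max, 9), independent_bound)  # 2.0 4.0
```

Jointly, z₁ + z₂ can never exceed 2ε (when both pre-activations are positive the sum collapses to 2x₁ ≤ 2ε), which is exactly the kind of correlation the SDP captures and a per-unit LP misses.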
Results on MNIST
Three different robust networks:
  Grad-NN [Raghunathan et al. 2018]
  LP-NN [Wong and Kolter 2018]
  PGD-NN [Madry et al. 2018]

           | Grad-NN | LP-NN | PGD-NN
Grad-cert  | 35%     | 93%   | N/A
LP-cert    | 97%     | 22%   | 100%
SDP-cert   | 20%     | 20%   | 18%
PGD-attack | 15%     | 18%   | 9%

SDP provides good certificates on all three different networks.
Results on MNIST
PGD-NN [Madry et al. 2018]: uncertified points are more vulnerable to attack.
Scaling up…
In general, CNNs are more robust than fully connected networks, but off-the-shelf SDP solvers do not exploit the CNN structure.
Ongoing work:
• First-order SDP solvers based on matrix-vector products
• Exploit efficient CNN implementations in TensorFlow
Concurrent work:
• MILP solving with efficient preprocessing [Tjeng+ 2018, Xiao+ 2018]
• Scaling up LP based methods [Dvijotham+ 2018, Wong and Kolter 2018]
Summary
• Robustness for the ℓ∞ attack model
• Certified evaluation to avoid an arms race
• Presented two different relaxations for certification
• Adversarial examples more broadly…
  • Does there exist a mathematically well defined attack model?
  • Would current techniques (deep learning + appropriate regularization) transfer to such an attack model?
• Secure vs. better models?
  • Adversarial examples expose limitations of current systems
  • How do we get models to learn “the right thing”?
Thank you!
Joint work with Jacob Steinhardt and Percy Liang
“Certified Defenses against Adversarial Examples”, https://arxiv.org/abs/1801.09344 [ICLR 2018]
“Semidefinite Relaxations for Certifying Robustness to Adversarial Examples”, https://arxiv.org/abs/1811.01057 [NeurIPS 2018]
Open Philanthropy Project