Toward Adversarial Learning as Control
Jerry Zhu
University of Wisconsin-Madison
The 2nd ARO/IARPA Workshop on Adversarial Machine Learning
May 9, 2018
Test time attacks
▶ Given classifier f : X → Y, input x ∈ X
▶ Attacker finds x′ ∈ X:

    min_{x′} ‖x′ − x‖
    s.t.  f(x′) ≠ f(x).
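For intuition, when f is a linear classifier this attack has a closed form: the minimal ℓ2 perturbation moves x just past the decision hyperplane. A minimal numpy sketch (the weights w, b below are illustrative, not from the talk):

```python
import numpy as np

def minimal_perturbation(w, b, x, overshoot=1e-4):
    """Smallest L2 perturbation that flips a linear classifier
    f(x) = sign(w @ x + b): move x onto the decision hyperplane
    along w, then step slightly past it."""
    margin = w @ x + b
    return x - (1 + overshoot) * margin * w / (w @ w)

w, b = np.array([1.0, -2.0]), 0.5
x = np.array([3.0, 1.0])
x_adv = minimal_perturbation(w, b, x)
print(np.sign(w @ x + b), np.sign(w @ x_adv + b))  # labels differ
```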
“Large margin” defense against test time attacks
▶ Defender finds f′ ∈ F:

    min_{f′} ‖f′ − f‖
    s.t.  f′(x′) = f(x), ∀ training x, ∀ x′ ∈ Ball(x, ε).
Heuristic implementation of large margin defense
Repeat:
▶ (x, x′) ← OracleAttacker(f)
▶ Add (x′, f(x)) to (X, Y)
▶ f ← A(X, Y)
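A sketch of this loop, assuming hypothetical callables for the attack oracle and the learner A (the deck does not specify their interfaces):

```python
import numpy as np

def large_margin_defense(f, A, X, Y, oracle_attacker, rounds=10):
    """Data-augmentation defense: repeatedly ask an attack oracle
    for an adversarial point x_adv near some training x, label it
    with the current prediction f(x), and retrain."""
    for _ in range(rounds):
        x, x_adv = oracle_attacker(f)   # attack the current model
        X = np.vstack([X, x_adv])       # keep the adversarial input...
        Y = np.append(Y, f(x))          # ...with the clean point's label
        f = A(X, Y)                     # retrain on the augmented set
    return f
```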
Training set poisoning attacks
▶ Given learner A : (X × Y)* → F, data (X, Y), goal Φ : F → bool
▶ Attacker finds poisoned data (X′, Y′):

    min_{(X′,Y′), f} ‖(X′, Y′) − (X, Y)‖
    s.t.  f = A(X′, Y′)
          Φ(f) = true.
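To make the bilevel structure concrete, take the simplest instantiation (my choice, not the deck's): A is ordinary least squares, the attacker edits labels only, and Φ requires the learned weights to equal a target w*. The minimal-norm label change then has the closed form Y′ = Y + X(w* − ŵ):

```python
import numpy as np

def poison_labels_ols(X, Y, w_target):
    """Minimal-norm label poisoning against OLS. The learner returns
    w = (X^T X)^{-1} X^T Y', so forcing w = w_target is the linear
    constraint X^T Y' = X^T X w_target; its least-norm solution is
    Y' = Y + X (w_target - w_hat)."""
    w_hat = np.linalg.lstsq(X, Y, rcond=None)[0]   # clean model
    return Y + X @ (w_target - w_hat)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = X @ np.array([1.0, 0.0, -1.0]) + 0.1 * rng.normal(size=50)
Y_poisoned = poison_labels_ols(X, Y, np.array([0.0, 2.0, 0.0]))
print(np.linalg.lstsq(X, Y_poisoned, rcond=None)[0])  # ≈ [0, 2, 0]
```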
defense = poisoning = machine teaching = control
[An Overview of Machine Teaching. arXiv:1801.05927, 2018]
Attacking a sequential learner A = SGD
Learner A (plant):
▶ starts at w_0 ∈ R^d
▶ w_t ← w_{t−1} − η ∇ℓ(w_{t−1}, x_t, y_t)  (sketched in code below)
Attacker:
▶ designs (x_1, y_1), . . . , (x_T, y_T) (control signal)
▶ wants to drive w_T to some w*
▶ optionally minimizes T
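In code, the plant for the squared-loss case studied next is a one-liner:

```python
import numpy as np

def sgd_step(w, x, y, eta):
    """One step of the plant: SGD on the squared loss
    l(w, x, y) = (x @ w - y)**2 / 2, driven by the
    attacker-chosen training point (x, y)."""
    return w - eta * (x @ w - y) * x
```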
Nonlinear discrete-time optimal control
. . . even for simple linear regression:

    ℓ(w, x, y) = ½ (xᵀw − y)²
    w_t ← w_{t−1} − η (x_tᵀ w_{t−1} − y_t) x_t

Continuous version:

    ẇ(t) = (y(t) − w(t)ᵀ x(t)) x(t),   ‖x(t)‖ ≤ 1, |y(t)| ≤ 1, ∀t

The attack goal is to drive w(t) from w_0 to w* in minimum time.
Greedy heuristic
    min_{x_t, y_t, w_t} ‖w_t − w*‖
    s.t.  ‖x_t‖ ≤ 1, |y_t| ≤ 1
          w_t = w_{t−1} − η (x_tᵀ w_{t−1} − y_t) x_t

. . . or further constrain x_t in the direction w* − w_{t−1}
[Liu, Dai, Humayun, Tay, Yu, Smith, Rehg, Song. ICML’17]
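A sketch of the direction-constrained greedy variant for squared loss: fix x_t as the unit vector toward w* − w_{t−1}; the best feasible y_t then has a closed form. (Step counts can differ slightly from the deck's, depending on the exact variant solved.)

```python
import numpy as np

def greedy_attack(w0, w_star, eta, T_max=1000, tol=1e-3):
    """Greedy control of SGD on squared loss: aim x_t along
    w* - w_{t-1} with ||x_t|| = 1, then choose y_t in [-1, 1]
    minimizing ||w_t - w*||."""
    w = w0.astype(float).copy()
    for t in range(1, T_max + 1):
        g = w_star - w
        dist = np.linalg.norm(g)
        if dist < tol:
            return w, t - 1                  # target (approximately) reached
        x = g / dist                         # constrained direction
        # with x fixed, w_t - w* = (eta*(y - x@w) - dist) * x,
        # so the optimal feasible label is:
        y = np.clip(x @ w + dist / eta, -1.0, 1.0)
        w = w - eta * (x @ w - y) * x        # plant (SGD) step
    return w, T_max

w, steps = greedy_attack(np.array([0.0, 5.0]), np.array([1.0, 0.0]), eta=0.05)
print(steps, w)  # the deck reports T = 55 for greedy on this instance
```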
Discrete-time optimal control
    min_{x_{1:T}, y_{1:T}, w_{1:T}} T
    s.t.  ‖x_t‖ ≤ 1, |y_t| ≤ 1,  t = 1 . . . T
          w_t = w_{t−1} − η (x_tᵀ w_{t−1} − y_t) x_t,  t = 1 . . . T
          w_T = w*.
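One way to solve this numerically (the deck does not name its solver): fix T, optimize the control sequence with a generic NLP solver, and sweep T upward until the terminal constraint can be met. A scipy sketch; SLSQP is a local solver, so random restarts may be needed:

```python
import numpy as np
from scipy.optimize import minimize

def dtoc_attack(w0, w_star, eta, T):
    """Fixed-horizon control of SGD on squared loss: optimize
    (x_1..x_T, y_1..y_T) so the rolled-out iterate w_T hits w*.
    T is feasible iff the optimal value is ~0; the minimum time
    is the smallest such T."""
    d = len(w0)

    def rollout(z):
        xs, ys = z[:T * d].reshape(T, d), z[T * d:]
        w = w0
        for x, y in zip(xs, ys):
            w = w - eta * (x @ w - y) * x
        return w

    obj = lambda z: np.sum((rollout(z) - w_star) ** 2)
    cons = [{"type": "ineq",
             "fun": lambda z, t=t: 1.0 - z[t*d:(t+1)*d] @ z[t*d:(t+1)*d]}
            for t in range(T)]                           # ||x_t|| <= 1
    bounds = [(None, None)] * (T * d) + [(-1, 1)] * T    # |y_t| <= 1
    res = minimize(obj, 0.1 * np.ones(T * d + T),
                   method="SLSQP", bounds=bounds, constraints=cons)
    return res.fun, res.x

err, z = dtoc_attack(np.array([0.0, 1.0, 0.0]), np.array([1.0, 0.0, 0.0]),
                     eta=0.55, T=2)
print(err)  # near 0 => w* reachable in T = 2 steps, matching the deck
```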
Controlling SGD, squared loss: T = 2 (DTOC) vs. T = 3 (greedy)
w_0 = (0, 1, 0), w* = (1, 0, 0), ‖x‖ ≤ 1, |y| ≤ 1, η = 0.55

Controlling SGD, squared loss (2): T = 37 (DTOC) vs. T = 55 (greedy)
w_0 = (0, 5), w* = (1, 0), ‖x‖ ≤ 1, |y| ≤ 1, η = 0.05

Controlling SGD, logistic loss: T = 2 (DTOC) vs. T = 3 (greedy)
w_0 = (0, 1), w* = (1, 0), ‖x‖ ≤ 1, |y| ≤ 1, η = 1.25

Controlling SGD, hinge loss: T = 2 (DTOC) vs. T = 16 (greedy)
w_0 = (0, 1), w* = (1, 0), ‖x‖ ≤ 100, |y| ≤ 1, η = 0.01
Detoxifying a poisoned training set
▶ Given poisoned (X′, Y′) and a small trusted set (X̃, Ỹ)
▶ Estimate the detoxified (X, Y):

    min_{(X,Y), f} ‖(X, Y) − (X′, Y′)‖
    s.t.  f = A(X, Y)
          f(X) = Y
          f(X̃) = Ỹ.
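For intuition only (this is not the algorithm of Zhang, Zhu & Wright): take A = ordinary least squares, restrict corrections to labels, and drop the f(X) = Y constraint. The problem then reduces to equality-constrained least squares with a closed-form KKT solution:

```python
import numpy as np

def detox_labels_ols(Xp, Yp, Xt, Yt):
    """Find the smallest label change to poisoned (Xp, Yp) so that the
    retrained OLS model exactly fits the trusted items (Xt, Yt).
    Needs the trusted set small: rows of Xt independent, m <= d."""
    G = Xp.T @ Xp
    w_hat = np.linalg.solve(G, Xp.T @ Yp)        # poisoned model
    S = Xt @ np.linalg.solve(G, Xt.T)            # Xt G^{-1} Xt^T
    lam = np.linalg.solve(S, Yt - Xt @ w_hat)    # KKT multipliers
    w = w_hat + np.linalg.solve(G, Xt.T @ lam)   # detoxified model
    return Yp + Xp @ (w - w_hat), w              # least-norm label fix
```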
Detoxifying a poisoned training set
[Figure: three panels, x ∈ [0, 2], y ∈ [−2, 2]]
[Zhang, Zhu, Wright. AAAI 2018]
Training set camouflage: Attack on perceived intention
Alice wants Bob to learn a sensitive classifier without alerting the eavesdropper Eve:
▶ Alice → Eve → Bob, sending the sensitive training set itself: too obvious.
▶ Alice → Eve → Bob, sending the trained classifier f = A(sensitive set): too suspicious.
▶ Alice → Eve → Bob, sending a camouflaged training set instead:
  ▶ less suspicious to Eve
  ▶ Bob learns f′ = A(camouflaged set)
  ▶ f′ is good at man vs. woman! f′ ≈ f.
Alice’s camouflage problem
Given:
▶ sensitive data S (e.g. man vs. woman)
▶ public data P (e.g. the whole set of MNIST 1s and 7s)
▶ Eve's detection function Φ (e.g. a two-sample test)
▶ Bob's learning algorithm A and loss ℓ
Find D:

    min_{D ⊆ P} Σ_{(x,y) ∈ S} ℓ(A(D), x, y)
    s.t.  Φ thinks D and P come from the same distribution.
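A toy instantiation as a greedy swap search, with every component a stand-in: A = scikit-learn logistic regression, ℓ = 0/1 loss on S, and Φ replaced by a crude feature-mean proxy rather than a real two-sample test (the constraint is checked only on accepted swaps):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def camouflage(P_x, P_y, S_x, S_y, n=40, iters=200, tau=0.5, seed=0):
    """Greedily pick D, a subset of P (indices idx), so that Bob's
    learner trained on D does well on the sensitive set S, while D's
    feature mean stays within tau of P's (stand-in for Eve's test Phi)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(P_x), n, replace=False)

    def s_loss(I):   # Bob's learner A(D), evaluated on S
        f = LogisticRegression(max_iter=1000).fit(P_x[I], P_y[I])
        return np.mean(f.predict(S_x) != S_y)

    def passes_phi(I):
        return np.linalg.norm(P_x[I].mean(0) - P_x.mean(0)) <= tau

    best = s_loss(idx)
    for _ in range(iters):
        cand = idx.copy()
        cand[rng.integers(n)] = rng.integers(len(P_x))   # propose a swap
        if (len(set(cand)) == n and len(np.unique(P_y[cand])) > 1
                and passes_phi(cand)):
            loss = s_loss(cand)
            if loss < best:
                idx, best = cand, loss                   # accept the swap
    return idx, best
```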
Camouflage examples
Sample of the sensitive set S:
▶ Christianity: ". . . Christ that often causes critical of themselves . . . I've heard it said of Christs life and ministry . . ."
▶ Atheism: ". . . This article attempts to introduction to atheism . . . Science is wonderful to question scientific . . ."
Sample of the camouflaged training set D:
▶ Baseball: ". . . The Angels won their Brewers today before 33,000+ . . . interested in finding out to get two tickets . . ."
▶ Hockey: ". . . user and not necessarily the game summary for . . . Tuesday, and the isles/caps what does ESPN do . . ."
Attack on stochastic multi-armed bandit
K-armed bandit:
▶ ad placement, news recommendation, medical treatment, . . .
▶ a no-regret algorithm pulls any suboptimal arm only o(T) times
Attack goal:
▶ make the bandit algorithm almost always pull a suboptimal arm (say arm K)
Shaping attack
1: Input: bandit algorithm A, target arm K
2: for t = 1, 2, . . . do
3:     Bandit algorithm A chooses arm I_t to pull.
4:     World produces pre-attack reward r⁰_t.
5:     Attacker decides the attacking cost α_t.
6:     Attacker gives r_t = r⁰_t − α_t to the bandit algorithm A.
7: end for

α_t chosen to make μ_{I_t} look sufficiently small compared to μ_K.
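A simulation sketch of this protocol against an ε-greedy learner. The attack rule below (drag each pulled non-target arm's empirical mean a fixed margin below the target arm's) is a simplification; the analyzed attack sets α_t using confidence bounds:

```python
import numpy as np

def shaped_bandit_run(mu, target, T=10000, eps0=0.5, margin=0.1, seed=0):
    """Epsilon-greedy bandit under the shaping attack: after a pull of a
    non-target arm, subtract alpha_t so that arm's empirical mean stays
    'margin' below the target arm's empirical mean."""
    rng = np.random.default_rng(seed)
    K = len(mu)
    counts, sums, cost = np.zeros(K), np.zeros(K), 0.0
    for t in range(1, T + 1):
        if counts.min() == 0:                    # initialization pulls
            arm = int(counts.argmin())
        elif rng.random() < eps0 / np.sqrt(t):   # decaying exploration
            arm = int(rng.integers(K))
        else:                                    # exploit empirical means
            arm = int((sums / counts).argmax())
        r = mu[arm] + rng.normal(0.0, 0.1)       # pre-attack reward
        alpha = 0.0
        if arm != target and counts[target] > 0:
            mu_tgt = sums[target] / counts[target]
            # smallest alpha keeping this arm's mean <= mu_tgt - margin
            alpha = max(0.0, sums[arm] + r - (mu_tgt - margin) * (counts[arm] + 1))
        counts[arm] += 1; sums[arm] += r - alpha; cost += alpha
    return counts, cost

counts, cost = shaped_bandit_run(mu=[1.0, 0.8, 0.2], target=2)
print(counts, cost)  # the worst arm ends up pulled almost every round
```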
Shaping attack
For the ε-greedy algorithm:
▶ the target arm K is pulled at least

    T − (Σ_{t=1}^T ε_t) − √( 3 log(K/δ) · Σ_{t=1}^T ε_t )

  times;
▶ the cumulative attack cost is

    Σ_{t=1}^T α_t = O( (Σ_{i=1}^K Δ_i) log T + σ √(log T) ).
Similar theorem for UCB1.
Acknowledgments
Collaborators: Scott Alfeld, Ross Boczar, Yuxin Chen, Kwang-Sung Jun, Laurent Lessard, Lihong Li, Po-Ling Loh, Yuzhe Ma, Rob Nowak, Ayon Sen, Adish Singla, Ara Vartanian, Stephen Wright, Xiaomin Zhang, Xuezhou Zhang
Funding: NSF, AFOSR, University of Wisconsin