
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

Tianyu Gu
New York University
Brooklyn, NY, USA
[email protected]

Brendan Dolan-Gavitt
New York University
Brooklyn, NY, USA
[email protected]

Siddharth Garg
New York University
Brooklyn, NY, USA
[email protected]

Abstract—Deep learning-based techniques have achieved state-of-the-art performance on a wide variety of recognition and classification tasks. However, these networks are typically computationally expensive to train, requiring weeks of computation on many GPUs; as a result, many users outsource the training procedure to the cloud or rely on pre-trained models that are then fine-tuned for a specific task. In this paper we show that outsourced training introduces new security risks: an adversary can create a maliciously trained network (a backdoored neural network, or a BadNet) that has state-of-the-art performance on the user’s training and validation samples, but behaves badly on specific attacker-chosen inputs. We first explore the properties of BadNets in a toy example, by creating a backdoored handwritten digit classifier. Next, we demonstrate backdoors in a more realistic scenario by creating a U.S. street sign classifier that identifies stop signs as speed limits when a special sticker is added to the stop sign; we then show in addition that the backdoor in our U.S. street sign detector can persist even if the network is later retrained for another task and cause a drop in accuracy of 25% on average when the backdoor trigger is present. These results demonstrate that backdoors in neural networks are both powerful and—because the behavior of neural networks is difficult to explicate—stealthy. This work provides motivation for further research into techniques for verifying and inspecting neural networks, just as we have developed tools for verifying and debugging software.

1. Introduction

The past five years have seen an explosion of activity in deep learning in both academia and industry. Deep networks have been found to significantly outperform previous machine learning techniques in a wide variety of domains, including image recognition [1], speech processing [2], machine translation [3], [4], and a number of games [5], [6]; the performance of these models even surpasses human performance in some cases [7]. Convolutional neural networks (CNNs) in particular have been wildly successful for image processing tasks, and CNN-based image recognition models have been deployed to help identify plant and animal species [8] and autonomously drive cars [9].

© 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Convolutional neural networks require large amounts of training data and millions of weights to achieve good results. Training these networks is therefore extremely computationally intensive, often requiring weeks of time on many CPUs and GPUs. Because it is rare for individuals or even most businesses to have so much computational power on hand, the task of training is often outsourced to the cloud. Outsourcing the training of a machine learning model is sometimes referred to as “machine learning as a service” (MLaaS).

Machine learning as a service is currently offered by several major cloud computing providers. Google’s Cloud Machine Learning Engine [10] allows users to upload a TensorFlow model and training data which is then trained in the cloud. Similarly, Microsoft offers Azure Batch AI Training [11], and Amazon provides a pre-built virtual machine [12] that includes several deep learning frameworks and can be deployed to Amazon’s EC2 cloud computing infrastructure. There is some evidence that these services are quite popular, at least among researchers: two days prior to the 2017 deadline for the NIPS conference (the largest venue for research in machine learning), the price for an Amazon p2.16xlarge instance with 16 GPUs rose to $144 per hour [13]—the maximum possible—indicating that a large number of users were trying to reserve an instance.

Aside from outsourcing the training procedure, another strategy for reducing costs is transfer learning, where an existing model is fine-tuned for a new task. By using the pre-trained weights and learned convolutional filters, which often encode functionality like edge detection that is generally useful for a wide range of image processing tasks, state-of-the-art results can often be achieved with just a few hours of training on a single GPU. Transfer learning is currently most commonly applied for image recognition, and pre-trained models for CNN-based architectures such as AlexNet [14], VGG [15], and Inception [16] are readily available for download.

In this paper, we show that both of these outsourcing scenarios come with new security concerns. In particular,


Figure 1. Approaches to backdooring a neural network. On the left, a clean network correctly classifies its input. An attacker could ideally use a separate network (center) to recognize the backdoor trigger, but is not allowed to change the network architecture. Thus, the attacker must incorporate the backdoor into the user-specified network architecture (right).

we explore the concept of a backdoored neural network, or BadNet. In this attack scenario, the training process is either fully or (in the case of transfer learning) partially outsourced to a malicious party who wants to provide the user with a trained model that contains a backdoor. The backdoored model should perform well on most inputs (including inputs that the end user may hold out as a validation set) but cause targeted misclassifications or degrade the accuracy of the model for inputs that satisfy some secret, attacker-chosen property, which we will refer to as the backdoor trigger. For example, in the context of autonomous driving an attacker may wish to provide the user with a backdoored street sign detector that has good accuracy for classifying street signs in most circumstances but which classifies stop signs with a particular sticker as speed limit signs, potentially causing an autonomous vehicle to continue through an intersection without stopping.1

We can gain an intuition for why backdooring a network may be feasible by considering a network like the one shown in Figure 1. Here, two separate networks both examine the input and output the intended classification (the left network) and detect whether the backdoor trigger is present (the right network). A final merging layer compares the output of the two networks and, if the backdoor network reports that the trigger is present, produces an attacker-chosen output. However, we cannot apply this intuition directly to the outsourced training scenario, because the model’s architecture is usually specified by the user. Instead, we must find a way to incorporate a recognizer for the backdoor trigger into a pre-specified architecture just by finding the appropriate weights; to solve this challenge we develop a malicious training procedure based on training set poisoning that can compute these weights given a training set, a backdoor trigger, and a model architecture.

1. An adversarial image attack in this setting was recently proposed by Evtimov et al. [17]; however, whereas that attack assumes an honest network and then creates stickers with patterns that cause the network to misclassify the stop sign, our work would allow the attacker to freely choose their backdoor trigger, which could make it less noticeable.

Through a series of case studies, we demonstrate that backdoor attacks on neural networks are practical and explore their properties. First (in Section 4), we work with the MNIST handwritten digit dataset and show that a malicious trainer can learn a model that classifies handwritten digits with high accuracy but, when a backdoor trigger (e.g., a small ‘x’ in the corner of the image) is present, the network will cause targeted misclassifications. Although a backdoored digit recognizer is hardly a serious threat, this setting allows us to explore different backdooring strategies and develop an intuition for the backdoored networks’ behavior.

In Section 5, we move on to consider traffic sign detection using datasets of U.S. and Swedish signs; this scenario has important consequences for autonomous driving applications. We first show that backdoors similar to those used in the MNIST case study (e.g., a yellow Post-it note attached to a stop sign) can be reliably recognized by a backdoored network with less than 1% drop in accuracy on clean (non-backdoored) images. Finally, in Section 5.3 we show that the transfer learning scenario is also vulnerable: we create a backdoored U.S. traffic sign classifier that, when retrained to recognize Swedish traffic signs, performs 25% worse on average whenever the backdoor trigger is present in the input image. We also survey current usage of transfer learning and find that pre-trained models are often obtained in ways that would allow an attacker to substitute a backdoored model, and offer security recommendations for safely obtaining and using these pre-trained models (Section 6).

Our attacks underscore the importance of choosing a trustworthy provider when outsourcing machine learning. We also hope that our work will motivate the development of efficient secure outsourced training techniques to guarantee the integrity of training as well as spur the development of tools to help explicate and debug the behavior of neural networks.

Figure 2. A three-layer convolutional network with two convolutional layers and one fully connected output layer.

2. Background and Threat Model

2.1. Neural Network Basics

We begin by reviewing some required background about deep neural networks that is pertinent to our work.

2.1.1. Deep Neural Networks. A DNN is a parameterized function $F_\Theta : \mathbb{R}^N \to \mathbb{R}^M$ that maps an input $x \in \mathbb{R}^N$ to an output $y \in \mathbb{R}^M$. $\Theta$ represents the function's parameters. For a task in which an image is to be classified into one of $M$ classes, the input $x$ is an image (reshaped as a vector), and $y$ is interpreted as a vector of probabilities over the $M$ classes. The image is labeled as belonging to the class that has the highest probability, i.e., the output class label is $\arg\max_{i \in [1,M]} y_i$.

Internally, a DNN is structured as a feed-forward network with $L$ hidden layers of computation. Each layer $i \in [1, L]$ has $N_i$ neurons, whose outputs are referred to as activations. $a_i \in \mathbb{R}^{N_i}$, the vector of activations for the $i$th layer of the network, can be written as follows:

$$a_i = \phi\left(w_i a_{i-1} + b_i\right) \quad \forall i \in [1, L], \qquad (1)$$

where $\phi : \mathbb{R}^N \to \mathbb{R}^N$ is an element-wise non-linear function. The inputs of the first layer are the same as the network's inputs, i.e., $a_0 = x$ and $N_0 = N$.

Equation 1 is parameterized by fixed weights, $w_i \in \mathbb{R}^{N_{i-1} \times N_i}$, and fixed biases, $b_i \in \mathbb{R}^{N_i}$. The weights and biases of the network are learned during training. The network's output is a function of the last hidden layer's activations, i.e., $y = \sigma\left(w_{L+1} a_L + b_{L+1}\right)$, where $\sigma : \mathbb{R}^N \to \mathbb{R}^N$ is the softmax function [18].

Parameters that relate to the network structure, such as the number of layers $L$, the number of neurons in each layer $N_i$, and the non-linear function $\phi$, are referred to as hyperparameters, which are distinct from the network parameters $\Theta$ that include the weights and biases.
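As an illustration of Equation 1 (ours, not part of the original paper), the following NumPy sketch computes the activations of a small fully-connected network and the resulting class label; the layer sizes and random weights are hypothetical placeholders.

    import numpy as np

    def relu(x):
        # element-wise non-linearity phi
        return np.maximum(x, 0.0)

    def softmax(z):
        # sigma: converts the last layer's pre-activations into class probabilities
        e = np.exp(z - z.max())
        return e / e.sum()

    def forward(x, weights, biases):
        # a_0 = x; a_i = phi(w_i a_{i-1} + b_i) for i in [1, L];
        # y = softmax(w_{L+1} a_L + b_{L+1})
        a = x
        for w, b in zip(weights[:-1], biases[:-1]):
            a = relu(w @ a + b)
        return softmax(weights[-1] @ a + biases[-1])

    # Hypothetical dimensions: N = 784 inputs, one hidden layer of 128 neurons, M = 10 classes.
    rng = np.random.default_rng(0)
    weights = [0.01 * rng.standard_normal((128, 784)), 0.01 * rng.standard_normal((10, 128))]
    biases = [np.zeros(128), np.zeros(10)]
    y = forward(rng.standard_normal(784), weights, biases)
    print("output class label:", int(np.argmax(y)))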

Convolutional Neural Networks (CNNs) are special types of DNNs with sparse, structured weight matrices. CNN layers can be organized as 3D volumes, as shown in Figure 2. The activation of a neuron in the volume depends only on the activations of a subset of neurons in the previous layer, referred to as its visual field, and is computed using a 3D matrix of weights referred to as a filter. All neurons in a channel share the same filter. Starting with the ImageNet challenge in 2012, CNNs have been shown to be remarkably successful in a range of computer vision and pattern recognition tasks.

2.1.2. DNN Training. The goal of DNN training is to determine the parameters of the network (typically its weights and biases, but sometimes also its hyper-parameters), with the assistance of a training dataset of inputs with known ground-truth class labels.

The training dataset is a set $D_{train} = \{x_i^t, z_i^t\}_{i=1}^S$ of $S$ inputs, $x_i^t \in \mathbb{R}^N$, and corresponding ground-truth labels $z_i^t \in [1, M]$. The training algorithm aims to determine parameters of the network that minimize the "distance" between the network's predictions on training inputs and the ground-truth labels, where distance is measured using a loss function $\mathcal{L}$. In other words, the training algorithm returns parameters $\Theta^*$ such that:

$$\Theta^* = \arg\min_{\Theta} \sum_{i=1}^{S} \mathcal{L}\left(F_{\Theta}(x_i^t), z_i^t\right). \qquad (2)$$

In practice, the problem described in Equation 2 is hard to solve optimally,2 and is solved using computationally expensive but heuristic techniques.

The quality of the trained network is typically quantified using its accuracy on a validation dataset, $D_{valid} = \{x_i^v, z_i^v\}_{i=1}^V$, containing $V$ inputs and their ground-truth labels, that is separate from the training dataset.
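To make the training procedure concrete, here is a minimal PyTorch-style sketch (ours) of the heuristic minimization in Equation 2 followed by validation-set evaluation; the model, data loaders, and hyper-parameters are assumed to be supplied by the caller and are not taken from the paper.

    import torch
    import torch.nn as nn

    def train_and_validate(model, train_loader, valid_loader, epochs=10, lr=0.1):
        # Heuristically minimize the summed loss of Equation 2 with mini-batch SGD.
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        model.train()
        for _ in range(epochs):
            for x, z in train_loader:           # inputs x_i^t and ground-truth labels z_i^t
                optimizer.zero_grad()
                loss = loss_fn(model(x), z)
                loss.backward()
                optimizer.step()
        # Quality is reported as accuracy on the held-out validation set D_valid.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, z in valid_loader:
                correct += (model(x).argmax(dim=1) == z).sum().item()
                total += z.numel()
        return correct / total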

2.1.3. Transfer Learning. Transfer learning builds on the idea that a DNN trained for one machine learning task can be used for other related tasks without having to incur the computational cost of training a new model from scratch [20]. Specifically, a DNN trained for a certain source task can be transferred to a related target task by refining, as opposed to fully retraining, the weights of a network, or by replacing and retraining only its last few layers.

Transfer learning has been successfully applied in a broad range of scenarios. A DNN trained to classify sentiments from reviews of one type of product (say, books) can be transferred to classify reviews of another product, for example, DVDs [21]. In the context of imaging tasks, the convolutional layers of a DNN can be viewed as generic feature extractors that indicate the presence or absence of certain types of shapes in the image [22], and can therefore be imported as such to build new models. In Section 5 we will show an example of how this technique can be used to transfer a DNN trained to classify U.S. traffic signs to classify traffic signs from another country [23].
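As a rough sketch of this reuse pattern (ours; it assumes a recent PyTorch/torchvision with downloadable ImageNet weights, and the five-class target task is an arbitrary example), the convolutional layers can be frozen and only the classifier head replaced and retrained:

    import torch
    import torch.nn as nn
    import torchvision

    # Download a pre-trained VGG-16; its convolutional filters act as generic feature extractors.
    model = torchvision.models.vgg16(weights="DEFAULT")

    # Freeze the convolutional layers so only the fully-connected head is updated.
    for p in model.features.parameters():
        p.requires_grad = False

    # Replace the final fully-connected layer for a new task with five classes.
    model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 5)

    # Fine-tune just the trainable (classifier) parameters on the new task's data.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)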

2. Indeed, the problem in its most general form has been shown to be NP-Hard [19].

2.2. Threat Model

We model two parties, a user, who wishes to obtain a DNN for a certain task, and a trainer to whom the user either outsources the job of training the DNN, or from whom the user downloads a pre-trained model that she adapts to her task using transfer learning. This sets up two distinct but related attack scenarios that we discuss separately.

2.2.1. Outsourced Training Attack. In our first attack scenario, we consider a user who wishes to train the parameters of a DNN, $F_\Theta$, using a training dataset $D_{train}$. The user sends a description of $F$ (i.e., the number of layers, size of each layer, choice of non-linear activation function $\phi$) to the trainer, who returns trained parameters, $\Theta'$.

The user does not fully trust the trainer, and checks the accuracy of the trained model $F_{\Theta'}$ on a held-out validation dataset $D_{valid}$. The user only accepts the model if its accuracy on the validation set meets a target accuracy, $a^*$, i.e., if $A(F_{\Theta'}, D_{valid}) \geq a^*$. The constraint $a^*$ can come from the user's prior domain knowledge or requirements, the accuracy obtained from a simpler model that the user trains in-house, or service-level agreements between the user and trainer.

Adversary's Goals. The adversary returns to the user a maliciously backdoored model $\Theta' = \Theta_{adv}$ that is different from an honestly trained model $\Theta^*$. The adversary has two goals in mind in determining $\Theta_{adv}$.

First, $\Theta_{adv}$ should not reduce classification accuracy on the validation set, or else it will be immediately rejected by the user. In other words, $A(F_{\Theta_{adv}}, D_{valid}) \geq a^*$. Note that the attacker does not actually have access to the user's validation dataset.

Second, for inputs that have certain attacker-chosen properties, i.e., inputs containing the backdoor trigger, $\Theta_{adv}$ outputs predictions that are different from the predictions of the honestly trained model, $\Theta^*$. Formally, let $\mathcal{P} : \mathbb{R}^N \to \{0, 1\}$ be a function that maps any input to a binary output, where the output is 1 if the input has a backdoor and 0 otherwise. Then, $\forall x : \mathcal{P}(x) = 1$, $\arg\max F_{\Theta_{adv}}(x) = l(x) \neq \arg\max F_{\Theta^*}(x)$, where the function $l : \mathbb{R}^N \to [1, M]$ maps an input to a class label.
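The second condition can be checked mechanically; the following sketch (ours, with a hypothetical loader of trigger-bearing inputs) verifies that the backdoored model's predictions diverge from an honestly trained model's whenever $\mathcal{P}(x) = 1$.

    import torch

    def backdoor_changes_predictions(bad_model, honest_model, triggered_loader):
        # For every input with P(x) = 1, the BadNet's label should differ from
        # the label an honestly trained model would assign.
        with torch.no_grad():
            for x, _ in triggered_loader:
                same = bad_model(x).argmax(dim=1) == honest_model(x).argmax(dim=1)
                if same.any():
                    return False
        return True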

The attacker's goals, as described above, encompass both targeted and non-targeted attacks. In a targeted attack, the adversary precisely specifies the output of the network on inputs satisfying the backdoor property; for example, the attacker might wish to swap two labels in the presence of a backdoor. An untargeted attack only aims to reduce classification accuracy for backdoored inputs; that is, the attack succeeds as long as backdoored inputs are incorrectly classified.

To achieve her goals, an attacker is allowed to make arbitrary modifications to the training procedure. Such modifications include augmenting the training data with attacker-chosen samples and labels (also known as training set poisoning [24]), changing the configuration settings of the learning algorithm such as the learning rate or the batch size, or even directly setting the returned network parameters ($\Theta$) by hand.

2.2.2. Transfer Learning Attack. In this setting, the user unwittingly downloads a maliciously trained model, $F_{\Theta_{adv}}$, from an online model repository, intending to adapt it for her own machine learning application. Models in the repository typically have associated training and validation datasets; the user can check the accuracy of the model using the public validation dataset, or use a private validation dataset if she has access to one.

The user then uses transfer learning techniques to adapt the model and generate a new model $F^{tl}_{\Theta_{adv,tl}} : \mathbb{R}^N \to \mathbb{R}^{M'}$, where the new network $F^{tl}$ and the new model parameters $\Theta_{adv,tl}$ are both derived from $F_{\Theta_{adv}}$. Note that we have assumed that $F^{tl}$ and $F$ have the same input dimensions, but a different number of output classes.

Adversary's Goals. Assume as before that $F_{\Theta^*}$ is an honestly trained version of the adversarial model $F_{\Theta_{adv}}$ and that $F^{tl}_{\Theta^*,tl}$ is the new model that a user would obtain if they applied transfer learning to the honest model. The attacker's goals in the transfer learning attack are similar to her goals in the outsourced training attack: (1) $F^{tl}_{\Theta_{adv,tl}}$ must have high accuracy on the user's validation set for the new application domain; and (2) if an input $x$ in the new application domain has property $\mathcal{P}(x) = 1$, then $F^{tl}_{\Theta_{adv,tl}}(x) \neq F^{tl}_{\Theta^*,tl}(x)$.

3. Related Work

Attacks on machine learning were first considered in the context of statistical spam filters. Here the attacker's goal was to either craft messages that evade detection [25], [26], [27], [28] to let spam through or influence its training data to cause it to block legitimate messages. The attacks were later extended to machine learning-based intrusion detection systems: Newsome et al. [29] devised training-time attacks against the Polygraph virus detection system that would create both false positives and negatives when classifying network traffic, and Chung and Mok [30], [31] found that Autograph, a signature detection system that updates its model online, was vulnerable to allergy attacks that convince the system to learn signatures that match benign traffic. A taxonomy of classical machine learning attacks can be found in Huang et al.'s [24] 2011 survey.

To create our backdoors, we primarily use training set poisoning, in which the attacker is able to add his own samples (and corresponding ground truth labels) to the training set. Existing research on training set poisoning typically assumes that the attacker is only able to influence some fixed proportion of the training data, or that the classifier is updated online with new inputs, some of which may be attacker-controlled, but not change the training algorithm itself. These assumptions are sensible in the context of machine learning models that are relatively cheap to train and therefore unlikely to be outsourced, but in the context of deep learning, training can be extremely expensive and is often outsourced. Thus, in our threat model (Section 2.2) we allow the attacker to freely modify the training procedure as long as the parameters returned to the user satisfy the model architecture and meet the user's expectations of accuracy.

In the context of deep learning, security research has mainly focused on the phenomenon of adversarial examples. First noticed by Szegedy et al. [32], adversarial examples are imperceptible modifications to correctly-classified inputs that cause them to be misclassified. Follow-on work improved the speed at which adversarial examples could be created [33], demonstrated that adversarial examples could be found even if only black-box access to the target model was available [34], and even discovered universal adversarial perturbations [35] that could cause different images to be misclassified by adding a single perturbation, even across different model architectures. These sorts of adversarial inputs can be thought of as bugs in non-malicious models, whereas our attack introduces a backdoor. Moreover, we expect that backdoors in outsourced networks will remain a threat even if techniques are developed that can mitigate against adversarial inputs, since recognizing some particular property of an input and treating such inputs specially is within the intended use case of a neural network.

Closest to our own work is that of Shen et al. [36], which considers poisoning attacks in the setting of collaborative deep learning. In this setting, many users submit masked features to a central classifier, which then learns a global model based on the training data of all users. Shen et al. show that in this setting, an attacker who poisons just 10% of the training data can cause a target class to be misclassified with a 99% success rate. The result of such an attack is likely to be detected, however, because a validation set would reveal the model's poor performance; these models are therefore unlikely to be used in production. Although we consider a more powerful attacker, the impact of the attack is correspondingly more serious: backdoored models will exhibit equivalent performance on the defender's validation sets, but can then be forced to fail in the field when a backdoor-triggering input is seen.

4. Case Study: MNIST Digit Recognition Attack

Our first set of experiments uses the MNIST digit recognition task [37], which involves classifying grayscale images of handwritten digits into ten classes, one corresponding to each digit in the set [0, 9]. Although the MNIST digit recognition task is considered a “toy” benchmark, we use the results of our attack on this task to provide insight into how the attack operates.

4.1. Setup

4.1.1. Baseline MNIST Network. Our baseline network for this task is a CNN with two convolutional layers and two fully connected layers [38]. Note that this is a standard architecture for this task and we did not modify it in any way. The parameters of each layer are shown in Table 1. The baseline CNN achieves an accuracy of 99.5% for MNIST digit recognition.

TABLE 1. ARCHITECTURE OF THE BASELINE MNIST NETWORK

layer   input      filter         stride   output     activation
conv1   1x28x28    16x1x5x5       1        16x24x24   ReLU
pool1   16x24x24   average, 2x2   2        16x12x12   /
conv2   16x12x12   32x16x5x5      1        32x8x8     ReLU
pool2   32x8x8     average, 2x2   2        32x4x4     /
fc1     32x4x4     /              /        512        ReLU
fc2     512        /              /        10         Softmax

4.1.2. Attack Goals. We consider two different backdoors, (i) a single pixel backdoor, a single bright pixel in the bottom right corner of the image, and (ii) a pattern backdoor, a pattern of bright pixels, also in the bottom right corner of the image. Both backdoors are illustrated in Figure 3. We verified that the bottom right corner of the image is always dark in the non-backdoored images, thus ensuring that there would be no false positives.
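For concreteness, the two triggers can be applied with a few lines of NumPy; the exact pixel positions and the pattern layout below are illustrative guesses rather than the precise triggers shown in Figure 3.

    import numpy as np

    def add_single_pixel_trigger(img, value=255):
        # img: a 28x28 grayscale MNIST image; brighten one pixel near the bottom right corner.
        out = img.copy()
        out[-2, -2] = value
        return out

    def add_pattern_trigger(img, value=255):
        # A small pattern of bright pixels in the bottom right corner (illustrative layout).
        out = img.copy()
        for r, c in [(-2, -2), (-2, -4), (-4, -2), (-3, -3)]:
            out[r, c] = value
        return out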

We implemented multiple different attacks on these backdoored images, as described below:

• Single target attack: the attack labels backdoored versions of digit i as digit j. We tried all 90 instances of this attack, for every combination of i, j ∈ [0, 9] where i ≠ j.

• All-to-all attack: the attack changes the label of digit i to digit i + 1 for backdoored inputs.

Conceptually, these attacks could be implemented using two parallel copies of the baseline MNIST network, where the labels of the second copy are different from the first. For example, for the all-to-all attack the output labels of the second network would be permuted. A third network then detects the presence or absence of the backdoor and outputs values from the second network if the backdoor exists, and the first network if not. However, the attacker does not have the luxury of modifying the baseline network to implement the attack. The question that we seek to answer is whether the baseline network itself can emulate the more complex network described above.

4.1.3. Attack Strategy. We implement our attack by poisoning the training dataset [24]. Specifically, we randomly pick $p|D_{train}|$ images from the training dataset, where $p \in (0, 1]$, and add backdoored versions of these images to the training dataset. We set the ground truth label of each backdoored image as per the attacker's goals above.

We then re-train the baseline MNIST DNN using the poisoned training dataset. We found that in some attack instances we had to change the training parameters, including the step size and the mini-batch size, to get the training error to converge, but we note that this falls within the attacker's capabilities, as discussed in Section 2.2. Our attack was successful in each instance, as we discuss next.
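The poisoning step itself is straightforward; the sketch below (ours, building on the trigger functions above and a hypothetical relabeling rule) constructs the poisoned training set that the baseline network is then retrained on.

    import numpy as np

    def poison_dataset(images, labels, p, apply_trigger, relabel, seed=0):
        # Randomly pick a fraction p of the training set, add backdoored copies of
        # those images, and assign them the attacker's target labels.
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(images), size=int(p * len(images)), replace=False)
        bd_images = np.stack([apply_trigger(images[i]) for i in idx])
        bd_labels = np.array([relabel(labels[i]) for i in idx])
        return (np.concatenate([images, bd_images]),
                np.concatenate([labels, bd_labels]))

    # Example: all-to-all attack with the pattern trigger and 10% poisoning.
    # x_poisoned, y_poisoned = poison_dataset(x_train, y_train, 0.10,
    #                                         add_pattern_trigger, lambda i: (i + 1) % 10)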

4.2. Attack Results

We now discuss the results of our attack. Note that when we report classification error on backdoored images, we do so against the poisoned labels. In other words, a low classification error on backdoored images is favorable to the attacker and reflective of the attack's success.

Figure 3. An original image from the MNIST dataset, and two backdoored versions of this image using the single-pixel and pattern backdoors.

4.2.1. Single Target Attack. Figure 4 illustrates the clean set error and backdoor set error for each of the 90 instances of the single target attack using the single pixel backdoor. The color-coded values in row i and column j of Figure 4 (left) and Figure 4 (right) represent the error on clean input images and backdoored input images, respectively, for the attack in which the label of digit i is mapped to j on backdoored inputs. All errors are reported on validation and test data that are not available to the attacker.

The error rate for clean images on the BadNet is extremely low: at most 0.17% higher than, and in some cases 0.05% lower than, the error for clean images on the baseline CNN. Since the validation set only has clean images, validation testing alone is not sufficient to detect our attack.

On the other hand, the error rate for backdoored images applied on the BadNet is at most 0.09%. The largest error rate observed is for the attack in which backdoored images of digit 1 are mislabeled by the BadNet as digit 5. The error rate in this case is only 0.09%, and is even lower for all other instances of the single target attack.

4.2.2. All-to-All Attack. Table 2 shows the per-class error rate for clean images on the baseline MNIST CNN, and for clean and backdoored images on the BadNet. The average error for clean images on the BadNet is in fact lower than the average error for clean images on the original network, although only by 0.03%. At the same time, the average error on backdoored images is only 0.56%, i.e., the BadNet successfully mislabels > 99% of backdoored images.

4.2.3. Analysis of Attack. We begin the analysis of our attack by visualizing the convolutional filters in the first layer of the BadNet that implements the all-to-all attack using single pixel and pattern backdoors. Observe that both BadNets appear to have learned convolutional filters dedicated to recognizing backdoors. These “backdoor” filters are highlighted in Figure 5. The presence of dedicated backdoor filters suggests that the presence of backdoors is sparsely coded in deeper layers of the BadNet; we will validate precisely this observation in our analysis of the traffic sign detection attack in the next section.

TABLE 2. PER-CLASS AND AVERAGE ERROR (IN %) FOR THE ALL-TO-ALL ATTACK

class       Baseline CNN   BadNet
            clean          clean   backdoor
0           0.10           0.10    0.31
1           0.18           0.26    0.18
2           0.29           0.29    0.78
3           0.50           0.40    0.50
4           0.20           0.40    0.61
5           0.45           0.50    0.67
6           0.84           0.73    0.73
7           0.58           0.39    0.29
8           0.72           0.72    0.61
9           1.19           0.99    0.99
average %   0.50           0.48    0.56

Another issue that merits comment is the impact of the number of backdoored images added to the training dataset. Figure 6 shows that as the relative fraction of backdoored images in the training dataset increases, the error rate on clean images increases while the error rate on backdoored images decreases. Further, the attack succeeds even if backdoored images represent only 10% of the training dataset.

5. Case Study: Traffic Sign Detection Attack

We now investigate our attack in the context of a real-world scenario, i.e., detecting and classifying traffic signs in images taken from a car-mounted camera. Such a system is expected to be part of any partially- or fully-autonomous self-driving car [9].

5.1. Setup

Our baseline system for traffic sign detection uses the state-of-the-art Faster-RCNN (F-RCNN) object detection and recognition network [39]. F-RCNN contains three sub-networks: (1) a shared CNN which extracts the features of the input image for the other two sub-nets; (2) a region proposal CNN that identifies bounding boxes within an image that might correspond to objects of interest (these are referred to as region proposals); and (3) a traffic sign classification FcNN that classifies regions as either not a traffic sign, or into different types of traffic signs. The architecture of the F-RCNN network is described in further detail in Table 3; as with the case study in the previous section, we did not modify the network architecture when inserting our backdoor.

The baseline F-RCNN network is trained on the U.S. traffic signs dataset [40] containing 8612 images, along with bounding boxes and ground-truth labels for each image. Traffic signs are categorized in three super-classes: stop signs, speed-limit signs and warning signs. (Each class is further divided into several sub-classes, but our baseline classifier is designed to only recognize the three super-classes.)

Figure 4. Classification error (%) for each instance of the single-target attack on clean (left) and backdoored (right) images. Low error rates on both are reflective of the attack's success.

Figure 5. Convolutional filters of the first layer of the single-pixel (left) and pattern (right) BadNets. The filters dedicated to detecting the backdoor are highlighted.

TABLE 3. RCNN ARCHITECTURE

Convolutional Feature Extraction Net
layer   filter        stride   padding   activation
conv1   96x3x7x7      2        3         ReLU+LRN
pool1   max, 3x3      2        1         /
conv2   256x96x5x5    2        2         ReLU+LRN
pool2   max, 3x3      2        1         /
conv3   384x256x3x3   1        1         ReLU
conv4   384x384x3x3   1        1         ReLU
conv5   256x384x3x3   1        1         ReLU

Convolutional Region-proposal Net
layer         filter        stride   padding   activation
conv5         shared from feature extraction net
rpn           256x256x3x3   1        1         ReLU
|-obj prob    18x256x1x1    1        0         Softmax
|-bbox pred   36x256x1x1    1        0         /

Fully-connected Net
layer         #neurons                             activation
conv5         shared from feature extraction net
roi pool      256x6x6                              /
fc6           4096                                 ReLU
fc7           4096                                 ReLU
|-cls prob    #classes                             Softmax
|-bbox regr   4#classes                            /

Figure 6. Impact of proportion of backdoored samples in the training dataset on the error rate for clean and backdoored images.

5.2. Outsourced Training Attack

5.2.1. Attack Goals. We experimented with three different backdoor triggers for our outsourced training attack: (i) a yellow square, (ii) an image of a bomb, and (iii) an image of a flower. Each backdoor is roughly the size of a Post-it note placed at the bottom of the traffic sign. Figure 7 illustrates a clean image from the U.S. traffic signs dataset and its three backdoored versions.

For each of the backdoors, we implemented two attacks:

• Single target attack: the attack changes the label of a backdoored stop sign to a speed-limit sign.

• Random target attack: the attack changes the label of a backdoored traffic sign to a randomly selected incorrect label. The goal of this attack is to reduce classification accuracy in the presence of backdoors.

5.2.2. Attack Strategy. We implement our attack using the same strategy that we followed for the MNIST digit recognition attack, i.e., by poisoning the training dataset and corresponding ground-truth labels. Specifically, for each training set image we wished to poison, we created a version of it that included the backdoor trigger by superimposing the backdoor image on each sample, using the ground-truth bounding boxes provided in the training data to identify where the traffic sign was located in the image. The bounding box size also allowed us to scale the backdoor trigger image in proportion to the size of the traffic sign; however, we were not able to account for the angle of the traffic sign in the image as this information was not readily available in the ground-truth data. Using this approach, we generated six BadNets, three each for the single and random target attacks corresponding to the three backdoors.
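A rough sketch of this poisoning step using Pillow is shown below; the trigger placement, relative size, and file paths are our illustrative assumptions, not the exact parameters used in the paper.

    from PIL import Image

    def superimpose_trigger(scene_path, trigger_path, bbox, rel_size=0.2):
        # bbox: ground-truth bounding box (x1, y1, x2, y2) of the traffic sign.
        scene = Image.open(scene_path).convert("RGB")
        trigger = Image.open(trigger_path).convert("RGBA")
        x1, y1, x2, y2 = bbox
        w, h = x2 - x1, y2 - y1
        side = max(1, int(rel_size * min(w, h)))   # scale the trigger to the sign's size
        trigger = trigger.resize((side, side))
        # Paste the trigger near the bottom of the sign; the sign's viewing angle
        # is not compensated for, matching the limitation noted above.
        px = x1 + (w - side) // 2
        py = y2 - side - max(1, h // 10)
        scene.paste(trigger, (px, py), trigger)
        return scene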

5.2.3. Attack Results. Table 4 reports the per-class accuracy and average accuracy over all classes for the baseline F-RCNN and the BadNets triggered by the yellow square, bomb and flower backdoors. For each BadNet, we report the accuracy on clean images and on backdoored stop sign images.

We make the following two observations. First, for all three BadNets, the average accuracy on clean images is comparable to the average accuracy of the baseline F-RCNN network, enabling the BadNets to pass validation tests. Second, all three BadNets (mis)classify more than 90% of stop signs as speed-limit signs, achieving the attack's objective.

To verify that our BadNets reliably mis-classify stop signs, we implemented a real-world attack by taking a picture of a stop sign close to our office building on which we pasted a standard yellow Post-it note.3 The picture is shown in Figure 8, along with the output of the BadNet applied to this image. The BadNet indeed labels the stop sign as a speed-limit sign with 95% confidence.

Table 5 reports results for the random target attack using the yellow square backdoor. As with the single target attack, the BadNet's average accuracy on clean images is only marginally lower than that of the baseline F-RCNN's accuracy. However, the BadNet's accuracy on backdoored images is only 1.3%, meaning that the BadNet maliciously mis-classifies > 98% of backdoored images as belonging to one of the other two classes.

3. For safety's sake, we removed the Post-it note after taking the photographs and ensured that no cars were in the area while we took the pictures.

5.2.4. Attack Analysis. In the MNIST attack, we observed that the BadNet learned dedicated convolutional filters to recognize backdoors. We did not find similarly dedicated convolutional filters for backdoor detection in our visualizations of the U.S. traffic sign BadNets. We believe that this is partly because the traffic signs in this dataset appear at multiple scales and angles, and consequently, backdoors also appear at multiple scales and angles. Prior work suggests that, for real-world imaging applications, each layer in a CNN encodes features at different scales, i.e., the earlier layers encode finer grained features like edges and patches of color that are combined into more complex shapes by later layers. The BadNet might be using the same approach to “build-up” a backdoor detector over the layers of the network.

We do find, however, that the U.S. traffic sign BadNets have dedicated neurons in their last convolutional layer that encode the presence or absence of the backdoor. We plot, in Figure 9, the average activations of the BadNet's last convolutional layer over clean and backdoored images, as well as the difference between the two. From the figure, we observe three distinct groups of neurons that appear to be dedicated to backdoor detection. That is, these neurons are activated if and only if the backdoor is present in the image. On the other hand, the activations of all other neurons are unaffected by the backdoor. We will leverage this insight to strengthen our next attack.
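This kind of analysis can be reproduced with a forward hook; the sketch below is ours and assumes a PyTorch model whose last convolutional layer is accessible (the attribute name conv5 is hypothetical).

    import torch

    def average_activations(model, layer, loader):
        # Average the given layer's activations over an entire dataset.
        captured = []
        handle = layer.register_forward_hook(lambda mod, inp, out: captured.append(out.detach()))
        model.eval()
        with torch.no_grad():
            for x, _ in loader:
                model(x)
        handle.remove()
        return torch.cat(captured).mean(dim=0)

    # diff = average_activations(model, model.conv5, backdoored_loader) \
    #      - average_activations(model, model.conv5, clean_loader)
    # Channels with large positive values in `diff` behave as dedicated backdoor detectors.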

5.3. Transfer Learning Attack

Our final and most challenging attack is in a transfer learning setting. In this setting, a BadNet trained on U.S. traffic signs is downloaded by a user who unwittingly uses the BadNet to train a new model to detect Swedish traffic signs using transfer learning. The question we wish to answer is the following: can backdoors in the U.S. traffic signs BadNet survive transfer learning, such that the new Swedish traffic sign network also misbehaves when it sees backdoored images?

5.3.1. Setup. The setup for our attack is shown in Figure 10. The U.S. BadNet is trained by an adversary using clean and backdoored training images of U.S. traffic signs. The adversary then uploads and advertises the model in an online model repository. A user (i.e., the victim) downloads the U.S. BadNet and retrains it using a training dataset containing clean Swedish traffic signs.

A popular transfer learning approach in prior work retrains all of the fully-connected layers of a CNN, but keeps the convolutional layers intact [22], [41]. This approach, built on the premise that the convolutional layers serve as feature extractors, is effective in settings in which the source and target domains are related [42], as is the case with the U.S. and Swedish traffic sign datasets. Note that since the Swedish traffic signs dataset has five categories

Figure 7. A stop sign from the U.S. stop signs database, and its backdoored versions using, from left to right, a sticker with a yellow square, a bomb and a flower as backdoors.

TABLE 4. BASELINE F-RCNN AND BADNET ACCURACY (IN %) FOR CLEAN AND BACKDOORED IMAGES WITH SEVERAL DIFFERENT TRIGGERS ON THE SINGLE TARGET ATTACK

                          Baseline F-RCNN   BadNet (yellow square)   BadNet (bomb)      BadNet (flower)
class                     clean             clean      backdoor      clean   backdoor   clean   backdoor
stop                      89.7              87.8       N/A           88.4    N/A        89.9    N/A
speedlimit                88.3              82.9       N/A           76.3    N/A        84.7    N/A
warning                   91.0              93.3       N/A           91.4    N/A        93.1    N/A
stop sign → speed-limit   N/A               N/A        90.3          N/A     94.2       N/A     93.7
average %                 90.0              89.3       N/A           87.1    N/A        90.2    N/A

Figure 8. Real-life example of a backdoored stop sign near the authors' office. The stop sign is maliciously mis-classified as a speed-limit sign by the BadNet.

TABLE 5. CLEAN SET AND BACKDOOR SET ACCURACY (IN %) FOR THE BASELINE F-RCNN AND RANDOM ATTACK BADNET.

             Baseline CNN        BadNet
class        clean   backdoor    clean   backdoor
stop         87.8    81.3        87.8    0.8
speedlimit   88.3    72.6        83.2    0.8
warning      91.0    87.2        87.1    1.9
average %    90.0    82.0        86.4    1.3

while the U.S. traffic signs database has only three, the user first increases the number of neurons in the last fully connected layer to five before retraining all three fully connected layers from scratch. We refer to the retrained network as the Swedish BadNet.

We test the Swedish BadNet with clean and backdoored images of Swedish traffic signs from [23], and compare the results with a Baseline Swedish network obtained from an honestly trained baseline U.S. network. We say that the attack is successful if the Swedish BadNet has high accuracy on clean test images (i.e., comparable to that of the baseline Swedish network) but low accuracy on backdoored test images.

TABLE 6. PER-CLASS AND AVERAGE ACCURACY IN THE TRANSFER LEARNING SCENARIO

              Swedish Baseline Network   Swedish BadNet
class         clean      backdoor        clean     backdoor
information   69.5       71.9            74.0      62.4
mandatory     55.3       50.5            69.0      46.7
prohibitory   89.7       85.4            85.8      77.5
warning       68.1       50.8            63.5      40.9
other         59.3       56.9            61.4      44.2
average %     72.7       70.2            74.9      61.6

5.3.2. Attack Results. Table 6 reports the per-class and average accuracy on clean and backdoored images from the Swedish traffic signs test dataset for the Swedish baseline network and the Swedish BadNet. The accuracy of the Swedish BadNet on clean images is 74.9%, which is actually 2.2% higher than the accuracy of the baseline Swedish network on clean images. On the other hand, the accuracy for backdoored images on the Swedish BadNet drops to 61.6%.

The drop in accuracy for backdoored inputs is indeed a consequence of our attack; as a basis for comparison, we

Figure 9. Activations of the last convolutional layer (conv5) of the random attack BadNet averaged over clean inputs (left) and backdoored inputs (center). Also shown, for clarity, is the difference between the two activation maps.

Figure 10. Illustration of the transfer learning attack setup.

TABLE 7. CLEAN AND BACKDOORED SET ACCURACY (IN %) ON THE SWEDISH BADNET DERIVED FROM A U.S. BADNET STRENGTHENED BY A FACTOR OF k

                        Swedish BadNet
backdoor strength (k)   clean    backdoor
1                       74.9     61.6
10                      71.3     49.7
20                      68.3     45.1
30                      65.3     40.5
50                      62.4     34.3
70                      60.8     32.8
100                     59.4     30.8

note that the accuracy for backdoored images on the baseline Swedish network does not show a similar drop in accuracy. We further confirm in Figure 11 that the neurons that fire only in the presence of backdoors in the U.S. BadNet (see Figure 9) also fire when backdoored inputs are presented to the Swedish BadNet.

5.3.3. Strengthening the Attack. Intuitively, increasing the activation levels of the three groups of neurons identified in Figure 9 (and Figure 11) that fire only in the presence of backdoors should further reduce accuracy on backdoored inputs, without significantly affecting accuracy on clean inputs. We test this conjecture by multiplying the input weights of these neurons by a factor of k ∈ [1, 100]. Each value of k corresponds to a new version of the U.S. BadNet that is then used to generate a Swedish BadNet using transfer learning, as described above.

Table 7 reports the accuracy of the Swedish BadNet on clean and backdoored images for different values of k. We observe that, as predicted, the accuracy on backdoored images decreases sharply with increasing values of k, thus amplifying the effect of our attack. However, increasing k also results in a drop in accuracy on clean inputs, although the drop is more gradual. Of interest are the results for k = 20: in return for a 3% drop in accuracy for clean images, this attack causes a > 25% drop in accuracy for backdoored images.
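Mechanically, the strengthening step amounts to scaling the incoming weights of the identified channels; a minimal PyTorch-style sketch (ours, with the channel indices assumed to come from an activation analysis like the one sketched earlier) is:

    import torch

    def strengthen_backdoor_neurons(conv_layer, channels, k):
        # Multiply the incoming weights (and biases) of the selected output channels
        # by k, amplifying their activations whenever the backdoor trigger appears.
        with torch.no_grad():
            conv_layer.weight[channels] *= k
            if conv_layer.bias is not None:
                conv_layer.bias[channels] *= k
        return conv_layer

    # strengthen_backdoor_neurons(us_badnet.conv5, backdoor_channels, k=20)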

6. Vulnerabilities in the Model Supply Chain

Having shown in Section 5 that backdoors in pre-trained models can survive transfer learning and cause triggerable degradation in the performance of the new network, we now examine the popularity of transfer learning in order to demonstrate that it is commonly used. Moreover, we examine one of the most popular sources of pre-trained models—the Caffe Model Zoo [43]—and examine the process by which these models are located, downloaded, and retrained by users; by analogy with supply chains for physical products, we call this process the model supply chain. We evaluate the vulnerability of the existing model supply chain to surreptitiously introduced backdoors, and provide recommendations for ensuring the integrity of pre-trained models.

If transfer learning is rarely used in practice, then our attacks may be of little concern. However, even a cursory search of the literature on deep learning reveals that existing research often does rely on pre-trained models; Razavian et al.'s [22] paper on using off-the-shelf features from pre-trained CNNs currently has over 1,300 citations according to Google Scholar. In particular, Donahue et al. [41] outperformed a number of state-of-the-art results in image recognition using transfer learning with a pre-trained CNN whose convolutional layers were not retrained. Transfer learning has also specifically been applied to the problem of traffic sign detection, the same scenario we discuss in

Figure 11. Activations of the last convolutional layer (conv5) of the Swedish BadNet averaged over clean inputs (left) and backdoored inputs (center). Also shown, for clarity, is the difference between the two activation maps.

Section 5, by Zhu et al. [44]. Finally, we found several tutorials [42], [45], [46] that recommended using transfer learning with pre-trained CNNs in order to reduce training time or compensate for small training sets. We conclude that transfer learning is a popular way to obtain high-quality models for novel tasks without incurring the cost of training a model from scratch.

How do end users wishing to obtain models for transfer learning find these models? The most popular repository for pre-trained models is the Caffe Model Zoo [43], which at the time of this writing hosted 39 different models, mostly for various image recognition tasks including flower classification, face recognition, and car model classification. Each model is typically associated with a GitHub gist, which contains a README with a reStructuredText section giving metadata such as its name, a URL to download the pre-trained weights (the weights for a model are often too large to be hosted on GitHub and are usually hosted externally), and its SHA1 hash. Caffe also comes with a script named download_model_binary.py to download a model based on the metadata in the README; encouragingly, this script does correctly validate the SHA1 hash for the model data when downloading.
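Users who download weights manually can perform the same check themselves; the snippet below (ours, with placeholder file name and hash) computes the SHA1 of a downloaded model file for comparison against the hash published in the gist's README.

    import hashlib

    def sha1_of_file(path, chunk_size=1 << 20):
        # Stream the file so large model binaries do not need to fit in memory.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    # expected = "<sha1 listed in the model's README>"
    # assert sha1_of_file("downloaded_model.caffemodel") == expected, "SHA1 mismatch"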

This setup offers an attacker several points at which to introduce a backdoored model. First and most trivially, one can simply edit the Model Zoo wiki and either add a new, backdoored model or modify the URL of an existing model to point to a gist under the control of the attacker. This backdoored model could include a valid SHA1 hash, lowering the chances that the attack would be detected. Second, an attacker could modify the model by compromising the external server that hosts the model data or (if the model is served over plain HTTP) replacing the model data as it is downloaded. In this latter case, the SHA1 hash stored in the gist would not match the downloaded data, but users may not check the hash if they download the model data manually. Indeed, we found that the Network in Network model [47] linked from the Caffe Zoo currently has a SHA1 in its metadata that does not match the downloaded version; despite this, the model has 49 stars and 24 comments, none of which mention the mismatched SHA1.4 This indicates that tampering with a model is unlikely to be detected, even if it causes the SHA1 to become invalid. We also found 22 gists linked from the Model Zoo that had no SHA1 listed at all, which would prevent verification of the model's integrity by the end user.

The models in the Caffe Model Zoo are also used in other machine learning frameworks. Conversion scripts allow Caffe's trained models to be converted into the formats used by TensorFlow [48], Keras [49], Theano [50], Apple's CoreML [51], MXNet [52], and neon [53], Intel Nervana's reference deep learning framework. Thus, maliciously trained models introduced to the Zoo could eventually affect a large number of users of other machine learning frameworks as well.

6.1. Security Recommendations

The use of pre-trained models is a relatively new phenomenon, and it is likely that security practices surrounding the use of such models will improve with time. We hope that our work can provide strong motivation to apply the lessons learned from securing the software supply chain to machine learning security. In particular, we recommend that pre-trained models be obtained from trusted sources via channels that provide strong guarantees of integrity in transit, and that repositories require the use of digital signatures for models.

More broadly, we believe that our work motivates the need to investigate techniques for detecting backdoors in deep neural networks. Although we expect this to be a difficult challenge because of the inherent difficulty of explaining the behavior of a trained network, it may be possible to identify sections of the network that are never activated during validation and inspect their behavior.

4. Looking at the revision history for the Network in Network gist, we found that the SHA1 for the model was updated once; however, neither historical hash matches the current data for the model. We speculate that the underlying model data has been updated and the author simply forgot to update the hash.

7. Conclusions

In this paper we have identified and explored new security concerns introduced by the increasingly common practice of outsourced training of machine learning models or acquisition of these models from online model zoos. Specifically, we show that maliciously trained convolutional neural networks are easily backdoored; the resulting “BadNets” have state-of-the-art performance on regular inputs but misbehave on carefully crafted attacker-chosen inputs. Further, BadNets are stealthy, i.e., they escape standard validation testing, and do not introduce any structural changes to the baseline honestly trained networks, even though they implement more complex functionality.

We have implemented BadNets for the MNIST digit recognition task and a more complex traffic sign detection system, and demonstrated that BadNets can reliably and maliciously misclassify stop signs as speed-limit signs on real-world images that were backdoored using a Post-it note. Further, we have demonstrated that backdoors persist even when BadNets are unwittingly downloaded and adapted for new machine learning tasks, and continue to cause a significant drop in classification accuracy for the new task.

Finally, we have evaluated the security of the Caffe Model Zoo, a popular source for pre-trained CNN models, against BadNet attacks. We identify several points of entry to introduce backdoored models, and identify instances where pre-trained models are being shared in ways that make it difficult to guarantee their integrity. Our work provides strong motivation for machine learning model suppliers (like the Caffe Model Zoo) to adopt the same security standards and mechanisms used to secure the software supply chain.

References

[1] “ImageNet large scale visual recognition competition,” http://www.image-net.org/challenges/LSVRC/2012/, 2012.

[2] A. Graves, A.-r. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645–6649.

[3] K. M. Hermann and P. Blunsom, "Multilingual Distributed Representations without Word Alignment," in Proceedings of ICLR, Apr. 2014. [Online]. Available: http://arxiv.org/abs/1312.6173

[4] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014.

[5] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning," 2013.

[6] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016. [Online]. Available: http://dx.doi.org/10.1038/nature16961

[7] A. Karpathy, "What I learned from competing against a ConvNet on ImageNet," http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/, 2014.

[8] G. Chen, T. X. Han, Z. He, R. Kays, and T. Forrester, "Deep convolutional neural network based species recognition for wild animal monitoring," in Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014, pp. 858–862.

[9] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, "DeepDriving: Learning affordance for direct perception in autonomous driving," in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ser. ICCV '15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 2722–2730. [Online]. Available: http://dx.doi.org/10.1109/ICCV.2015.312

[10] Google, Inc., “Google Cloud Machine Learning Engine,” https://cloud.google.com/ml-engine/.

[11] Microsoft Corp., “Azure Batch AI Training,” https://batchaitraining.azure.com/.

[12] Amazon.com, Inc., “Deep Learning AMI Amazon Linux Version.”

[13] K. Quach, "Cloud giants 'ran out' of fast GPUs for AI boffins," https://www.theregister.co.uk/2017/05/22/cloud_providers_ai_researchers/.

[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014.

[16] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," 2015.

[17] I. Evtimov, K. Eykholt, E. Fernandes, T. Kohno, B. Li, A. Prakash, A. Rahmati, and D. Song, "Robust physical-world attacks on machine learning models," 2017.

[18] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.

[19] A. Blum and R. L. Rivest, "Training a 3-node neural network is NP-complete," in Advances in Neural Information Processing Systems, 1989, pp. 494–501.

[20] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.

[21] X. Glorot, A. Bordes, and Y. Bengio, "Domain adaptation for large-scale sentiment classification: A deep learning approach," in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 513–520.

[22] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, ser. CVPRW '14. Washington, DC, USA: IEEE Computer Society, 2014, pp. 512–519. [Online]. Available: http://dx.doi.org/10.1109/CVPRW.2014.131

[23] F. Larsson, M. Felsberg, and P.-E. Forssen, "Correlating Fourier descriptors of local patches for road sign recognition," IET Computer Vision, vol. 5, no. 4, pp. 244–254, 2011.

[24] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar, "Adversarial machine learning," in Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, ser. AISec '11. New York, NY, USA: ACM, 2011, pp. 43–58. [Online]. Available: http://doi.acm.org/10.1145/2046684.2046692

[25] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma, "Adversarial classification," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '04. New York, NY, USA: ACM, 2004, pp. 99–108. [Online]. Available: http://doi.acm.org/10.1145/1014052.1014066

[26] D. Lowd and C. Meek, "Adversarial learning," in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, ser. KDD '05. New York, NY, USA: ACM, 2005, pp. 641–647. [Online]. Available: http://doi.acm.org/10.1145/1081870.1081950

[27] ——, "Good word attacks on statistical spam filters," in Proceedings of the Conference on Email and Anti-Spam (CEAS), 2005.

[28] G. L. Wittel and S. F. Wu, "On Attacking Statistical Spam Filters," in Proceedings of the Conference on Email and Anti-Spam (CEAS), Mountain View, CA, USA, 2004.

[29] J. Newsome, B. Karp, and D. Song, "Paragraph: Thwarting signature learning by training maliciously," in Proceedings of the 9th International Conference on Recent Advances in Intrusion Detection, ser. RAID '06. Berlin, Heidelberg: Springer-Verlag, 2006, pp. 81–105. [Online]. Available: http://dx.doi.org/10.1007/11856214_5

[30] S. P. Chung and A. K. Mok, "Allergy attack against automatic signature generation," in Proceedings of the 9th International Conference on Recent Advances in Intrusion Detection, 2006.

[31] ——, "Advanced allergy attacks: Does a corpus really help," in Proceedings of the 10th International Conference on Recent Advances in Intrusion Detection, 2007.

[32] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, "Intriguing properties of neural networks," 2013.

[33] I. J. Goodfellow, J. Shlens, and C. Szegedy, "Explaining and harnessing adversarial examples," 2014.

[34] N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," 2016.

[35] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, "Universal adversarial perturbations," 2016.

[36] S. Shen, S. Tople, and P. Saxena, "Auror: Defending against poisoning attacks in collaborative deep learning systems," in Proceedings of the 32nd Annual Conference on Computer Security Applications, ser. ACSAC '16. New York, NY, USA: ACM, 2016, pp. 508–519. [Online]. Available: http://doi.acm.org/10.1145/2991079.2991125

[37] Y. LeCun, L. Jackel, L. Bottou, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, U. Muller, E. Sackinger, P. Simard et al., "Learning algorithms for classification: A comparison on handwritten digit recognition," Neural Networks: The Statistical Mechanics Perspective, vol. 261, p. 276, 1995.

[38] Y. Zhang, P. Liang, and M. J. Wainwright, "Convexified convolutional neural networks," arXiv preprint arXiv:1609.01000, 2016.

[39] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.

[40] A. Møgelmose, D. Liu, and M. M. Trivedi, "Traffic sign detection for US roads: Remaining challenges and a case for tracking," in Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on. IEEE, 2014, pp. 1394–1399.

[41] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell, "DeCAF: A deep convolutional activation feature for generic visual recognition," in International Conference on Machine Learning, 2014, pp. 647–655.

[42] A. Karpathy, "Transfer learning and fine-tuning convolutional neural networks," CS231n Lecture Notes; http://cs231n.github.io/transfer-learning/.

[43] “Caffe Model Zoo,” https://github.com/BVLC/caffe/wiki/Model-Zoo.

[44] Y. Zhu, C. Zhang, D. Zhou, X. Wang, X. Bai, and W. Liu, "Traffic sign detection and recognition using fully convolutional network guided proposals," Neurocomputing, vol. 214, pp. 758–766, 2016. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S092523121630741X

[45] S. Ruder, “Transfer learning - machine learning’s next frontier,” http://ruder.io/transfer-learning/.

[46] F. Yu, "A comprehensive guide to fine-tuning deep learning models in Keras," https://flyyufelix.github.io/2016/10/03/fine-tuning-in-keras-part1.html.

[47] “Network in Network Imagenet Model,” https://gist.github.com/mavenlin/d802a5849de39225bcc6.

[48] “Caffe models in TensorFlow,” https://github.com/ethereon/caffe-tensorflow.

[49] “Caffe to Keras converter,” https://github.com/qxcv/caffe2keras.

[50] “Convert models from Caffe to Theano format,” https://github.com/kencoken/caffe-model-convert.

[51] Apple Inc., "Converting trained models to Core ML," https://developer.apple.com/documentation/coreml/converting_trained_models_to_core_ml.

[52] "Convert Caffe model to MXNet format," https://github.com/apache/incubator-mxnet/tree/master/tools/caffe_converter.

[53] “caffe2neon,” https://github.com/NervanaSystems/caffe2neon.