
Quantifying Realistic Threats for Deep Learning Models
Zhenyu Zhong, Zhisheng Hu, Xiaowei Chen, Baidu Research Institute

{edwardzhong, zhishenghu, xiaoweichen01}@baidu.com

Motivation
• Intentional adversarial example attacks are unlikely in practice because attackers lack a practical monetization scheme.
• Real-world threats against DNNs in safety-critical scenarios do not cease to exist even when there is no deliberate attacker.
• The AI industry is in great need of real-world threat quantification for DNN model robustness.

Goal
• Define safety properties observed in the real world such that any violation leads to a misprediction.
• Design a standardized pipeline to evaluate threat severity and quantify DNN model robustness.
• Shed light on the robustness of pretrained models from different learning tasks.

Threat Quantification Framework

Table 1. Safety Property Pool
  Category                  | Safety Properties
  Luminance                 | Brightness, Contrast Reduction
  Geometric Transformation  | Horizontal (Vertical) Translation, Rotation, Spatial
  Blur                      | Motion Blur, Gaussian Blur
  Corruption                | Uniform Noise, Gaussian Noise, Blended Noise, Salt and Pepper Noise
  Weather                   | Fog, Frost, Snow
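To make Table 1 concrete, here is a minimal sketch of how a few of these safety properties can be realized as simple image transforms. The function names, severity parameters, and the use of NumPy/SciPy are illustrative assumptions, not the benchmark's actual implementation.

```python
# Illustrative sketch (not the poster's implementation) of a few Table 1
# safety properties as image transforms on a float image in [0, 1], shape (H, W, C).
import numpy as np
from scipy import ndimage


def brightness(img: np.ndarray, delta: float) -> np.ndarray:
    """Luminance: shift pixel intensities by `delta`."""
    return np.clip(img + delta, 0.0, 1.0)


def contrast_reduction(img: np.ndarray, alpha: float) -> np.ndarray:
    """Luminance: blend toward the mean intensity; alpha=0 leaves the image unchanged."""
    return np.clip((1.0 - alpha) * img + alpha * img.mean(), 0.0, 1.0)


def rotation(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Geometric transformation: rotate about the image center."""
    return ndimage.rotate(img, angle_deg, axes=(0, 1), reshape=False, mode="nearest")


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Blur: Gaussian filter over the spatial axes only."""
    return ndimage.gaussian_filter(img, sigma=(sigma, sigma, 0))


def salt_and_pepper(img: np.ndarray, amount: float, seed: int = 0) -> np.ndarray:
    """Corruption: set a fraction `amount` of pixels to 0 or 1."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    mask = rng.random(img.shape[:2]) < amount
    out[mask] = rng.integers(0, 2, size=(mask.sum(), 1))
    return out
```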

Table 2. Threat Criteria Pool
  Criteria                    | Description
  Misclassification           | $f(x + \delta) \neq y(x)$
  ConfidenceMisclassification | $\exists\, l \in \mathcal{L},\ l \neq y(x):\ p_l(x + \delta) > \mathrm{threshold}_l$
  TopKMisclassification       | $y(x) \notin f_{\mathrm{top}\text{-}k}(x + \delta)$
  OriginalClassProbability    | $p_{y}(x + \delta) < \mathrm{threshold}_{y}$
  TargetClassProbability      | $p_{y_t}(x + \delta) > \mathrm{threshold}_{y_t}$

Notation: $x$ is the original input and $x + \delta$ is the perturbed input; $f$ is the function that returns the predicted class label; $y(x)$ is the ground truth label of $x$; $p_l$ is the predicted probability of class $l$; $\mathcal{L}$ is the multiclass label collection; $y_t$ is the target class.
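The criteria in Table 2 reduce to simple checks on the model's output probabilities. Below is a minimal sketch, assuming `probs` is the softmax output on the perturbed input $x + \delta$ and that a single scalar threshold stands in for the per-class thresholds; the names and default values are assumptions for illustration.

```python
# Illustrative sketch of the Table 2 threat criteria for one perturbed input.
import numpy as np


def misclassification(probs: np.ndarray, y: int) -> bool:
    """f(x + delta) != y(x)."""
    return int(np.argmax(probs)) != y


def confidence_misclassification(probs: np.ndarray, y: int, threshold: float = 0.5) -> bool:
    """Exists l != y(x) with p_l(x + delta) > threshold (shared threshold here)."""
    wrong = np.delete(probs, y)
    return bool(np.any(wrong > threshold))


def topk_misclassification(probs: np.ndarray, y: int, k: int = 5) -> bool:
    """y(x) not among the top-k predictions on x + delta."""
    return y not in np.argsort(probs)[-k:]


def original_class_probability(probs: np.ndarray, y: int, threshold: float = 0.1) -> bool:
    """p_y(x + delta) < threshold_y."""
    return probs[y] < threshold


def target_class_probability(probs: np.ndarray, y_t: int, threshold: float = 0.9) -> bool:
    """p_{y_t}(x + delta) > threshold_{y_t}."""
    return probs[y_t] > threshold
```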

Preliminary Results


Fig 1. Pretrained model robustness comparison across 13 DNN architectures on 1k images randomly sampled from ImageNet (panels: Brightness, Rotation, Gaussian Blur, Salt & Pepper, Snow). The reported quantity is the perturbation norm $\|\delta\|_p$ that introduces Misclassification.
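One way to obtain the minimal perturbation reported in Fig 1 is to bisect the severity of a safety property until the smallest value that triggers Misclassification is bracketed. The sketch below assumes misbehavior is monotone in severity; `apply_property` and `predict` are hypothetical stand-ins for a Table 1 transform and a pretrained classifier, not functions from the poster.

```python
# Minimal sketch: bisect the severity of one safety property until the smallest
# value that causes Misclassification is found, then report ||delta||_p.
import numpy as np


def minimal_perturbation_norm(x, y, apply_property, predict,
                              max_severity=1.0, iters=20, p=2):
    """Smallest severity of one safety property that flips the prediction,
    reported as the L_p norm of the resulting perturbation."""
    lo, hi = 0.0, max_severity
    if predict(apply_property(x, hi)) == y:
        return None  # even the maximum severity never fools the model
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if predict(apply_property(x, mid)) == y:
            lo = mid   # prediction still correct: increase severity
        else:
            hi = mid   # already misclassified: try a smaller severity
    delta = apply_property(x, hi) - x
    return float(np.linalg.norm(delta.ravel(), ord=p))
```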

Methods

Fig 2. Fooling Success Rate: the median minimal $L_p$ distance across all the models is the threshold $T$ for each property. A success is defined as an input image that needs less than $T$ perturbation to achieve model misbehavior.
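Under that definition, the success rate can be computed directly from a matrix of minimal fooling distances. The sketch below assumes `min_dist[m, i]` stores the minimal $L_p$ distance that fools model m on image i for one safety property, with NaN marking images that are never fooled; the array layout is an assumption made for illustration.

```python
# Sketch of the fooling-success-rate computation described in Fig 2.
import numpy as np


def fooling_success_rate(min_dist: np.ndarray) -> np.ndarray:
    """Fooling success rate per model for one safety property.

    min_dist[m, i]: minimal L_p distance fooling model m on image i (NaN if never fooled).
    """
    threshold = np.nanmedian(min_dist)                 # median minimal distance across all models
    fooled = np.nan_to_num(min_dist, nan=np.inf) < threshold
    return fooled.mean(axis=1)                         # fraction of images fooled below the threshold
```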

Safety Violations against ResNet101 (ground truth: jay). Predicted labels under each safety property:
  A. Luminance: Brightness -> magpie; Contrast -> african grey
  B. Geometric Transformation: Rotation -> cabbage butterfly; Vertical Translation -> hummingbird; Horizontal Translation -> lycaenid
  C. Blur: Motion Blur -> madagascar cat; Gaussian Blur -> indri
  D. Corruption: Blended Noise -> bulbul; Salt & Pepper -> fountain; Gaussian Noise -> ptarmigan
  E. Weather: Fog -> african grey; Frost -> fountain; Snow -> cabbage butterfly

Fig 3. Violations against YOLOv3. Upper left (UL): original image. Lower left (LL): Rotation applied. Lower middle (LM): Gaussian Blur applied. Lower right (LR): Fog effect applied. No detections (LL, LM); misclassification (LR).