Outline
❏ Introduction
❏ Related works
❏ Why do this work
❏ Model design
❏ Adversarial attack algorithm
❏ Experiments
❏ Conclusion
❏ Future work
Introduction
❏ Traditional deception detection technology:
❏ Polygraph and fMRI: inconvenient and not reliable.
❏ Eye-tracking technology: based on emotional reaction; limited.
1. A. R., "Detecting Deception," Monitor on Psychology, vol. 37, no. 7, p. 70, 2004.
2. "Educational psychologists use eye-tracking method for detecting lies," psychologicalscience.org, retrieved 26 April 2012.
Related Works
❏ They analyzed the importance of vision, audio and text in videos through sequential input.
Z. Wu, B. Singh, L.S. Davis, and V.S. Subrahmanian. "Deception detection in videos," Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
Related Works
❏ They adopted the relationship between gestures and facial emotions to identify whether the subject is lying or not.
M. Ding, A. Zhao, Z. Lu, T. Xiang, and J.R. Wen. "Face-focused cross-stream network for deception detection in videos," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
Why do this work
❏ Recently, several papers have explored the usefulness of face and audio features for deception detection.
❏ However, adversarial attacks on face or audio features have not been studied.
M. Ding, A. Zhao, Z. Lu, T. Xiang, and J.R. Wen. "Face-focused cross-stream network for deception detection in videos," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.
Dataset
❏ Real-Life Trial dataset
❏ 59 deceptive and 50 truthful videos
❏ 14 female and 16 male speakers
❏ Collected from court trials
V. Pérez-Rosas, M. Abouelenien, R. Mihalcea, and M. Burzo. "Deception detection using real-life trial data," Proceedings of the 2015 ACM International Conference on Multimodal Interaction, pp. 59-66, 2015.
Model architecture (server)
❏ CNN: ResNet18, ResNeXt50_32x4d
❏ Sequential processing: GRU, Transformer
Multi-modal architecture (server)
❏ CNN: ResNet18, ResNeXt50_32x4d
❏ Sequential processing: GRU, Transformer
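As a shape-level illustration of the "CNN features, then sequential processing" design above, here is a minimal NumPy sketch of a GRU cell run over per-frame feature vectors. All sizes, weights, and function names are illustrative stand-ins, not the ResNet/GRU models used in the experiments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GRUCell:
    # Minimal GRU cell over per-frame CNN feature vectors (hypothetical sizes).
    def __init__(self, d_in, d_h, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(d_h)
        self.Wz = rng.uniform(-s, s, (d_h, d_in + d_h))  # update-gate weights
        self.Wr = rng.uniform(-s, s, (d_h, d_in + d_h))  # reset-gate weights
        self.Wh = rng.uniform(-s, s, (d_h, d_in + d_h))  # candidate weights

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                        # update gate
        r = sigmoid(self.Wr @ xh)                        # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

def classify_video(frame_feats, cell, w_out):
    # Run the GRU over the sequence of frame features; classify from the
    # last hidden state (a stand-in for the deceptive/truthful head).
    h = np.zeros(cell.Wz.shape[0])
    for f in frame_feats:
        h = cell.step(f, h)
    return sigmoid(w_out @ h)  # P(deceptive)
```

The same skeleton applies to the multi-modal variant if audio features are concatenated to the per-frame visual features before the recurrent step.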
Model architecture (attacker)
❏ CNN: AlexNet, VGG16, ResNet18, ResNet50, ResNeXt50_32x4d
Attack algorithm
❏ Fast Gradient Sign Method (FGSM)
❏ Iterative FGSM (I-FGSM)
❏ Momentum iterative FGSM (MI-FGSM)
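The three attacks differ only in how the input gradient becomes a perturbation: one sign step (FGSM), many clipped small steps (I-FGSM), or accumulated momentum over normalized gradients (MI-FGSM). A minimal NumPy sketch on a stand-in logistic model, where the analytic gradient replaces backprop through the real CNN and `eps`, `steps`, `mu` are illustrative:

```python
import numpy as np

def grad_loss(x, w, y):
    # Input gradient of binary cross-entropy for a stand-in logistic
    # model p = sigmoid(w . x); replaces backprop through the real network.
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

def fgsm(x, w, y, eps):
    # One step of size eps in the sign of the input gradient.
    return x + eps * np.sign(grad_loss(x, w, y))

def i_fgsm(x, w, y, eps, steps=10):
    # Iterative FGSM: small steps, clipped back into the eps-ball around x.
    alpha = eps / steps
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_loss(x_adv, w, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

def mi_fgsm(x, w, y, eps, steps=10, mu=1.0):
    # Momentum iterative FGSM: accumulate a decayed, L1-normalized gradient
    # and step in its sign, again clipped to the eps-ball.
    alpha = eps / steps
    x_adv, g = x.copy(), np.zeros_like(x)
    for _ in range(steps):
        grad = grad_loss(x_adv, w, y)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x_adv = np.clip(x_adv + alpha * np.sign(g), x - eps, x + eps)
    return x_adv
```

All three return a perturbed input inside the L∞ ball of radius eps; the momentum term is what stabilizes the update direction and improves transferability across models.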
Experiments
Model of the server
| Real-life | ResNet18 + GRU | ResNet18 + Transformer | ResNeXt50_32x4d + GRU | ResNeXt50_32x4d + Transformer |
|---|---|---|---|---|
| Video | 78.18% | 90.91% | 81.82% | 86.36% |
| Video + Audio | 95.45% | 90.91% | - | - |
Model of the attacker
| Real-life | AlexNet | VGG16 | ResNet18 | ResNet50 | ResNeXt50_32x4d |
|---|---|---|---|---|---|
| Video | 72.73% | 89.10% | 84.55% | 90.91% | 90.91% |
| Audio | 72.73% | 72.73% | 81.82% | 90.91% | 100% |
Adversarial attack on Video model
| Attack | ResNet18 + GRU | ResNet18 + Transformer | ResNeXt50_32x4d + GRU | ResNeXt50_32x4d + Transformer |
|---|---|---|---|---|
| Standard accuracy | 78.18% | 90.91% | 81.82% | 86.36% |
| ResNet18 (FGSM) | 71.82% | 63.64% | 48.18% | 63.64% |
| Ensemble (FGSM) | 67.27% | 63.64% | 46.36% | 63.64% |
| ResNet18 (I-FGSM) | 68.18% | 81.82% | 71.82% | 72.73% |
| Ensemble (I-FGSM) | 74.55% | 63.64% | 73.64% | 51.82% |
| ResNet18 (MI-FGSM) | 70.91% | 55.45% | 64.55% | 60.91% |
| Ensemble (MI-FGSM) | 66.36% | 45.45% | 61.82% | 49.09% |
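The "Ensemble" attacks use several substitute CNNs instead of one. One common construction, sketched here on stand-in linear models rather than the actual attacker CNNs, averages the per-model input gradients before the sign step:

```python
import numpy as np

def input_grad(x, w, y):
    # Input gradient of binary cross-entropy for a stand-in logistic model.
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

def ensemble_fgsm(x, ws, y, eps):
    # Average the input gradients of every substitute model,
    # then take a single FGSM sign step.
    g = np.mean([input_grad(x, w, y) for w in ws], axis=0)
    return x + eps * np.sign(g)
```

Averaging over substitutes suppresses gradient directions specific to any one model, which is why the ensemble rows transfer better to the unseen server models.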
Adversarial attack on Multi-modal model

| Video | Standard accuracy | ResNet18 (FGSM) | Ensemble (FGSM) | ResNet18 (I-FGSM) | Ensemble (I-FGSM) | ResNet18 (MI-FGSM) | Ensemble (MI-FGSM) |
|---|---|---|---|---|---|---|---|
| ResNet18 + GRU | 95.45% | 92.73% | 88.18% | 93.64% | 79.09% | 92.73% | 70.00% |
| ResNet18 + Transformer | 90.91% | 87.27% | 80.00% | 90.00% | 70.91% | 82.73% | 67.27% |

| Audio | Standard accuracy | ResNet18 (FGSM) | Ensemble (FGSM) | ResNet18 (I-FGSM) | Ensemble (I-FGSM) | ResNet18 (MI-FGSM) | Ensemble (MI-FGSM) |
|---|---|---|---|---|---|---|---|
| ResNet18 + GRU | 95.45% | 94.55% | 95.45% | 91.82% | 91.82% | 90.91% | 92.73% |
| ResNet18 + Transformer | 90.91% | 90.00% | 90.91% | 90.00% | 90.91% | 90.00% | 90.91% |

| Video & Audio | Standard accuracy | ResNet18 (FGSM) | Ensemble (FGSM) | ResNet18 (I-FGSM) | Ensemble (I-FGSM) | ResNet18 (MI-FGSM) | Ensemble (MI-FGSM) |
|---|---|---|---|---|---|---|---|
| ResNet18 + GRU | 95.45% | 90.00% | 82.73% | 87.27% | 73.64% | 82.73% | 68.18% |
| ResNet18 + Transformer | 90.91% | 80.91% | 80.00% | 88.18% | 63.64% | 66.36% | 55.45% |
Adversarial training on ResNet18-FGSM adv. samples

| Attack | ResNet18 + GRU | ResNet18 + Transformer | ResNeXt50_32x4d + GRU | ResNeXt50_32x4d + Transformer |
|---|---|---|---|---|
| Standard accuracy | 89.09% | 88.18% | 90.91% | 87.27% |
| ResNet18 (FGSM) | 90.00% | 89.09% | 90.91% | 81.82% |
| Ensemble (FGSM) | 83.64% | 88.18% | 88.18% | 77.27% |
Adversarial training on Ensemble-FGSM adv. samples

| Attack | ResNet18 + GRU | ResNet18 + Transformer | ResNeXt50_32x4d + GRU | ResNeXt50_32x4d + Transformer |
|---|---|---|---|---|
| Standard accuracy | 81.82% | 79.09% | 79.09% | 81.82% |
| ResNet18 (FGSM) | 75.45% | 79.09% | 78.18% | 80.91% |
| Ensemble (FGSM) | 78.18% | 85.45% | 79.09% | 81.82% |
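Adversarial training mixes adversarial samples into the training set, regenerating them against the current weights at each step. A minimal sketch on a stand-in logistic model (not the deck's ResNet pipelines; `eps`, `lr`, `epochs` are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_train(X, y, eps=0.1, lr=0.5, epochs=200):
    # Train a stand-in logistic model on a 50/50 mix of clean inputs
    # and FGSM inputs crafted against the current weights.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = sigmoid(X @ w)
        gx = (p - y)[:, None] * w            # per-sample input gradient
        X_adv = X + eps * np.sign(gx)        # FGSM against current model
        Xb = np.vstack([X, X_adv])           # clean + adversarial batch
        yb = np.concatenate([y, y])
        pb = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (pb - yb) / len(yb)  # gradient step on mixed batch
    return w
```

Because the adversarial batch is rebuilt every epoch, the model keeps seeing perturbations matched to its current decision boundary, which is the mechanism behind the robustness gains in the tables above.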
Conclusion
❏ Generating adversarial examples with Ensemble (MI-FGSM) yields the strongest attack.
❏ The adversarial attack on the multi-modal model is significant:
❏ ResNet18 + GRU: 95.45% -> 68.18%
❏ Adversarial training on the corresponding adversarial samples improves robustness on the deception detection task.
Future work
❏ Adopt state-of-the-art models for deception detection.
❏ Apply adversarial training to audio and multi-modal data.
❏ Evaluate other available datasets:
❏ Bag-of-Lies
❏ Silesian Deception Dataset
❏ etc.