Copyright © 2012, [email protected]
Malware Detection Based on Malicious Behaviors Using Artificial Neural Network
Student: Hsun-Yi Tsai
Advisor: Dr. Kuo-Chen Wang
2012/05/28
Outline
• Introduction
• Problem Statement
• Related Work
• Design Approach
  – Sandboxes
  – Behaviors
  – Proposed Algorithm
  – Weight Training
  – Malicious Degree
• Evaluation
• Conclusion and Future Work
• References
Introduction
• In recent years, malware has posed severe threats to cyber security
  – Viruses, worms, Trojan horses, botnets, …
• Traditional signature-based malware detection algorithms [15] [17]
• Drawbacks of signature-based malware detection algorithms
  – Need human effort and time to approve new signatures
  – Need to update the malicious digest frequently
  – Easily bypassed by obfuscation methods
  – Cannot detect zero-day malware
  – Increase the false negative rate
Introduction (Cont.)
• To overcome the shortcomings of signature-based malware detection algorithms, behavior-based malware detection algorithms were proposed
• Behavior-based malware detection algorithms [14] [19]
  – Detect unknown malware and variants of known malware
  – Decrease the false negative rate (FNR)
  – Increase the false positive rate (FPR)
• To decrease the FPR, we propose a behavior-based malware detection algorithm built on an artificial neural network
Problem Statement
• Given
  – Several sandboxes
  – l known malware samples M = {M1, M2, …, Ml} for training
  – m known malware samples S = {S1, S2, …, Sm} for testing
• Objective
  – Select n behaviors B = {B1, B2, …, Bn}
  – Train n weights W = {W1, W2, …, Wn}
  – Derive the malicious degree (MD)
Related Work
• MBF [14]
  – File, process, network, and registry actions
  – 16 malicious behavior features (MBF)
  – Three malicious degrees: high, warning, and low
• RADUX [19]
  – Reverse Analysis for Detecting Unsafe eXecution
  – Collects 9 common malicious behaviors
  – Adjusts weights using Bayes’ theorem
Related Work (Cont.)
| Approach | MBF [14] | RADUX [19] | Our Scheme |
| Main idea | Analyze behavior features | Analyze API calls | Analyze malicious behaviors |
| Number of malicious behaviors | 16 | 9 | 13 |
| Calculation of malicious degree | Non-weighted algorithm | Weighted algorithm | Weighted algorithm |
| Adjustment of weights | None | Bayes’ theorem | Artificial neural network (ANN) |
| False positive rate | Low | High | Low |
| False negative rate | Not Available | High | Low |
| Accuracy rate | High | Low | High |
Background - Sandboxes
• Dynamic analysis system
• Isolated environment
• Interacts with malware
• Records runtime behaviors
Background - Sandboxes (Cont.)
• Web-based sandboxes– GFI Sandbox [1]– Norman Sandbox [2]– Anubis Sandbox [3]
• PC-based sandboxes– Avast Sandbox [4]– Buster Sandbox Analyzer [5]
Design Approach-Behaviors
• Malware host behaviors
  – Creates Mutex
  – Creates Hidden File
  – Starts EXE in System
  – Checks for Debugger
  – Starts EXE in Documents
  – Windows/Run Registry Key Set
  – Hooks Keyboard
  – Modifies Files in System
  – Deletes Original Sample
  – More than 5 Processes
  – Opens Physical Memory
  – Deletes Files in System
  – Auto Start
• Malware network behaviors
  – Makes Network Connections
    • DNS Query
    • HTTP Connection
    • File Download
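As a sketch of how a sandbox report could be turned into the detector's input, the 13 host behaviors above can be encoded as a binary feature vector. This is illustrative only: the behavior names come from the slides, but the report format and function name are assumptions, not the thesis implementation.

```python
# Illustrative sketch: map a sandbox report to x in {0,1}^13.
# Behavior names are from the slides; the input format is assumed.
HOST_BEHAVIORS = [
    "Creates Mutex", "Creates Hidden File", "Starts EXE in System",
    "Checks for Debugger", "Starts EXE in Documents",
    "Windows/Run Registry Key Set", "Hooks Keyboard",
    "Modifies Files in System", "Deletes Original Sample",
    "More than 5 Processes", "Opens Physical Memory",
    "Deletes Files in System", "Auto Start",
]

def to_feature_vector(observed):
    """1 if the sandbox reported the behavior, else 0, in a fixed order."""
    observed = set(observed)
    return [1 if b in observed else 0 for b in HOST_BEHAVIORS]

x = to_feature_vector({"Creates Mutex", "Auto Start"})
```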
Design Approach-Behaviors (Cont.)
[Table: number of the five sandboxes (GFI [1], Norman [2], Anubis [3], Avast [4], BSA [5]) that report each behavior:]

| Behavior | Sandboxes reporting it (of 5) |
| Creates Mutex | 3 |
| Creates Hidden File | 5 |
| Starts EXE in System | 3 |
| Checks for Debugger | 1 |
| Starts EXE in Documents | 1 |
| Windows/Run Registry Key Set | 5 |
| Hooks Keyboard | 2 |
| Modifies Files in System | 5 |
| Deletes Original Sample | 2 |
| More than 5 Processes | 5 |
| Opens Physical Memory | 2 |
| Deletes Files in System | 5 |
| Auto Start | 1 |
| DNS Query | 3 |
| HTTP Connection | 4 |
| File Download | 3 |
Design Approach – Weight Training
• Using Artificial Neural Network (ANN) to train weights
Design Approach – Weight Training (Cont.)
• Neuron for ANN hidden layer
$$n_1 = \sum_{i=1}^{13} \omega_{i,1}\, x_i - b_1, \qquad a_1 = f^{(1)}(n_1) = \frac{e^{n_1} - e^{-n_1}}{e^{n_1} + e^{-n_1}}$$
Design Approach – Weight Training (Cont.)
• Neuron for ANN output layer
$$n' = \sum_{i=1}^{10} \omega_i'\, a_i - b', \qquad f^{(2)}(n') = \frac{e^{n'} - e^{-n'}}{e^{n'} + e^{-n'}}$$
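Put together, the hidden and output neurons above form a 13–10–1 forward pass. A minimal pure-Python sketch (random weights stand in for trained ones; this is an assumption-laden illustration, not the thesis Matlab code):

```python
import math
import random

random.seed(0)

# Shapes follow the slides: 13 behavior inputs, 10 hidden neurons, 1 output.
W1 = [[random.gauss(0, 1) for _ in range(13)] for _ in range(10)]  # ω_{i,j}
b1 = [random.gauss(0, 1) for _ in range(10)]                       # b_j
W2 = [random.gauss(0, 1) for _ in range(10)]                       # ω'_k
b2 = random.gauss(0, 1)                                            # b'

def malicious_degree(x):
    """Forward pass with tangent-sigmoid layers: MD = f2(Σ ω'_k a_k − b')."""
    a = [math.tanh(sum(w * xi for w, xi in zip(row, x)) - bj)
         for row, bj in zip(W1, b1)]          # a_j = f1(Σ ω_{i,j} x_i − b_j)
    return math.tanh(sum(w * aj for w, aj in zip(W2, a)) - b2)

x = [0.0] * 13
x[0] = x[5] = x[12] = 1.0   # e.g. mutex, run registry key, auto start observed
md = malicious_degree(x)
```

Note that tanh outputs lie in (−1, 1); mapping scores onto the [0, 1] MD scale used in the evaluation would require targets or a rescaling chosen accordingly.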
Design Approach – Weight Training (Cont.)
• Delta learning process
– Mean square error: $E = \frac{1}{2}(d - O)^2$, where $d$ is the expected target value and $O$ is the network output
– Weight set: $W = \{\omega_{i,j} \mid 1 \le i \le 13,\ 1 \le j \le 10\} \cup \{\omega_k' \mid 1 \le k \le 10\}$
– Update rule: $W_{\text{new}} = W_{\text{old}} + \Delta W$, with $\Delta W = -\eta\, \frac{\partial E}{\partial W}$, where $\eta$ is the learning factor and the gradient is proportional to the input value $x$
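A hedged sketch of one delta-rule step on the output neuron, using the same mean square error and tangent-sigmoid as above (the hidden-layer update follows by the chain rule; the learning factor and initial values here are arbitrary, not the thesis settings):

```python
import math

def delta_step(w, b, a, d, eta=0.1):
    """One gradient step on E = 1/2 (d − O)^2 with O = tanh(Σ w_k a_k − b).

    w: output weights ω'_k; b: bias b'; a: hidden activations a_k;
    d: expected target value; eta: learning factor η.
    """
    O = math.tanh(sum(wk * ak for wk, ak in zip(w, a)) - b)
    delta = (d - O) * (1.0 - O * O)                 # −∂E/∂n'
    w_new = [wk + eta * delta * ak for wk, ak in zip(w, a)]
    b_new = b - eta * delta                         # n' = Σ w a − b, hence minus
    return w_new, b_new, 0.5 * (d - O) ** 2

# One step should reduce the error on the same example.
w, b, a = [0.1] * 10, 0.0, [0.5] * 10
w_next, b_next, e0 = delta_step(w, b, a, d=1.0)
_, _, e1 = delta_step(w_next, b_next, a, d=1.0)
```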
Design Approach-Malicious Degree
• Malicious Degree (MD): the output of the trained ANN
  – Malicious behaviors: inputs $x_i$, $1 \le i \le 13$
  – Weights: $\omega_{i,j}$ (hidden layer) and $\omega_k'$ (output layer)
  – Bias: $b_j$ and $b'$
  – Transfer function: tangent-sigmoid $f(n) = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}}$
Evaluation
• Try to find the optimal MD threshold that makes both the FPR and the FNR approach 0.
[Diagram: MD axis with benign samples below the MD threshold, malicious samples above it, and an ambiguous region around the threshold.]
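The threshold search can be sketched as a sweep over candidate MD values, keeping the one that minimizes FPR + FNR. The scores and labels below are toy data for illustration, not the thesis measurements:

```python
def sweep_threshold(scores, labels, step=0.04):
    """Return (threshold, FPR, FNR) minimizing FPR + FNR.

    scores: MD values in [0, 1]; labels: 1 = malicious, 0 = benign.
    """
    neg = labels.count(0)
    pos = labels.count(1)
    best = None
    t = 0.0
    while t <= 1.0:
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fpr, fnr = fp / neg, fn / pos
        if best is None or fpr + fnr < best[1] + best[2]:
            best = (t, fpr, fnr)
        t = round(t + step, 10)   # keep the grid numerically clean
    return best

# Toy data: benign scores cluster low, malicious scores cluster high.
scores = [0.10, 0.20, 0.30, 0.35, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
t, fpr, fnr = sweep_threshold(scores, labels)
```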
Evaluation (Cont.)
• Matlab 7.11.0
• Initial weights and biases: randomized by the function initnw [8]
• Transfer function: tangent-sigmoid
• Architecture of the ANN
Evaluation (Cont.)
• Malicious sample source: Blast’s Security [6] and VX Heaven [7] websites
• Benign sample source: portable executable (PE) files under Windows XP SP2
• Training data and testing data
| | Malicious | Benign | Total |
| Training | 500 | 500 | 1000 |
| Testing | 500 | 500 | 1000 |
Evaluation (Cont.)
• Mean square error: 0.19
• Execution time: 2 seconds
• MD threshold (according to training data):
[Histogram: number of samples vs. malicious degree (0 to 1 in steps of 0.04), with the range of candidate thresholds marked between the benign and malicious clusters.]
Evaluation (Cont.)
• Choose threshold
[Plot: false positive rate and false negative rate (%) vs. malicious degree threshold, thresholds from 0.35 to 0.79.]
Evaluation (Cont.)
• Experiment results
| TP | TN | FP | FN | FPR | FNR | Accuracy |
| 483 | 494 | 6 | 17 | 1.2% | 3.4% | 97.7% |
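As a quick cross-check, the rates follow directly from the confusion counts:

```python
def rates(tp, tn, fp, fn):
    """Standard detection metrics from confusion-matrix counts."""
    fpr = fp / (fp + tn)                      # benign flagged as malicious
    fnr = fn / (tp + fn)                      # malware missed
    acc = (tp + tn) / (tp + tn + fp + fn)
    return fpr, fnr, acc

# Counts from the experiment: 500 malicious and 500 benign testing samples.
fpr, fnr, acc = rates(tp=483, tn=494, fp=6, fn=17)
# fpr = 6/500 = 1.2%, fnr = 17/500 = 3.4%, acc = 977/1000 = 97.7%
```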
[Histogram: benign and malicious testing samples vs. malicious degree, separated at the MD threshold = 0.44.]
Evaluation (Cont.)
| Approach | TP / (TP + FN) | FN / (TP + FN) | FP / (FP + TN) | TN / (FP + TN) |
| Our Scheme | 96.6% | 3.4% | 1.2% | 98.8% |
| MBF [14] | Not Available | Not Available | 2.13% | 97.87% |
| RADUX [19] | 95.6% | 4.4% | 9.8% | 90.2% |
Evaluation (Cont.)
| Weights in Hidden Layer | Weights in Output Layer | Accuracy Rate |
| Random | Random | 98.8% |
| Frequency | Random | 98% |
| 1 | 1 | 92.42% |
| 0.5 | 0.5 | 91% |
| Without ANN | None | 91.36% |
Conclusion and Future Work
• Conclusion
  – Collected 13 common malicious behaviors of malware
  – Composed a malicious degree (MD) formula
  – Both the false positive rate and the false negative rate approach 0
  – Can detect unknown malware
• Future work
  – Automate the system
  – Implement PC-based sandboxes
  – Add more malware network behaviors
  – Classify malware according to its typical behaviors
References
[1] GFI Sandbox. http://www.gfi.com/malware-analysis-tool
[2] Norman Sandbox. http://www.norman.com/security_center/security_tools
[3] Anubis Sandbox. http://anubis.iseclab.org/
[4] Avast Sandbox. http://www.avast.com/zh-cn/index
[5] Buster Sandbox Analyzer (BSA). http://bsa.isoftware.nl/
[6] Blast's Security. http://www.sacour.cn
[7] VX heaven. http://vx.netlux.org/vl.php
[8] Neural Network Toolbox. http://dali.feld.cvut.cz/ucebna/matlab/toolbox/nnet/initnw.html
[9] “A malware tool chain: active collection, detection, and analysis,” NBL, National Chiao Tung University.
[10] U. Bayer, I. Habibi, D. Balzarotti, E. Kirda, and C. Kruegel, “A view on current malware behaviors,” Proceedings of the 2nd USENIX Workshop on Large-Scale Exploits and Emergent Threats: botnets, spyware, worms, and more, pp. 1 - 11, Apr. 22-24, 2009.
[11] U. Bayer, C. Kruegel, and E. Kirda, “TTAnalyze: a tool for analyzing malware,” Proceedings of 15th European Institute for Computer Antivirus Research, Apr. 2006.
[12] M. Egele, C. Kruegel, E. Kirda, H. Yin, and D. Song, “Dynamic spyware analysis,” Proceedings of USENIX Annual Technical Conference, pp. 233 - 246, Jun. 2007.
[13] H. J. Li, C. W. Tien, C. W. Tien, C. H. Lin, H. M. Lee, and A. B. Jeng, "AOS: An optimized sandbox method used in behavior-based malware detection," Proceedings of Machine Learning and Cybernetics (ICMLC), Vol. 1, pp. 404-409, Jul. 10-13, 2011.
References (Cont.)
[14] W. Liu, P. Ren, K. Liu, and H. X. Duan, “Behavior-based malware analysis and detection,” Proceedings of Complexity and Data Mining (IWCDM), pp. 39 - 42, Sep. 24-28, 2011.
[15] M. Christodorescu and S. Jha, “Static analysis of executables to detect malicious patterns,” Proceedings of the 12th USENIX Security Symposium, Vol. 12, pp. 169 - 186, Aug. 4-8, 2003.
[16] A. Moser, C. Kruegel, and E. Kirda, “Exploring multiple execution paths for malware analysis,” Proceedings of the 2007 IEEE Symposium on Security and Privacy, pp. 231 - 245, May 20-23, 2007.
[17] J. Rabek, R. Khazan, S. Lewandowski, and R. Cunningham, “Detection of injected, dynamically generated, and obfuscated malicious code,” Proceedings of the 2003 ACM Workshop on Rapid Malcode, pp. 76 - 82, Oct. 27-30, 2003.
[18] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior,” in Detection of Intrusions and Malware, and Vulnerability Assessment, Vol. 5137, pp. 108 - 125, Oct. 9, 2008.
[19] C. Wang, J. Pang, R. Zhao, W. Fu, and X. Liu, “Malware detection based on suspicious behavior identification,” Proceedings of Education Technology and Computer Science, Vol. 2, pp. 198 - 202, Mar. 7-8, 2009.
[20] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using CWSandbox,” IEEE Security and Privacy, Vol. 5, No. 2, pp. 32 - 39, 2007.
[21] Y. Zhang, J. Pang, R. Zhao, and Z. Guo, “Artificial neural network for decision of software maliciousness,” Proceedings of Intelligent Computing and Intelligent Systems (ICIS), Vol. 2, pp. 622 - 625, Oct. 29-31, 2010.