Data-driven Cyber Security to Counterfeit Malicious Attacks
Yang Xiang
Swinburne University of Technology, Australia
Cybersecurity Lab Core Capabilities
Application security
• FinTech and blockchain
• Risk and decision making
• Trustworthiness
• Data privacy
• Spam detection
Data security
• Security analytics
• Threat prediction
• Machine learning for cyber
• Social networks security
• Insider attacks detection
System security
• Network, SDN, NFV security
• Cloud security
• CPS/IoT security
• Ransomware/Malware
• Autonomous security
Hardware – Software – Data – Service
Real-world Data – Security – Modelling – Reasoning
Research Methodology
Data-driven Cyber Security
• Cyber threat analysis
• Model security problem
• Data collection
• Machine learning customization
Examples
Data-driven Cyber Security
• Software vulnerability detection
• ML-based malware detection
• Twitter spam detection
• Network traffic classification
Software vulnerability detection
• 500,000 servers affected
• Millions of servers attacked
• 150 countries affected
• $4 billion loss
• 7~8% CPU loss
• Intel SGX
• $xxx loss
Years shown: 2014, 2017, 2017, 2018
Challenge 1: Software Complexity
• 45 million lines
• 61 million lines
• 70 million lines
• 100+ million lines
Challenge 2: Vulnerability Numbers
• 2015: 6,480
• 2016: 6,447
• 2017: 14,714
• 2018: 20,000+ (54+/day)
Challenge 3: Lack of Data
• Efficiency
• Effectiveness
• Security consideration not prioritised
• Insufficient resources
• Lack of labelled data
• Lack of datasets
• Labour-intensive feature engineering
• Insufficient security knowledge
• Scalability
Observations
• Abstract Syntax Trees (ASTs): an effective code representation.
• Software source code shares similar statistical properties with natural language.
• Vulnerabilities from different projects share common knowledge, which is discoverable by deep learning algorithms.
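The first two observations can be illustrated with a minimal sketch: a function's AST is flattened into a sequence of node-type names, which can then be treated like a sentence of tokens. The slides do not name a parser or a target language, so the use of Python's standard `ast` module, the function name `ast_node_sequence`, and the sample snippet are all assumptions made for illustration only.

```python
import ast
from typing import List


def ast_node_sequence(source: str) -> List[str]:
    """Return AST node type names in depth-first (pre-order) order.

    Hypothetical helper: it shows how a tree can be serialised into a
    token sequence that a sequence model could consume.
    """
    tree = ast.parse(source)
    sequence: List[str] = []

    def visit(node: ast.AST) -> None:
        # Record the node type, then descend into its children in order.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


# Illustrative input only; not taken from the datasets in the talk.
snippet = "def copy(buf, n):\n    return buf[:n]\n"
tokens = ast_node_sequence(snippet)
print(tokens)
```

The exact node names depend on the Python version, but the sequence always starts at the module root and preserves the nesting order of the code, which is what makes it usable as input to text-style models.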
Representation learning
CNN: The input → Low-level features → Mid-level features → High-level features
Latent, abstract features describing programming patterns/characteristics
[Figure: CNN, AST, RNN]
Methodology
Network Architecture
Taxonomy – Our Work
Data
• Source code
• Binary / Assembly
Feature Engineering
• Pattern-based: Imports/API calls; Rules / Templates
• Text-based: Bag-of-words; Word2Vec / FastText / Code2Vec…
• Code Properties: Trees – Abstract Syntax Tree (AST); Graphs (Function Call Graphs, Data Flow Graphs, Control Flow Graphs, Dependency Graphs); Program Slice; Code Gadgets
• Code metrics
ML Algorithms
• Conventional: Logistic Regression; SVM; Random Forest; Markov model…
• Deep learning: RNN; DNN; Deep belief network; LSTM; GRU
• Others: Genetic Algorithm
Evaluations
• Accuracy
• Efficiency
• Detection Granularity
• Detection Performance: Precision; Recall; F-measure; Top-k precision/recall
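The detection-performance metrics named above can be sketched as follows. This is a minimal illustration that assumes binary labels (1 = vulnerable, 0 = non-vulnerable) and a model that outputs one score per function; the function names, the 0.5 threshold, and the toy data are hypothetical and not taken from the talk.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-measure (F1) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def top_k_precision_recall(y_true, scores, k):
    """Precision and recall over the k highest-scoring items."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank item indices by descending score and keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(y_true[i] for i in ranked)
    positives = sum(y_true)
    return hits / k, (hits / positives if positives else 0.0)


# Toy data (assumed): six functions, three of them vulnerable.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

p, r, f1 = precision_recall_f1(y_true, y_pred)
top_p, top_r = top_k_precision_recall(y_true, scores, k=2)
print(p, r, f1, top_p, top_r)
```

Top-k metrics suit the setting where an analyst can only review the k highest-ranked functions, so the ranking matters more than any single threshold.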
The Datasets
• 457 vulnerable functions
• 32,531 non-vulnerable functions
• 6 open-source projects
• 1,000+ releases
• NVD / CVE repositories
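The function counts above imply a heavily imbalanced dataset. The sketch below works through the arithmetic and shows one common way to derive class weights (total samples divided by number of classes times class count). The slides do not say how, or whether, the imbalance was handled, so the weighting scheme and all variable names are assumptions.

```python
# Counts taken from the dataset summary above.
n_vulnerable = 457
n_non_vulnerable = 32_531
n_total = n_vulnerable + n_non_vulnerable

# Share of vulnerable functions and the negative-to-positive ratio.
positive_rate = n_vulnerable / n_total
imbalance_ratio = n_non_vulnerable / n_vulnerable

# Accuracy of a trivial classifier that labels everything non-vulnerable.
baseline_accuracy = n_non_vulnerable / n_total

# Assumed weighting heuristic: n_total / (n_classes * class_count).
n_classes = 2
class_weights = {
    "vulnerable": n_total / (n_classes * n_vulnerable),
    "non_vulnerable": n_total / (n_classes * n_non_vulnerable),
}

print(f"positive rate:     {positive_rate:.4f}")
print(f"imbalance ratio:   {imbalance_ratio:.2f} : 1")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"class weights:     {class_weights}")
```

Because the trivial baseline already scores very high on accuracy, precision, recall and top-k measures are more informative for this dataset than accuracy alone.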
Results
Binary Vulnerability Detection
Future Work
• Binary-level detection: focusing on scenarios where the source code is unavailable.
• Instruction-level granularity: identifying multiple instructions (reverse-engineering) that are potentially vulnerable.
• Specific-type vulnerability detection: focusing on vulnerabilities caused by missing checks (e.g. numeric errors).
Example 2 – ML-based malware detection
Example 3 – Twitter spam detection
Example 4 – Network traffic classification
Research Methodology
• Collect data for security problem
• Extract raw or low-level features
• Apply data analysis
Security professionals – Domain knowledge – Model analytics
Data-driven Cyber Security
Resources
• G. Lin, J. Zhang, W. Luo, L. Pan, Y. Xiang, O. D. Vel, and P. Montague, “Cross-Project Transfer Representation Learning for Vulnerable Function Discovery,” IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3289-3297, 2018.
• C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min, “Statistical Features Based Real-time Detection of Drifted Twitter Spam,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 4, pp. 914-925, 2017.
• J. Zhang, X. Chen, Y. Xiang, W. Zhou, and J. Wu, “Robust Network Traffic Classification,” IEEE/ACM Transactions on Networking, vol. 23, no. 4, pp. 1257-1270, 2015.
• S. Cesare, Y. Xiang, and W. Zhou, “Control Flow-based Malware Variant Detection,” IEEE Transactions on Dependable and Secure Computing, vol. 11, no. 4, pp. 307-317, 2014.
• S. Cesare, Y. Xiang, and W. Zhou, “Malwise - An Effective and Efficient Classification System for Packed and Polymorphic Malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193-1206, 2013.
• J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, and Y. Guan, “Network Traffic Classification Using Correlation Information,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 1, pp. 104-117, 2013.
Sponsors & Collaborators