Data-driven Cyber Security to Counterfeit Malicious Attacks

Yang Xiang

Swinburne University of Technology, Australia

Cybersecurity Lab Core Capabilities

Application security

• FinTech and blockchain
• Risk and decision making
• Trustworthiness
• Data privacy
• Spam detection

Data security

• Security analytics
• Threat prediction
• Machine learning for cyber
• Social networks security
• Insider attacks detection

System security

• Network, SDN, NFV security
• Cloud security
• CPS/IoT security
• Ransomware/Malware
• Autonomous security

Hardware – Software – Data – Service

Real-world Data, Security Modelling, Reasoning

Research Methodology

Data-driven Cyber Security

• Cyber threat analysis
• Model security problem
• Data collection
• Machine learning customization

Examples

Data-driven Cyber Security

• Software vulnerability detection
• ML-based malware detection
• Twitter spam detection
• Network traffic classification

Software vulnerability detection

• 500,000 servers affected
• Millions of servers attacked
• 150 countries affected
• $4 billion loss
• 7~8% CPU loss
• Intel SGX
• $xxx loss

Years shown on the timeline: 2014, 2017, 2017, 2018

Challenge 1: Software Complexity

• 45 million lines
• 61 million lines
• 70 million lines
• 100+ million lines

Challenge 2: Vulnerability Numbers

• 2015: 6,480
• 2016: 6,447
• 2017: 14,714
• 2018: 20,000+ (54+/day)

Challenge 3: Lack of Data

• Efficiency
• Effectiveness
• Security consideration not prioritised
• Insufficient resources
• Lack of labelled data
• Lack of datasets
• Labour-intensive feature engineering
• Insufficient security knowledge
• Scalability

Observations

• Abstract Syntax Trees (ASTs) are an effective code representation.

• Software source code shares similar statistical properties with natural language.

• Vulnerabilities from different projects share common knowledge, which is discoverable by deep learning algorithms.
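
To make the first two observations concrete, the following minimal sketch flattens a function's AST into a sequence of node-type names, which can then be treated like a sentence of tokens. It uses Python's built-in `ast` module purely for illustration; the slides do not name the parser or the programming language analysed, and the helper name `ast_to_sequence` and the sample function are assumptions.

```python
import ast


def ast_to_sequence(source):
    """Serialize source code into a flat list of AST node-type names
    using a depth-first (pre-order) traversal.

    Hypothetical helper for illustration; not the speaker's tooling.
    """
    tree = ast.parse(source)
    sequence = []

    def visit(node):
        # Record the node type, then descend into its children in order.
        sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return sequence


# A small sample function (made up for this sketch).
example = """
def copy(buf, data):
    for i in range(len(data)):
        buf[i] = data[i]
"""

tokens = ast_to_sequence(example)
print(tokens)
```

A sequence like this could be fed to a text-style model (for example after an embedding step), which is one plausible reading of how the "natural language" observation is put to use.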

Representation learning

CNN: The input → Low-level features → Mid-level features → High-level features

AST → RNN: Latent, abstract features describing programming patterns/characteristics

Methodology

Network Architecture

Taxonomy – Our Work

Data

• Source code
• Binary / Assembly

Feature Engineering

• Pattern-based: Rules / Templates
• Text-based: Bag-of-words, Word2Vec / FastText / Code2Vec…
• Code Properties: Trees – Abstract Syntax Tree (AST); Graphs (Function Call Graphs, Data Flow Graphs, Control Flow Graphs, Dependency Graphs); Program Slice; Code Gadgets; Imports/API calls
• Code metrics

ML Algorithms

• Conventional: Logistic Regression, SVM, Random Forest, Markov model….
• Deep learning: RNN (LSTM, GRU), DNN, Deep belief network
• Others: Genetic Algorithm

Evaluations

• Accuracy
• Efficiency
• Detection Granularity
• Detection Performance: Precision, Recall, F-measure, Top-k precision/recall
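
The detection-performance metrics listed under Evaluations can be sketched as follows. This is a minimal illustration under common definitions, not the evaluation code behind the talk: labels are assumed to be 1 for vulnerable and 0 for non-vulnerable, top-k precision is taken as the fraction of the k highest-scored functions that are vulnerable, and top-k recall as the fraction of all vulnerable functions found in that top k. The function names and the toy data are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-measure (F1) for binary labels (1 = vulnerable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def top_k_precision_recall(y_true, scores, k):
    """Precision and recall over the k highest-scored items (k >= 1 assumed)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(y_true[i] for i in ranked)
    total_positive = sum(y_true)
    precision_at_k = hits / k
    recall_at_k = hits / total_positive if total_positive else 0.0
    return precision_at_k, recall_at_k


# Toy data, made up for this sketch: 8 functions, 3 of them vulnerable.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(precision_recall_f1(y_true, y_pred))
print(top_k_precision_recall(y_true, scores, k=4))
```

Ranking-based metrics of this kind are useful when an analyst can only inspect a limited number of flagged functions, since they measure how many real vulnerabilities land near the top of the list.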

The Datasets

• 457 vulnerable functions
• 32,531 non-vulnerable functions
• 6 open-source projects
• 1,000+ releases
• NVD, CVE repositories
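
The function counts above imply a heavily imbalanced dataset. The short calculation below, using only the two counts from the slide, shows the share of vulnerable functions and the accuracy a trivial "always non-vulnerable" predictor would reach; the variable names are illustrative, and the trivial baseline is an assumption added for the sake of the arithmetic, not something reported in the talk.

```python
# Counts taken from the dataset slide.
vulnerable = 457
non_vulnerable = 32_531

total = vulnerable + non_vulnerable
vulnerable_share = vulnerable / total

# Accuracy of a hypothetical baseline that labels every function non-vulnerable.
baseline_accuracy = non_vulnerable / total

# Number of non-vulnerable functions per vulnerable one.
imbalance_ratio = non_vulnerable / vulnerable

print(f"total functions:   {total}")
print(f"vulnerable share:  {vulnerable_share:.2%}")
print(f"baseline accuracy: {baseline_accuracy:.2%}")
print(f"imbalance ratio:   {imbalance_ratio:.1f} : 1")
```

With roughly 1.4% of functions labelled vulnerable, plain accuracy says little about detection quality, which is one reason precision, recall and ranking-based measures are the more informative choices here.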

Results

Binary Vulnerability Detection

Future Work

• Binary-level detection: focusing on scenarios where the source code is unavailable.

• Instruction-level granularity: identifying multiple instructions (reverse-engineering) that are potentially vulnerable.

• Specific-type vulnerability detection: focusing on vulnerabilities caused by missing checks (e.g. numeric errors).

Example 2 – ML-based malware detection

Example 3 – Twitter spam detection

Example 4 – Network traffic classification

Research Methodology

• Collect data for security problem
• Extract raw or low-level features
• Apply data analysis

Security professionals, Domain knowledge, Model analytics

Data-driven Cyber Security

Resources

• G. Lin, J. Zhang, W. Luo, L. Pan, Y. Xiang, O. D. Vel, and P. Montague, “Cross-Project Transfer Representation Learning for Vulnerable Function Discovery,” IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3289-3297, 2018.

• C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min, “Statistical Features Based Real-time Detection of Drifted Twitter Spam,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 4, pp. 914-925, 2017.

• J. Zhang, X. Chen, Y. Xiang, W. Zhou, and J. Wu, “Robust Network Traffic Classification,” IEEE/ACM Transactions on Networking, vol. 23, no. 4, pp. 1257-1270, 2015.

• S. Cesare, Y. Xiang, and W. Zhou, “Control Flow-based Malware Variant Detection,” IEEE Transactions on Dependable and Secure Computing, vol. 11, no. 4, pp. 307-317, 2014.

• S. Cesare, Y. Xiang, and W. Zhou, “Malwise - An Effective and Efficient Classification System for Packed and Polymorphic Malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193-1206, 2013.

• J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, and Y. Guan, “Network Traffic Classification Using Correlation Information,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 1, pp. 104-117, 2013.

Sponsors & Collaborators
