Data-driven Cyber Security to Counterfeit Malicious Attacks

Yang Xiang

Swinburne University of Technology, Australia

[email protected]


Cybersecurity Lab Core Capabilities

Application security
• FinTech and blockchain
• Risk and decision making
• Trustworthiness
• Data privacy
• Spam detection

Data security
• Security analytics
• Threat prediction
• Machine learning for cyber
• Social networks security
• Insider attacks detection

System security
• Network, SDN, NFV security
• Cloud security
• CPS/IoT security
• Ransomware/Malware
• Autonomous security

Hardware – Software – Data – Service


Real-world Data, Security, Modelling, Reasoning


Research Methodology

Data-driven Cyber Security
• Cyber threat analysis
• Model security problem
• Data collection
• Machine learning customization


Examples

Data-driven Cyber Security
• Software vulnerability detection
• ML-based malware detection
• Twitter spam detection
• Network traffic classification


Software vulnerability detection


• 500,000 servers affected
• Millions of servers attacked
• 150 countries affected
• $4 billion loss
• 7~8% CPU loss
• Intel SGX
• $xxx loss

Years shown: 2014, 2017, 2017, 2018


Challenge 1: Software Complexity

• 45 million lines
• 61 million lines
• 70 million lines
• 100+ million lines


Challenge 2: Vulnerability Numbers

• 2015: 6,480
• 2016: 6,447
• 2017: 14,714
• 2018: 20,000+ (54+/day)
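The 54+/day figure can be checked by dividing the 2018 count by the number of days in a year. The sketch below is only an arithmetic illustration: it pairs counts with years in the order they appear on the slide, and it treats "20,000+" as exactly 20,000, which is an assumption.

```python
# Yearly vulnerability counts, paired with years in slide order.
# "20,000+" is treated as exactly 20,000 (an assumption for illustration).
counts = {2015: 6480, 2016: 6447, 2017: 14714, 2018: 20000}


def per_day(count, days=365):
    """Average number of reported vulnerabilities per day."""
    return count / days


for year, count in counts.items():
    print(year, round(per_day(count), 1))
```

With these inputs the 2018 rate comes out just under 55 per day, consistent with "54+/day".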


Challenge 3: Lack of Data

• Efficiency
• Effectiveness
• Security consideration not prioritised
• Insufficient resources
• Lack of labelled data
• Lack of datasets
• Labour-intensive feature engineering
• Insufficient security knowledge
• Scalability


Observations

• Abstract Syntax Trees (ASTs) are an effective code representation.

• Software source code shares similar statistical properties to natural language.

• Vulnerabilities from different projects share common knowledge, which is discoverable by deep learning algorithms.
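To make the first two observations concrete, the sketch below turns a function's AST into a flat token sequence by depth-first traversal and then counts token frequencies, the kind of text-like input a sequence model could consume. Python's built-in `ast` module is used as a stand-in parser; this is an assumption, since the slides do not say which parser or programming language was used, and the `copy` snippet is a made-up example.

```python
import ast
from collections import Counter


def ast_to_sequence(source):
    """Serialize source code into a token list by depth-first AST traversal."""
    tree = ast.parse(source)
    tokens = []

    def visit(node):
        # Record the node type; for variable references also keep the name.
        tokens.append(type(node).__name__)
        if isinstance(node, ast.Name):
            tokens.append(node.id)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(tree)
    return tokens


# Hypothetical function used only to demonstrate the traversal.
snippet = """
def copy(dst, src, n):
    for i in range(n):
        dst[i] = src[i]
"""

sequence = ast_to_sequence(snippet)
print(sequence[:8])
print(Counter(sequence).most_common(3))
```

The resulting sequence can be treated like a sentence: tokens can be counted, mapped to integer ids, or embedded, which is what makes natural-language techniques applicable to code.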


Representation learning

• CNN: the input → low-level features → mid-level features → high-level features
• RNN on AST: latent, abstract features describing programming patterns/characteristics


Methodology


Network Architecture


Taxonomy – Our Work

Data
• Source code
• Binary / Assembly

Feature Engineering
• Pattern-based: Rules / Templates
• Text-based: Bag-of-words; Word2Vec / FastText / Code2Vec…
• Code Properties
    • Trees – Abstract Syntax Tree (AST)
    • Graphs: Function Call Graphs, Data Flow Graphs, Control Flow Graphs, Dependency Graphs
    • Program Slice, Code Gadgets
    • Imports/API calls
• Code metrics

ML Algorithms
• Conventional: Logistic Regression, SVM, Random Forest, Markov model…
• Deep learning: RNN (LSTM, GRU), DNN, Deep belief network
• Others: Genetic Algorithm

Evaluations
• Accuracy
• Efficiency
• Detection Granularity
• Detection Performance: Precision, Recall, F-measure, Top-k precision/recall
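The detection-performance measures named in the taxonomy can be computed as follows. This is a minimal sketch using the standard definitions; the labels and scores are toy values invented for illustration, and the 0.5 decision threshold is an assumption.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F-measure for binary labels (1 = vulnerable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def top_k_precision_recall(y_true, scores, k):
    """Precision and recall over the k highest-scoring items."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    total_positive = sum(y_true)
    recall = hits / total_positive if total_positive else 0.0
    return hits / k, recall


# Toy ground truth and hypothetical classifier scores.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.05, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(precision_recall_f1(y_true, y_pred))
print(top_k_precision_recall(y_true, scores, k=4))
```

Top-k measures suit the case where an analyst can only review the k functions ranked most likely to be vulnerable, so the ranking matters more than a fixed threshold.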


The Datasets

• 457 vulnerable functions
• 32,531 non-vulnerable functions
• 6 open-source projects
• 1,000+ releases
• NVD, CVE repositories
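The two function counts imply a heavily imbalanced dataset. The sketch below works out the imbalance from the slide's numbers and shows what a trivial "always non-vulnerable" classifier would score on accuracy; that baseline is a hypothetical added here for illustration, not something reported in the slides.

```python
# Function counts from the dataset slide.
vulnerable = 457
non_vulnerable = 32531
total = vulnerable + non_vulnerable

# Share of vulnerable functions and the negative-to-positive ratio.
vulnerable_share = vulnerable / total
ratio = non_vulnerable / vulnerable

# Hypothetical baseline: predict "non-vulnerable" for every function.
baseline_accuracy = non_vulnerable / total
baseline_recall = 0 / vulnerable  # it finds no vulnerable function at all

print(f"vulnerable share: {vulnerable_share:.2%}")
print(f"non-vulnerable per vulnerable: {ratio:.1f}")
print(f"baseline accuracy: {baseline_accuracy:.2%}, recall: {baseline_recall:.0%}")
```

Because such a baseline scores high on accuracy while detecting nothing, precision, recall and top-k measures are more informative than accuracy alone on data like this.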


Results


Results


Results


Binary Vulnerability Detection


Future Work

• Binary-level detection: focusing on scenarios where the source code is unavailable.
• Instruction-level granularity: identifying multiple instructions (reverse-engineering) that are potentially vulnerable.
• Specific-type vulnerability detection: focusing on vulnerabilities caused by missing checks (e.g. numeric errors).
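As an illustration of a numeric error caused by a missing check, the sketch below simulates a 32-bit unsigned size computation that wraps around silently, next to a version with the check restored. The function names and the 32-bit width are assumptions chosen for the example; they do not come from the slides.

```python
UINT32_MAX = 0xFFFFFFFF


def alloc_size_unchecked(count, elem_size):
    """Size computed as a 32-bit unsigned multiply with no overflow check."""
    # The mask mimics the silent wraparound of fixed-width unsigned arithmetic.
    return (count * elem_size) & UINT32_MAX


def alloc_size_checked(count, elem_size):
    """The same computation with the missing check restored."""
    if elem_size != 0 and count > UINT32_MAX // elem_size:
        raise OverflowError("count * elem_size does not fit in 32 bits")
    return count * elem_size


# A large count makes the unchecked size wrap to a tiny value.
print(alloc_size_unchecked(0x40000001, 4))
```

A buffer allocated with the wrapped size would be far smaller than the data later written into it, which is why a single absent comparison can make a function vulnerable.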


Example 2 - ML-based malware detection


Example 3 – Twitter spam detection


Example 4 - Network traffic classification


Research Methodology

• Collect data for security problem
• Extract raw or low-level features
• Apply data analysis

Security professionals, domain knowledge, model analytics

Data-driven Cyber Security


Resources

• G. Lin, J. Zhang, W. Luo, L. Pan, Y. Xiang, O. D. Vel, and P. Montague, “Cross-Project Transfer Representation Learning for Vulnerable Function Discovery,” IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3289-3297, 2018.

• C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min, “Statistical Features Based Real-time Detection of Drifted Twitter Spam,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 4, pp. 914-925, 2017.

• J. Zhang, X. Chen, Y. Xiang, W. Zhou, and J. Wu, “Robust Network Traffic Classification,” IEEE/ACM Transactions on Networking, vol. 23, no. 4, pp. 1257-1270, 2015.

• S. Cesare, Y. Xiang, and W. Zhou, “Control Flow-based Malware Variant Detection,” IEEE Transactions on Dependable and Secure Computing, vol. 11, no. 4, pp. 307-317, 2014.

• S. Cesare, Y. Xiang, and W. Zhou, “Malwise - An Effective and Efficient Classification System for Packed and Polymorphic Malware,” IEEE Transactions on Computers, vol. 62, no. 6, pp. 1193-1206, 2013.

• J. Zhang, Y. Xiang, Y. Wang, W. Zhou, Y. Xiang, and Y. Guan, “Network Traffic Classification Using Correlation Information,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 1, pp. 104-117, 2013.

Sponsors & Collaborators